Hi, for my master's thesis in computer science I succeeded in implementing the 4-profile computation (https://arxiv.org/abs/1510.02215 - http://eelenberg.github.io/Elenberg4profileWWW16.pdf) using giraph-1.3.0-SNAPSHOT (compiled with the -Phadoop_yarn profile) and hadoop-2.8.4.
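For reference, I compiled Giraph from source roughly like this (simplified from memory; the hadoop.version property is what I believe I passed, so adjust as needed):

  # Build giraph-1.3.0-SNAPSHOT with the YARN profile against Hadoop 2.8.4
  mvn -Phadoop_yarn -Dhadoop.version=2.8.4 clean package -DskipTests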
I configured a cluster on Amazon EC2 composed of 1 namenode and 5 datanodes using t2.2xlarge (32 GB, 8 vCPU) instances, and with input graphs of small/medium size I obtained the results described in the attached file (also available here: https://we.tl/t-7DuNJSSuN3).

If I feed my Giraph program bigger input graphs (e.g. http://snap.stanford.edu/data/web-NotreDame.html), in some cases I get many Netty-related errors and the YARN application FAILS; in other cases the YARN application remains in a RUNNING/UNDEFINED state with apparently no error (I killed it instead of waiting for the default timeout). I also tried m5.4xlarge (64 GB, 16 vCPU) instances but hit the same problems. For the first case I collected these logs:
- errors from the Giraph workers on the datanodes: https://pastebin.com/CGHUd0za (the same errors on all datanodes)
- errors from the Giraph master: https://pastebin.com/JXYN6y4L

I'm quite sure the errors are not caused by insufficient memory on the EC2 instances, because in the logs I always see messages like "(free/total/max) = 23038.28M / 27232.00M / 27232.00M". *Please help me, my master's thesis is blocked on this problem :-(*

This is an example of the command I use to run Giraph. Could you please check whether the parameters I used are correct? Any other tuning would be appreciated! (The arithmetic behind -w and -yh is summarized in the short sketch right after the command.)

giraph 4Profiles-0.0.1-SNAPSHOT.jar it.uniroma1.di.fourprofiles.worker.superstep0.gas1.Worker_Superstep0_GAS1
  -ca giraph.numComputeThreads=8    // t2.2xlarge has 8 cores: is it correct to set these three thread parameters to 8?
  -ca giraph.numInputThreads=8
  -ca giraph.numOutputThreads=8
  -w 8                              // I set 8 workers since:
                                    // - there are 5 datanodes on EC2
                                    // - every datanode is configured for at most 2 containers, to reduce messaging between datanodes
                                    // - 2 containers are reserved for the application master and the Giraph master
                                    // - (5 datanodes * 2 max containers) - 2 reserved = 8 workers
                                    // Is this reasoning correct?
  -yh 15360                         // I set 15360 because it matches:
                                    // - yarn.scheduler.minimum-allocation-mb in yarn-site.xml
                                    // - mapreduce.map.memory.mb in mapred-site.xml
                                    // Is this reasoning correct?
  -ca giraph.pure.yarn.job=true
  -mc it.uniroma1.di.fourprofiles.master.Master_FourProfiles
  -ca io.edge.reverse.duplicator=true
  -eif it.uniroma1.di.fourprofiles.io.format.IntEdgeData_TextEdgeInputFormat_ReverseEdgeDuplicator
  -eip INPUT_GRAPHS/HU_edges.txt-processed
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
  -op output
  -ca giraph.SplitMasterWorker=true
  -ca giraph.messageCombinerClass=it.uniroma1.di.fourprofiles.worker.msgcombiner.Worker_MsgCombiner
  -ca giraph.master.observers=it.uniroma1.di.fourprofiles.master.observer.Observer_FourProfiles
  -ca giraph.metrics.enable=true
  -ca giraph.useInputSplitLocality=false
  -ca giraph.useBigDataIOForMessages=true
  -ca giraph.useMessageSizeEncoding=true
  -ca giraph.oneToAllMsgSending=true
  -ca giraph.isStaticGraph=true
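To make the sizing reasoning explicit, here is the back-of-the-envelope arithmetic behind -w 8 and -yh 15360 (my own sketch, assuming the 5-node t2.2xlarge layout described above):

  # Sizing sketch for the cluster above (illustrative only)
  datanodes=5
  max_containers_per_node=2       # limited in yarn-site.xml to reduce inter-node messaging
  reserved_containers=2           # YARN application master + Giraph master
  workers=$(( datanodes * max_containers_per_node - reserved_containers ))   # 10 - 2 = 8  -> -w 8
  container_mb=15360              # yarn.scheduler.minimum-allocation-mb = mapreduce.map.memory.mb -> -yh 15360
  echo "workers=$workers, container memory per node up to $(( max_containers_per_node * container_mb )) MB"
  # 2 * 15360 MB = 30720 MB, on a t2.2xlarge with 32 GiB (32768 MB) of RAM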
Furthermore, I tried the following Netty parameters, but they did not resolve the problem. Could you please tell me if I am missing some important parameter, or maybe using one in the wrong way? I generalized the values passed to the Netty parameters with a trivial formula, nettyFactor*defaultValue, where nettyFactor is passed as a shell parameter and can be 1, 2, 3, ... (the P.S. at the end shows a simplified version of the launch script):

  -ca giraph.nettyAutoRead=true
  -ca giraph.channelsPerServer=$((nettyFactor*1))
  -ca giraph.nettyClientThreads=$((nettyFactor*4))
  -ca giraph.nettyClientExecutionThreads=$((nettyFactor*8))
  -ca giraph.nettyServerThreads=$((nettyFactor*16))
  -ca giraph.nettyServerExecutionThreads=$((nettyFactor*8))
  -ca giraph.clientSendBufferSize=$((nettyFactor*524288))
  -ca giraph.clientReceiveBufferSize=$((nettyFactor*32768))
  -ca giraph.serverSendBufferSize=$((nettyFactor*32768))
  -ca giraph.serverReceiveBufferSize=$((nettyFactor*524288))
  -ca giraph.vertexRequestSize=$((nettyFactor*524288))
  -ca giraph.edgeRequestSize=$((nettyFactor*524288))
  -ca giraph.msgRequestSize=$((nettyFactor*524288))
  -ca giraph.nettyRequestEncoderBufferSize=$((nettyFactor*32768))
  ...

I have some other questions:

1) This is my Hadoop configuration: https://we.tl/t-t1ItNYFe7H. Please check it, although I'm fairly sure it is correct. I have only one question about it: since Giraph does not use the "reduce" phase, is it correct to assign 0 MB to mapreduce.reduce.memory.mb in mapred-site.xml?

2) In order to avoid ClassNotFoundException errors, I copied the jar of my Giraph application and all the Giraph jars from $GIRAPH_HOME and $GIRAPH_HOME/lib into $HADOOP_HOME/share/hadoop/yarn/lib. Is there a better solution?

3) Last but not least: here https://we.tl/t-tdhuZFsVJW you can find the complete Hadoop/YARN log of my Giraph program run with the graph http://snap.stanford.edu/data/web-NotreDame.html as input. In this case the YARN application remains in the RUNNING/UNDEFINED state.

Thanks
--
Cristina Bovi
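P.S. In case it helps, this is a simplified sketch of the launch script that expands nettyFactor into the -ca flags (most of the other flags from the full command above are omitted here for brevity):

  #!/bin/bash
  # Simplified launch-script sketch: nettyFactor is the first shell argument (defaults to 1)
  nettyFactor=${1:-1}

  giraph 4Profiles-0.0.1-SNAPSHOT.jar \
    it.uniroma1.di.fourprofiles.worker.superstep0.gas1.Worker_Superstep0_GAS1 \
    -w 8 -yh 15360 -ca giraph.pure.yarn.job=true \
    -ca giraph.nettyAutoRead=true \
    -ca giraph.channelsPerServer=$((nettyFactor*1)) \
    -ca giraph.nettyClientThreads=$((nettyFactor*4)) \
    -ca giraph.nettyServerThreads=$((nettyFactor*16)) \
    -ca giraph.clientSendBufferSize=$((nettyFactor*524288)) \
    -ca giraph.serverReceiveBufferSize=$((nettyFactor*524288)) \
    -eip INPUT_GRAPHS/HU_edges.txt-processed -op output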