I just reported a bug about that the other day, GIRAPH-797, for which someone has proposed a patch; I believe it has been (or soon will be) committed, which should avoid this issue in future.
Rob

From: Young Han <[email protected]>
Reply-To: <[email protected]>
Date: Sunday, 24 November 2013 19:19
To: <[email protected]>, <[email protected]>
Subject: Re: Giraph EC2 Map task fails

> Actually, it turned out to be a dumber error than that... The name of the
> input file was wrong, so it was using an empty/non-existent graph.
>
> We'll keep the zookeeper bit in mind if we run into further problems.
>
> Thanks,
> Young
>
> On Sun, Nov 24, 2013 at 2:06 PM, Gustavo Enrique Salazar Torres
> <[email protected]> wrote:
>> I guess from your stacktrace that you didn't start the zookeeper cluster.
>>
>> Cheers
>> Gustavo
>>
>> On Sunday, November 24, 2013, Young Han <[email protected]> wrote:
>>> Hi,
>>>
>>> We are attempting to get Giraph running on EC2, using Hadoop 1.0.4. We are
>>> running PageRank with the following command:
>>>
>>> hadoop jar \
>>>   $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar \
>>>   org.apache.giraph.GiraphRunner \
>>>   org.apache.giraph.examples.SimplePageRankVertex \
>>>   -c org.apache.giraph.combiner.DoubleSumCombiner \
>>>   -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
>>>   -vip /user/ubuntu/giraph-input/tiny_graph.txt \
>>>   -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>>   -op /user/ubuntu/giraph-output/pagerank \
>>>   -w 1
>>>
>>> The input graph is the sample graph provided on the website:
>>>
>>> [0,0,[[1,1],[3,3]]]
>>> [1,0,[[0,1],[2,2],[3,1]]]
>>> [2,0,[[1,2],[4,4]]]
>>> [3,0,[[0,3],[1,1],[4,4]]]
>>> [4,0,[[3,4],[2,4]]]
>>>
>>> We've tried small, medium, and xlarge instances; 4 instances and 3
>>> instances; and various numbers of workers (-w 1, -w 2, -w 5, -w 10, etc.).
>>> Hadoop has -Xmx (max Java heap size) set to 1024m.
>>>
>>> The pattern is that the *first* map task will always fail.
>>> The error appears in the Hadoop jobtracker log:
>>>
>>> 2013-11-24 03:07:43,414 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240306_0001: nMaps=2 nReduces=0 max=-1
>>> 2013-11-24 03:07:43,417 INFO org.apache.hadoop.mapred.JobTracker: Job job_201311240306_0001 added successfully for user 'ubuntu' to queue 'default'
>>> 2013-11-24 03:07:43,418 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201311240306_0001
>>> 2013-11-24 03:07:43,419 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201311240306_0001
>>> 2013-11-24 03:07:43,422 INFO org.apache.hadoop.mapred.AuditLogger: USER=ubuntu IP=172.31.14.182 OPERATION=SUBMIT_JOB TARGET=job_201311240306_0001 RESULT=SUCCESS
>>> 2013-11-24 03:07:43,828 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /home/ubuntu/hadoop_data/hadoop_tmp-ubuntu/mapred/system/job_201311240306_0001/jobToken
>>> 2013-11-24 03:07:43,846 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201311240306_0001 = 0. Number of splits = 2
>>> 2013-11-24 03:07:43,846 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240306_0001 LOCALITY_WAIT_FACTOR=0.0
>>> 2013-11-24 03:07:43,847 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201311240306_0001 initialized successfully with 2 map tasks and 0 reduce tasks.
>>> 2013-11-24 03:07:45,152 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201311240306_0001_m_000003_0' to tip task_201311240306_0001_m_000003, for tracker 'tracker_cloud3:localhost/127.0.0.1:47021'
>>> 2013-11-24 03:07:54,222 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311240306_0001_m_000003_0' has completed task_201311240306_0001_m_000003 successfully.
>>> 2013-11-24 03:07:54,228 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201311240306_0001_m_000000
>>> 2013-11-24 03:07:54,229 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201311240306_0001_m_000000_0' to tip task_201311240306_0001_m_000000, for tracker 'tracker_cloud3:localhost/127.0.0.1:47021'
>>> 2013-11-24 03:07:54,361 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201311240306_0001_m_000001
>>> 2013-11-24 03:07:54,362 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201311240306_0001_m_000001_0' to tip task_201311240306_0001_m_000001, for tracker 'tracker_cloud2:localhost/127.0.0.1:55161'
>>> 2013-11-24 03:08:03,243 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311240306_0001_m_000000_0: java.lang.Throwable: Child Error
>>>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
>>> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>>>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>>>
>>> Thereafter, all other workers will fail with:
>>>
>>> 2013-11-24 03:08:42,471 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311240306_0001_m_000001_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception exists: Failed to check /_hadoopBsp/job_201311240306_0001/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
>>>         at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>> Caused by: java.lang.IllegalStateException: exists: Failed to check /_hadoopBsp/job_201311240306_0001/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
>>>         at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
>>>         at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:689)
>>>         at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:488)
>>>         at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>>>         at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>>         ... 7 more
>>>
>>> Any suggestions about why this might be happening?
>>>
>>> Thanks,
>>> Young
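For readers hitting the same symptom: the telltale line in the jobtracker log is "Input size for job ... = 0", which matches the eventual diagnosis (a wrong input filename behaving like an empty graph). Each line that JsonLongDoubleFloatDoubleVertexInputFormat reads is JSON of the form [source_id, vertex_value, [[dest_id, edge_weight], ...]]. The following Python sketch (not part of Giraph, just an illustration of the line format) parses the sample graph and shows how an empty/missing input yields zero vertices:

```python
import json

def parse_vertex_line(line):
    """Parse one JsonLongDoubleFloatDoubleVertexInputFormat line:
    [source_id, vertex_value, [[dest_id, edge_weight], ...]]"""
    vertex_id, value, edges = json.loads(line)
    return int(vertex_id), float(value), [(int(d), float(w)) for d, w in edges]

sample = """\
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
"""

vertices = [parse_vertex_line(l) for l in sample.splitlines() if l.strip()]
print(len(vertices))   # 5
print(vertices[0])     # (0, 0.0, [(1, 1.0), (3, 3.0)])

# A mistyped -vip path behaves like an empty file: zero vertices loaded,
# which is what the jobtracker's "Input size for job ... = 0" reflects.
empty = [parse_vertex_line(l) for l in "".splitlines() if l.strip()]
print(len(empty))      # 0
```

Verifying the -vip path up front (e.g. with `hadoop fs -ls` on the input path) avoids launching a job against a non-existent graph.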
