I'd guess from your stack trace that the ZooKeeper cluster was not started.

Cheers,
Gustavo
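
One way to test this guess is to ask the ZooKeeper server directly. The sketch below assumes that ZooKeeper answers the four-letter command `ruok` with `imok` on its client port (2181 by default). The helper name `zk_is_ok` and the host passed to it are hypothetical, and newer ZooKeeper releases answer `ruok` only if the command is whitelisted in the server configuration, so a negative result is a hint rather than proof.

```python
import socket


def zk_is_ok(host, port=2181, timeout=3.0):
    """Return True if a server at host:port answers 'ruok' with 'imok'.

    Hypothetical helper: host and port are assumptions, not values
    taken from the job configuration in this thread.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # Read until the 4-byte answer is complete or the peer closes.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
            return reply == b"imok"
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Calling `zk_is_ok("localhost")` from a worker node, with the host replaced by whichever machine is expected to run ZooKeeper, would show whether the workers can reach it at all.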

On Sunday, November 24, 2013, Young Han <[email protected]> wrote:
> Hi,
>
> We are attempting to get Giraph running on EC2, using Hadoop 1.0.4. We are running PageRank with the following command:
>
> hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimplePageRankVertex -c org.apache.giraph.combiner.DoubleSumCombiner -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/ubuntu/giraph-input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/ubuntu/giraph-output/pagerank -w 1
>
> The input graph is the sample graph provided on the website:
>
> [0,0,[[1,1],[3,3]]]
> [1,0,[[0,1],[2,2],[3,1]]]
> [2,0,[[1,2],[4,4]]]
> [3,0,[[0,3],[1,1],[4,4]]]
> [4,0,[[3,4],[2,4]]]
>
> We've tried small, medium, and xlarge instances; 4 instances and 3 instances; and various numbers of workers (-w 1, -w 2, -w 5, -w 10, etc.). Hadoop has -Xmx (max Java heap size) set to 1024m.
>
> The pattern is that the *first* map task will always fail.
> The error appears in Hadoop's jobtracker log:
>
> 2013-11-24 03:07:43,414 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240306_0001: nMaps=2 nReduces=0 max=-1
> 2013-11-24 03:07:43,417 INFO org.apache.hadoop.mapred.JobTracker: Job job_201311240306_0001 added successfully for user 'ubuntu' to queue 'default'
> 2013-11-24 03:07:43,418 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201311240306_0001
> 2013-11-24 03:07:43,419 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201311240306_0001
> 2013-11-24 03:07:43,422 INFO org.apache.hadoop.mapred.AuditLogger: USER=ubuntu IP=172.31.14.182 OPERATION=SUBMIT_JOB TARGET=job_201311240306_0001 RESULT=SUCCESS
> 2013-11-24 03:07:43,828 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /home/ubuntu/hadoop_data/hadoop_tmp-ubuntu/mapred/system/job_201311240306_0001/jobToken
> 2013-11-24 03:07:43,846 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201311240306_0001 = 0. Number of splits = 2
> 2013-11-24 03:07:43,846 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240306_0001 LOCALITY_WAIT_FACTOR=0.0
> 2013-11-24 03:07:43,847 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201311240306_0001 initialized successfully with 2 map tasks and 0 reduce tasks.
> 2013-11-24 03:07:45,152 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201311240306_0001_m_000003_0' to tip task_201311240306_0001_m_000003, for tracker 'tracker_cloud3:localhost/127.0.0.1:47021'
> 2013-11-24 03:07:54,222 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311240306_0001_m_000003_0' has completed task_201311240306_0001_m_000003 successfully.
> 2013-11-24 03:07:54,228 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201311240306_0001_m_000000
> 2013-11-24 03:07:54,229 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201311240306_0001_m_000000_0' to tip task_201311240306_0001_m_000000, for tracker 'tracker_cloud3:localhost/127.0.0.1:47021'
> 2013-11-24 03:07:54,361 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201311240306_0001_m_000001
> 2013-11-24 03:07:54,362 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201311240306_0001_m_000001_0' to tip task_201311240306_0001_m_000001, for tracker 'tracker_cloud2:localhost/127.0.0.1:55161'
> 2013-11-24 03:08:03,243 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311240306_0001_m_000000_0: java.lang.Throwable: Child Error
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>
> Thereafter, all other workers will fail with:
>
> 2013-11-24 03:08:42,471 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311240306_0001_m_000001_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception exists: Failed to check /_hadoopBsp/job_201311240306_0001/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.IllegalStateException: exists: Failed to check /_hadoopBsp/job_201311240306_0001/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
>     at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
>     at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:689)
>     at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:488)
>     at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>     ... 7 more
>
> Any suggestions about why this might be happening?
>
> Thanks,
> Young
