OK, please ignore this last email; I found the right log in the other tasks... :)

On Tue, Jan 10, 2012 at 8:10 PM, Claudio Martella <claudio.marte...@gmail.com> wrote:
> Hello,
>
> I'm having some issues with debugging of GIRAPH-45. Code passes local
> tests but currently fails
>
> testBspCheckpoint(org.apache.giraph.TestManualCheckpoint)
> testPartitioners(org.apache.giraph.TestGraphPartitioner)
>
> The first one is particularly tricky as the autocheckpointing is
> passed and because this is the only error I get from stderr:
>
> <testcase time="111.559"
> classname="org.apache.giraph.TestManualCheckpoint"
> name="testBspCheckpoint">
> <failure
> type="junit.framework.AssertionFailedError">junit.framework.AssertionFailedError
> at junit.framework.Assert.fail(Assert.java:47)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at
> org.apache.giraph.TestManualCheckpoint.testBspCheckpoint(TestManualCheckpoint.java:108)
> </failure>
> <system-out>Setting tasks to 3 for testBspCheckpoint since
> JobTracker exists...
> setup: Sending job to job tracker localhost:9001 with jar path
> target/giraph-0.70-jar-with-dependencies.jar for testBspCheckpoint
> testBspCheckpoint: Restarting from superstep 2 with checkpoint path =
> /tmp/testBspCheckpoints
> setup: Sending job to job tracker localhost:9001 with jar path
> target/giraph-0.70-jar-with-dependencies.jar for testBspCheckpoint
> </system-out>
> <system-err>java.lang.Throwable: Child Error
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>
> attempt_201201092336_0002_m_000000_0: 2012-01-09 23:38:27.816
> java[12460:1903] Unable to load realm info from SCDynamicStore
> </system-err>
> </testcase>
>
> So I checked the Hadoop logs, and this is what I found for the failed task:
>
> 2012-01-10 20:01:45,760 INFO org.apache.giraph.graph.BspServiceMaster:
> barrierOnWorkerList: 0 out of 3 workers finished on superstep 2 on
> path
> /_hadoopBsp/job_201201101959_0002/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir
> 2012-01-10 20:01:45,925 ERROR
> org.apache.giraph.graph.BspServiceMaster: superstepChosenWorkerAlive:
> Missing chosen worker Worker(hostname=tyler.local, MRpartition=2,
> port=30002) on superstep 2
> 2012-01-10 20:01:45,925 INFO org.apache.giraph.graph.MasterThread:
> masterThread: Coordination of superstep 2 took 9.113 seconds ended
> with state WORKER_FAILURE and is now on superstep 2
> 2012-01-10 20:01:45,957 INFO org.apache.giraph.graph.BspServiceMaster:
> getLastGoodCheckpoint: Found last good checkpoint 6 from
> file:/tmp/testBspCheckpoints/6.finalized
> 2012-01-10 20:01:46,006 ERROR org.apache.giraph.graph.MasterThread:
> masterThread: Master algorithm failed:
> java.lang.RuntimeException: retartFromCheckpoint: KeeperException
> at
> org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1219)
> at org.apache.giraph.graph.MasterThread.run(MasterThread.java:133)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for
> /_hadoopBsp/job_201201101959_0002/_inputSplitDir
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
> at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:238)
> at
> org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1214)
> ... 1 more
> 2012-01-10 20:01:46,006 FATAL org.apache.giraph.graph.GraphMapper:
> uncaughtException: OverrideExceptionHandler on thread
> org.apache.giraph.graph.MasterThread, msg =
> java.lang.RuntimeException: retartFromCheckpoint: KeeperException,
> exiting...
> java.lang.RuntimeException: java.lang.RuntimeException:
> retartFromCheckpoint: KeeperException
> at org.apache.giraph.graph.MasterThread.run(MasterThread.java:177)
> Caused by: java.lang.RuntimeException: retartFromCheckpoint: KeeperException
> at
> org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1219)
> at org.apache.giraph.graph.MasterThread.run(MasterThread.java:133)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for
> /_hadoopBsp/job_201201101959_0002/_inputSplitDir
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
> at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:238)
> at
> org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1214)
> ... 1 more
> 2012-01-10 20:01:46,008 WARN org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper
> process.
>
> which is unlikely to be caused by my code. Any ideas?
>
> Maybe a Hadoop issue? I just freshly installed it in pseudo-distributed
> mode on my machine.
>
> --
> Claudio Martella
> claudio.marte...@gmail.com

--
Claudio Martella
claudio.marte...@gmail.com