OK, please ignore my last email; I found the right log on the other
tasks... :)

On Tue, Jan 10, 2012 at 8:10 PM, Claudio Martella
<claudio.marte...@gmail.com> wrote:
> Hello,
>
> I'm having some issues debugging GIRAPH-45. The code passes the local
> tests but currently fails:
>
>  testBspCheckpoint(org.apache.giraph.TestManualCheckpoint)
>  testPartitioners(org.apache.giraph.TestGraphPartitioner)
>
> The first one is particularly tricky because the autocheckpointing
> passes and because this is the only error I get from stderr:
>
>  <testcase time="111.559"
> classname="org.apache.giraph.TestManualCheckpoint"
> name="testBspCheckpoint">
>    <failure 
> type="junit.framework.AssertionFailedError">junit.framework.AssertionFailedError
>        at junit.framework.Assert.fail(Assert.java:47)
>        at junit.framework.Assert.assertTrue(Assert.java:20)
>        at junit.framework.Assert.assertTrue(Assert.java:27)
>        at 
> org.apache.giraph.TestManualCheckpoint.testBspCheckpoint(TestManualCheckpoint.java:108)
> </failure>
>    <system-out>Setting tasks to 3 for testBspCheckpoint since
> JobTracker exists...
> setup: Sending job to job tracker localhost:9001 with jar path
> target/giraph-0.70-jar-with-dependencies.jar for testBspCheckpoint
> testBspCheckpoint: Restarting from superstep 2 with checkpoint path =
> /tmp/testBspCheckpoints
> setup: Sending job to job tracker localhost:9001 with jar path
> target/giraph-0.70-jar-with-dependencies.jar for testBspCheckpoint
> </system-out>
>    <system-err>java.lang.Throwable: Child Error
>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>
> attempt_201201092336_0002_m_000000_0: 2012-01-09 23:38:27.816
> java[12460:1903] Unable to load realm info from SCDynamicStore
> </system-err>
>  </testcase>
>
> So I checked the Hadoop logs, and this is what I found for the failed task:
>
> 2012-01-10 20:01:45,760 INFO org.apache.giraph.graph.BspServiceMaster:
> barrierOnWorkerList: 0 out of 3 workers finished on superstep 2 on
> path 
> /_hadoopBsp/job_201201101959_0002/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir
> 2012-01-10 20:01:45,925 ERROR
> org.apache.giraph.graph.BspServiceMaster: superstepChosenWorkerAlive:
> Missing chosen worker Worker(hostname=tyler.local, MRpartition=2,
> port=30002) on superstep 2
> 2012-01-10 20:01:45,925 INFO org.apache.giraph.graph.MasterThread:
> masterThread: Coordination of superstep 2 took 9.113 seconds ended
> with state WORKER_FAILURE and is now on superstep 2
> 2012-01-10 20:01:45,957 INFO org.apache.giraph.graph.BspServiceMaster:
> getLastGoodCheckpoint: Found last good checkpoint 6 from
> file:/tmp/testBspCheckpoints/6.finalized
> 2012-01-10 20:01:46,006 ERROR org.apache.giraph.graph.MasterThread:
> masterThread: Master algorithm failed:
> java.lang.RuntimeException: retartFromCheckpoint: KeeperException
>        at 
> org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1219)
>        at org.apache.giraph.graph.MasterThread.run(MasterThread.java:133)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for
> /_hadoopBsp/job_201201101959_0002/_inputSplitDir
>        at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
>        at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:238)
>        at 
> org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1214)
>        ... 1 more
> 2012-01-10 20:01:46,006 FATAL org.apache.giraph.graph.GraphMapper:
> uncaughtException: OverrideExceptionHandler on thread
> org.apache.giraph.graph.MasterThread, msg =
> java.lang.RuntimeException: retartFromCheckpoint: KeeperException,
> exiting...
> java.lang.RuntimeException: java.lang.RuntimeException:
> retartFromCheckpoint: KeeperException
>        at org.apache.giraph.graph.MasterThread.run(MasterThread.java:177)
> Caused by: java.lang.RuntimeException: retartFromCheckpoint: KeeperException
>        at 
> org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1219)
>        at org.apache.giraph.graph.MasterThread.run(MasterThread.java:133)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for
> /_hadoopBsp/job_201201101959_0002/_inputSplitDir
>        at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
>        at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:238)
>        at 
> org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1214)
>        ... 1 more
> 2012-01-10 20:01:46,008 WARN org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper
> process.
>
> which is unlikely to be caused by my code. Any ideas?
>
> Maybe a Hadoop issue? I just freshly installed it in pseudo-distributed
> mode on my machine.
>
> --
>    Claudio Martella
>    claudio.marte...@gmail.com



-- 
   Claudio Martella
   claudio.marte...@gmail.com
