Hi,
I'm testing a custom PageRank implementation using trunk on Hadoop
1.0.4. I seem to run into a deadlock after the input superstep.
The workers report "finishSuperstep: (all workers done) WORKER_ONLY -
Attempt=0, Superstep=0" and the master reports that all workers are done
with superstep -1.
I reproduced this in a local setup, and it seems to me that the
BspServiceMaster hangs indefinitely in the cleanUpZooKeeper method.
I'm not using a dedicated ZooKeeper instance; I just let Giraph start one.
Any ideas what can be done to fix my problem?
Best,
Sebastian
Excerpt from jstack:

"org.apache.giraph.master.MasterThread" prio=10 tid=0x00007f29fc385000 nid=0x29d1 waiting on condition [0x00007f2a09a5f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000f38967d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
    at org.apache.giraph.zk.PredicateLock.waitMsecs(PredicateLock.java:112)
    at org.apache.giraph.zk.PredicateLock.waitForever(PredicateLock.java:138)
    at org.apache.giraph.master.BspServiceMaster.cleanUpZooKeeper(BspServiceMaster.java:1602)
    at org.apache.giraph.master.BspServiceMaster.cleanup(BspServiceMaster.java:1692)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:144)