Interesting. A dedicated ZK instance doesn't work with hadoop-2.0.x or trunk either when running Giraph on YARN/MRv2. I would like to look into this more if I have time. Does anyone have any ideas? And does anyone have a definite timeline for how long this has been broken? Most of my work with Giraph last summer was on a cluster with its own ZK, so I have not used the feature much. I do remember it working on a 1.0.something hadoop profile around Christmas of 2011, but that was a long time ago...

On Fri, Jan 25, 2013 at 3:07 AM, Sebastian Schelter <[email protected]> wrote:

> Hi,
>
> I get exactly the same deadlock when using a dedicated (non-distributed)
> ZK instance. I tried 3.3.6 and 3.4.5.
>
> I haven't used Giraph for a while, so I can't say whether this has
> worked recently...
>
> Best,
> Sebastian
>
> On 23.01.2013 05:14, Eli Reisman wrote:
> > Hi Sebastian,
> >
> > This seems to be a new issue related to our recent upgrade to
> > multithreading. I have not seen this before. All my other emails related to
> > the array index out of bounds error you found over the weekend.
> >
> > However, I have had trouble with the local zk instance for some time now on
> > a number of Giraph profiles and pretty much exclusively use a separate ZK
> > instance of my own. Last summer I was running a lot of jobs on a 1.0.x
> > hadoop cluster with Giraph, and I was told to use the on-cluster dedicated
> > ZK quorum due to "problems" with Giraph's local ZK instantiation. No one
> > got more specific with me than that. I also can't get the local ZK
> > instances to come up on Hadoop-2.0.x -- perhaps this feature of Giraph has
> > had problems for a while. Was it working for you recently?
> >
> > If you notice any other clues as to the cause, please post them. I'm hoping
> > to do some work around this soon.
> >
> > On Tue, Jan 22, 2013 at 11:05 AM, Claudio Martella <[email protected]> wrote:
> >
> >> Hi Sebastian,
> >>
> >> I do not know what is happening, I am also having problems of jobs
> >> blocking while waiting to setup the zookeeper instance.
> >> We should look into this.
> >>
> >> Best,
> >> Claudio
> >>
> >> On Mon, Jan 21, 2013 at 1:59 PM, Sebastian Schelter <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm testing a custom PageRank implementation using trunk on Hadoop
> >>> 1.0.4. I seem to run into a deadlock after the input superstep.
> >>>
> >>> The workers report "finishSuperstep: (all workers done) WORKER_ONLY -
> >>> Attempt=0, Superstep=0" and the master reports that all workers are done
> >>> with superstep -1.
> >>>
> >>> I reconstructed this using a local setup and seems to me that the
> >>> BspServiceMaster hangs in the cleanUpZooKeeper method infinitely.
> >>>
> >>> I'm not using a dedicated zk instance, I just have Giraph start one. Any
> >>> ideas what can be done to fix my problem?
> >>>
> >>> Best,
> >>> Sebastian
> >>>
> >>> excerpt from jstack
> >>>
> >>> "org.apache.giraph.master.MasterThread" prio=10 tid=0x00007f29fc385000 nid=0x29d1 waiting on condition [0x00007f2a09a5f000]
> >>>    java.lang.Thread.State: TIMED_WAITING (parking)
> >>>         at sun.misc.Unsafe.park(Native Method)
> >>>         - parking to wait for <0x00000000f38967d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >>>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
> >>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
> >>>         at org.apache.giraph.zk.PredicateLock.waitMsecs(PredicateLock.java:112)
> >>>         at org.apache.giraph.zk.PredicateLock.waitForever(PredicateLock.java:138)
> >>>         at org.apache.giraph.master.BspServiceMaster.cleanUpZooKeeper(BspServiceMaster.java:1602)
> >>>         at org.apache.giraph.master.BspServiceMaster.cleanup(BspServiceMaster.java:1692)
> >>>         at org.apache.giraph.master.MasterThread.run(MasterThread.java:144)
> >>
> >> --
> >> Claudio Martella
> >> [email protected]
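
A note on reading the jstack excerpt above: the MasterThread is in TIMED_WAITING rather than WAITING, with waitForever calling waitMsecs. That is consistent with a "wait forever" built as a loop over timed waits on a condition that nobody ever signals. Below is a minimal sketch of that general pattern, not Giraph's actual PredicateLock code. The class name PredicateLockSketch, its fields, and the 1000 ms polling interval are all assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; this is NOT Giraph's PredicateLock. */
class PredicateLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    private boolean eventOccurred = false;

    /** Records that the event happened and wakes every waiting thread. */
    public void signal() {
        lock.lock();
        try {
            eventOccurred = true;
            cond.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Waits at most msecs milliseconds; returns true if the event occurred. */
    public boolean waitMsecs(long msecs) {
        long nanosLeft = TimeUnit.MILLISECONDS.toNanos(msecs);
        lock.lock();
        try {
            while (!eventOccurred) {
                if (nanosLeft <= 0) {
                    return false; // timed out without seeing the event
                }
                // Timed park: a thread dump taken here shows TIMED_WAITING (parking).
                nanosLeft = cond.awaitNanos(nanosLeft);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            lock.unlock();
        }
    }

    /** Repeats timed waits until the event occurs; never returns if signal() is never called. */
    public void waitForever() {
        while (!waitMsecs(1000)) {
            // Assumed polling interval; a real implementation might report progress here.
        }
    }

    public static void main(String[] args) {
        PredicateLockSketch done = new PredicateLockSketch();
        // Stand-in for whatever is supposed to report completion.
        Thread signaller = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.signal();
        });
        signaller.start();
        System.out.println("event seen within 50 ms: " + done.waitMsecs(50));
        done.waitForever(); // returns only because signal() is eventually called
        System.out.println("waitForever returned");
    }
}
```

If the signalling side never runs (in the sketch, if the signaller thread were removed), waitForever would spin through timed waits indefinitely, and a thread dump would look much like the one above. Whether that is what actually happens inside cleanUpZooKeeper is not established by the trace alone.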
