Unfortunately, the barrier recipes aren’t widely used (as far as I know), so there may well be a bug. If you could write a test that shows the problem, that would be ideal.
-JZ

From: Brian Phillips [email protected]
Reply: [email protected] [email protected]
Date: March 25, 2014 at 2:38:40 PM
To: [email protected] [email protected]
Subject: Curator barriers missing watch events

Hi guys,

I’ve been integrating Curator into my project and have recently run into an issue I just can’t seem to make sense of.

I’m running two JVMs on the same host machine, each with its own Curator connection. I use the DistributedDoubleBarrier recipe at the beginning of my program and again at the end. A lot of work happens in between, including ZooKeeper sets, gets, and watches on other nodes.

At the first double barrier, everyone always makes it through. At the job-end barrier, sometimes everyone gets through, but more often than not one of the programs hangs in enter()'s wait() and never receives the watch event for the ready path that should notify it to proceed. If I look in ZooKeeper, I can see that the ready path does exist. It seems that the watch for one of the programs just never fires.

To simplify debugging, I’ve set both double barriers to only ever call enter(), never leave(). Each barrier has its own separate path. Also, the program never shuts down or disconnects from ZooKeeper; it just sleeps indefinitely after it gets through the final barrier.

Any ideas on how to debug this issue? I don’t mind hacking up ZooKeeper/Curator code to insert my own debugging statements if it comes to that.

Brian
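The hang described here is a thread blocked in wait() that is never notified by the watch on the ready path. As a debugging aid, here is a minimal plain-Java model of that kind of gate. It is NOT Curator's DistributedDoubleBarrier code: ReadyGate, markReady() and await() are hypothetical names, and markReady() stands in for a watcher callback that fires when the ready node is created. It shows the guarded-wait pattern: the condition is re-checked under the lock, so a notification that arrives before the waiter blocks is not lost. It also shows a timed wait that reports a timeout instead of blocking forever. (Curator's DistributedDoubleBarrier also has a timed enter(long maxWait, TimeUnit unit) overload that returns false on timeout. Using that overload, then checking ZooKeeper for the ready node after a timeout, is one way to tell a missed watch apart from a node that was never created.)

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of a "wait until the ready node exists" gate.
// Not Curator's implementation; it only illustrates the guarded-wait pattern.
class ReadyGate {
    private final Object lock = new Object();
    private boolean ready = false; // stands in for "the ready path exists"

    // Called from the event thread, e.g. by a watcher callback (hypothetical).
    public void markReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    // Waits up to maxWait; returns true if ready was observed, false on timeout.
    public boolean await(long maxWait, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        synchronized (lock) {
            // Re-check the condition in a loop: this handles spurious wakeups
            // and a markReady() call that happened before we started waiting.
            while (!ready) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    // Debugging hook: report the timeout so the caller can
                    // inspect ZooKeeper directly for the ready node.
                    System.err.println("timed out waiting for ready signal");
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
            }
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Signal arrives before the waiter blocks: it is still observed.
        ReadyGate early = new ReadyGate();
        early.markReady();
        System.out.println(early.await(1, TimeUnit.SECONDS)); // prints "true"

        // No signal at all: the timed wait gives up instead of hanging.
        ReadyGate never = new ReadyGate();
        System.out.println(never.await(100, TimeUnit.MILLISECONDS)); // prints "false"
    }
}
```

If instrumenting the real recipe, the equivalent places to log are the watcher callback (did the event arrive at all?) and the point where the waiting thread re-checks for the ready node after waking.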
