Which version of enter() are you using? I see a potential bug when the no-arg 
version of enter() is used.


From: Brian Phillips [email protected]
Reply: Brian Phillips [email protected]
Date: March 25, 2014 at 4:19:36 PM
To: Jordan Zimmerman [email protected]
Subject:  Re: Curator barriers missing watch events  

Good idea, but yes I am. The connection state doesn’t change while I’m 
executing the barrier code. It seems to be some kind of race condition, as 
sometimes it works and sometimes it doesn’t. I’ve looked through the recipe 
code and it looks good as far as I can tell, though. I’m practically pulling my 
hair out at this point.

I may try a non-curator, zookeeper-only barrier tomorrow to see if that works. 
Or I may start trying to debug the zookeeper client, to see if it’s actually 
getting the watches but not delivering them.

_B

On Mar 25, 2014, at 4:54 PM, Jordan Zimmerman <[email protected]> 
wrote:

Are you setting a ConnectionStateListener? If the connection gets SUSPENDED or 
LOST then you’d need to reinitialize your barrier.

-JZ


From: Brian Phillips [email protected]
Reply: [email protected] [email protected]
Date: March 25, 2014 at 2:51:42 PM
To: [email protected] [email protected]
Subject:  Re: Curator barriers missing watch events 

I have tried writing a test program which launches two programs in the same 
manner; each makes a connection, then loops over barriers with a 
Thread.sleep(random) in-between. This runs indefinitely and everything works 
out fine.

I have also tried writing my own barrier, which uses a SharedCount, where each 
guy tries to increment it until it hits a memberQty. This too misses watch 
events and does not work properly.

It’s almost as if something else that I’ve done during the running of my 
program has broken zookeeper’s watch events somehow. Is there any good way to 
debug watch events in general? I’ve tried to look at the DEBUG output for my 
zookeeper server log, but it looks the same for the working vs non-working 
barriers...

_B

On Mar 25, 2014, at 3:42 PM, Jordan Zimmerman <[email protected]> 
wrote:

Unfortunately, the barrier recipes aren’t widely used (from what I know). So, 
there may well be a bug. If you could get a test to show the problem that would 
be ideal.

-JZ


From: Brian Phillips [email protected]
Reply: [email protected] [email protected]
Date: March 25, 2014 at 2:38:40 PM
To: [email protected] [email protected]
Subject:  Curator barriers missing watch events 

Hi guys, 

I’ve been integrating curator into my project and have recently run into an 
issue I just can’t seem to make sense of. 

I’m running two JVMs on the same host machine, each with their own curator 
connection. At the beginning of my program I’m using the 
DistributedDoubleBarrier recipe, and once again at the end of my program. A 
bunch of work is done in-between, including zookeeper set/get/watches of other 
nodes. 

I’m finding that at the first double barrier, everyone always makes it through. 
At the job-end barrier, sometimes everyone gets through, but more often than 
not one of the programs hangs in enter's wait() and never gets the watch event 
for the ready path which notifies it to proceed. If I look in zookeeper, I can 
see that the ready path is actually set in there. 

It would seem that the watch for one of the programs just never triggers. 
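
For readers following the thread, the hang described above has the general 
shape of a waiter blocked in wait() for a notification that never arrives. The 
sketch below is NOT Curator's code; it is a plain-Java illustration with 
hypothetical names (GuardedWait, markReady, awaitReady) of one defensive 
pattern: keep the condition in a flag, re-check it in a loop, and bound the 
wait, so that an early or missed notification shows up as a timeout rather 
than an indefinite hang.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; these names do not come from Curator or ZooKeeper.
public class GuardedWait {
    private final Object lock = new Object();
    // Stands in for "the ready node exists" in the barrier scenario.
    private boolean ready = false;

    // Would be called from a watcher-style callback once the condition holds.
    public void markReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    // Waits until the flag is set or the timeout elapses.
    // Returns true if the flag was observed, false on timeout or interrupt.
    public boolean awaitReady(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (lock) {
            // The flag is checked before every wait, so a notification that
            // fired before this method was entered is not lost.
            while (!ready) {
                long remainingMs =
                        TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // avoid wait(0), which would block forever
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return ready;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        GuardedWait early = new GuardedWait();
        early.markReady(); // notification happens before anyone waits
        System.out.println(early.awaitReady(1, TimeUnit.SECONDS));

        GuardedWait never = new GuardedWait();
        System.out.println(never.awaitReady(100, TimeUnit.MILLISECONDS));
    }
}
```

In a barrier, a timeout from a wait like this would be a natural point to 
re-read the ready path directly, which separates "the watch never fired" from 
"the node was never created".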

To simplify debugging, I’ve set both double barriers to only ever call enter() 
and not leave(). Both barriers have their own separate path. 

Also, the program never shuts down or disconnects from zookeeper. It just 
sleeps infinitely after it gets out of the final barrier. 

Any idea on how to debug this issue? I don’t mind hacking up zookeeper/curator 
code to insert my own debugging statements if it comes to that. 

_Brian
