Yes, there are two barrier sessions, but they use different barrier instances and different barrier paths. ):
Sent from my iPhone

> On Mar 25, 2014, at 8:34 PM, "Jordan Zimmerman" <[email protected]>
> wrote:
>
> Are you saying there are two barrier sessions? The first one works, but the
> second doesn’t? Are you re-using the same path? I wonder if there are znodes
> left in the path or something. Before running the second barrier session,
> double check that the path is empty (do a getChildren on it). If it’s not
> empty that could be the problem.
>
> -JZ
>
>
> From: Brian Phillips [email protected]
> Reply: [email protected] [email protected]
> Date: March 25, 2014 at 6:10:46 PM
> To: [email protected] [email protected]
> Subject: Re: Curator barriers missing watch events
>
>> I’ve tried, but it seems to be timing specific. It’s in a rather large
>> complicated program, where the first barrier always works but the one at the
>> end of the program usually gets stuck. I’ve spent all day trying to make
>> sense of it, as my project really needs it to work.
>>
>> I’d like to be able to figure out if the zookeeper server is actually
>> sending my clients the watch events.
>>
>> _B
>>
>>
>> On Mar 25, 2014, at 6:53 PM, "Jordan Zimmerman" <[email protected]>
>> wrote:
>>
>>> There’s no way you can distill your usage into a test?
>>>
>>> -JZ
>>>
>>>
>>> From: Brian Phillips [email protected]
>>> Reply: [email protected] [email protected]
>>> Date: March 25, 2014 at 5:51:37 PM
>>> To: [email protected] [email protected]
>>> Subject: Re: Curator barriers missing watch events
>>>
>>>> Hmm, I made that change, but it didn't seem to help. The first program
>>>> made it to the barrier enter, then the second program entered, exited, and
>>>> the first program never left the barrier.
>>>>
>>>> The second program got a node created event, but the first program never
>>>> got any event from its watcher.
>>>>
>>>> I appreciate the help! Must be something else.
>>>>
>>>> _B
>>>>
>>>> On Mar 25, 2014, at 6:28 PM, "Jordan Zimmerman"
>>>> <[email protected]> wrote:
>>>>
>>>>> Look at line 313 and line 331. The noarg version of enter() causes
>>>>> internalEnter() to call wait even though the watcher may have already
>>>>> notified. I believe line 331 should be:
>>>>>
>>>>> else if ( !hasBeenNotified.get() )
>>>>>
>>>>> -JZ
>>>>>
>>>>>
>>>>> From: Brian Phillips [email protected]
>>>>> Reply: [email protected] [email protected]
>>>>> Date: March 25, 2014 at 5:25:48 PM
>>>>> To: [email protected] [email protected]
>>>>> Subject: Re: Curator barriers missing watch events
>>>>>
>>>>>> I am using the no arg version! What's the bug?
>>>>>>
>>>>>> _B
>>>>>>
>>>>>> On Mar 25, 2014, at 6:23 PM, "Jordan Zimmerman"
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Which version of enter() are you using? I see a potential bug when the
>>>>>>> no arg version of enter() is used.
>>>>>>>
>>>>>>>
>>>>>>> From: Brian Phillips [email protected]
>>>>>>> Reply: Brian Phillips [email protected]
>>>>>>> Date: March 25, 2014 at 4:19:36 PM
>>>>>>> To: Jordan Zimmerman [email protected]
>>>>>>> Subject: Re: Curator barriers missing watch events
>>>>>>>
>>>>>>>> Good idea, but yes I am. The connection state doesn’t change while I’m
>>>>>>>> executing the barrier code. It seems to be some kind of race condition
>>>>>>>> I think, as sometimes it works and sometimes it doesn’t. I’ve looked
>>>>>>>> through the recipe code and it looks good as far as I can tell though.
>>>>>>>> I’m practically pulling my hair out at this point.
>>>>>>>>
>>>>>>>> I may try a non-curator zookeeper only barrier tomorrow. See if that
>>>>>>>> works. Or I may start trying to debug the zookeeper client, see if it’s
>>>>>>>> actually getting the watches but not delivering them.
>>>>>>>>
>>>>>>>> _B
>>>>>>>>
>>>>>>>>> On Mar 25, 2014, at 4:54 PM, Jordan Zimmerman
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Are you setting a ConnectionStateListener? If the connection gets
>>>>>>>>> SUSPENDED or LOST then you’d need to reinitialize your barrier.
>>>>>>>>>
>>>>>>>>> -JZ
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Brian Phillips [email protected]
>>>>>>>>> Reply: [email protected] [email protected]
>>>>>>>>> Date: March 25, 2014 at 2:51:42 PM
>>>>>>>>> To: [email protected] [email protected]
>>>>>>>>> Subject: Re: Curator barriers missing watch events
>>>>>>>>>
>>>>>>>>>> I have tried writing a test program which launches two programs in
>>>>>>>>>> the same manner, each makes a connection then loops over barriers
>>>>>>>>>> with a Thread.sleep(random) in-between. This runs indefinitely and
>>>>>>>>>> everything works out fine.
>>>>>>>>>>
>>>>>>>>>> I have also tried writing my own barrier, which uses a SharedCount,
>>>>>>>>>> where each guy tries to increment it until it hits a memberQty. This
>>>>>>>>>> too missed watch events and does not work properly.
>>>>>>>>>>
>>>>>>>>>> It’s almost as if something else that I’ve done during the running
>>>>>>>>>> of my program has broken zookeeper’s watch events somehow. Is there
>>>>>>>>>> any good way to debug watch events in general? I’ve tried to look at
>>>>>>>>>> the DEBUG output for my zookeeper server log, but it looks the same
>>>>>>>>>> for the working vs non-working barriers...
>>>>>>>>>>
>>>>>>>>>> _B
>>>>>>>>>>
>>>>>>>>>>> On Mar 25, 2014, at 3:42 PM, Jordan Zimmerman
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately, the barrier recipes aren’t widely used (from what I
>>>>>>>>>>> know). So, there may well be a bug. If you could get a test to show
>>>>>>>>>>> the problem that would be ideal.
>>>>>>>>>>>
>>>>>>>>>>> -JZ
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> From: Brian Phillips [email protected]
>>>>>>>>>>> Reply: [email protected] [email protected]
>>>>>>>>>>> Date: March 25, 2014 at 2:38:40 PM
>>>>>>>>>>> To: [email protected] [email protected]
>>>>>>>>>>> Subject: Curator barriers missing watch events
>>>>>>>>>>>
>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>
>>>>>>>>>>>> I’ve been integrating curator into my project and have recently
>>>>>>>>>>>> run into an issue I just can’t seem to make sense of.
>>>>>>>>>>>>
>>>>>>>>>>>> I’m running two JVMs on the same host machine, each with their own
>>>>>>>>>>>> curator connection. At the beginning of my program I’m using the
>>>>>>>>>>>> DistributedDoubleBarrier recipe, and once again at the end of my
>>>>>>>>>>>> program. A bunch of work is done in-between, including zookeeper
>>>>>>>>>>>> set/get/watches of other nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> I’m finding that at the first double barrier, everyone always makes
>>>>>>>>>>>> it through. At the job-end barrier, sometimes everyone gets through,
>>>>>>>>>>>> but more often than not one of the programs hangs in enter's
>>>>>>>>>>>> wait(), and never gets the watch event for the ready path which
>>>>>>>>>>>> notifies it to proceed. If I look in zookeeper, I can see that the
>>>>>>>>>>>> ready path is actually set in there.
>>>>>>>>>>>>
>>>>>>>>>>>> It would seem that the watch for one of the programs just never
>>>>>>>>>>>> triggers.
>>>>>>>>>>>>
>>>>>>>>>>>> To simplify debugging, I’ve set both double barriers to only ever
>>>>>>>>>>>> call enter() and not leave(). Both barriers have their own
>>>>>>>>>>>> separate path.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, the program never shuts down or disconnects from zookeeper.
>>>>>>>>>>>> It just sleeps infinitely after it gets out of the final barrier.
>>>>>>>>>>>>
>>>>>>>>>>>> Any idea on how to debug this issue? I don’t mind hacking up
>>>>>>>>>>>> zookeeper/curator code to insert my own debugging statements if it
>>>>>>>>>>>> comes to that.
>>>>>>>>>>>>
>>>>>>>>>>>> _Brian
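
The guard suggested in the thread for line 331 (else if ( !hasBeenNotified.get() )) can be illustrated outside Curator. What follows is only a minimal sketch of the general "check the flag before waiting" pattern, not the recipe's actual code: the class NotifiedGuardSketch and the methods onWatchEvent(), awaitReady() and demo() are hypothetical names, and the timeout is an assumption added so the demo cannot hang (the no-arg enter() discussed in the thread waits without one). It shows a watcher-style callback firing before the waiter arrives, which is the ordering where an unguarded wait() would block forever.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the wait/notify logic discussed in the thread.
public class NotifiedGuardSketch {
    private final AtomicBoolean hasBeenNotified = new AtomicBoolean(false);

    // Plays the role of the watcher callback: record the event, then wake waiters.
    public synchronized void onWatchEvent() {
        hasBeenNotified.set(true);
        notifyAll();
    }

    // Plays the role of the waiting side of enter(). The flag is checked before
    // every wait(), so an event that arrived earlier is not lost.
    // Returns true if the event was seen, false if maxWaitMs elapsed first.
    public synchronized boolean awaitReady(long maxWaitMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (!hasBeenNotified.get()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            wait(remaining);
        }
        return true;
    }

    // The event fires BEFORE the waiter shows up; the guard makes this safe.
    public static boolean demo() {
        NotifiedGuardSketch sketch = new NotifiedGuardSketch();
        sketch.onWatchEvent();
        try {
            return sketch.awaitReady(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```

Without the flag check, the same ordering would leave the caller in wait() with nobody left to call notifyAll(). As the later reply in the thread notes, applying this change did not resolve the hang being reported, so it addresses one possible race rather than the whole problem.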
