I am using the no-arg version! What's the bug?

_B

> On Mar 25, 2014, at 6:23 PM, "Jordan Zimmerman" <[email protected]> wrote:
>
> Which version of enter() are you using? I see a potential bug when the no-arg
> version of enter() is used.
>
>
> From: Brian Phillips [email protected]
> Reply: Brian Phillips [email protected]
> Date: March 25, 2014 at 4:19:36 PM
> To: Jordan Zimmerman [email protected]
> Subject: Re: Curator barriers missing watch events
>
>> Good idea, but yes I am. The connection state doesn’t change while I’m
>> executing the barrier code. It seems to be some kind of race condition, I
>> think, as sometimes it works and sometimes it doesn’t. I’ve looked through
>> the recipe code and it looks good as far as I can tell though. I’m
>> practically pulling my hair out at this point.
>>
>> I may try a non-curator, zookeeper-only barrier tomorrow. See if that works.
>> Or I may start trying to debug the zookeeper client, to see if it’s actually
>> getting the watches but not delivering them.
>>
>> _B
>>
>>> On Mar 25, 2014, at 4:54 PM, Jordan Zimmerman <[email protected]> wrote:
>>>
>>> Are you setting a ConnectionStateListener? If the connection gets SUSPENDED
>>> or LOST then you’d need to reinitialize your barrier.
>>>
>>> -JZ
>>>
>>>
>>> From: Brian Phillips [email protected]
>>> Reply: [email protected] [email protected]
>>> Date: March 25, 2014 at 2:51:42 PM
>>> To: [email protected] [email protected]
>>> Subject: Re: Curator barriers missing watch events
>>>
>>>> I have tried writing a test program which launches two programs in the
>>>> same manner; each makes a connection, then loops over barriers with a
>>>> Thread.sleep(random) in-between. This runs indefinitely and everything
>>>> works out fine.
>>>>
>>>> I have also tried writing my own barrier, which uses a SharedCount, where
>>>> each guy tries to increment it until it hits a memberQty. This too misses
>>>> watch events and does not work properly.
>>>>
>>>> It’s almost as if something else that I’ve done during the running of my
>>>> program has broken zookeeper’s watch events somehow. Is there any good way
>>>> to debug watch events in general? I’ve tried to look at the DEBUG output
>>>> for my zookeeper server log, but it looks the same for the working vs
>>>> non-working barriers...
>>>>
>>>> _B
>>>>
>>>>> On Mar 25, 2014, at 3:42 PM, Jordan Zimmerman <[email protected]> wrote:
>>>>>
>>>>> Unfortunately, the barrier recipes aren’t widely used (from what I know).
>>>>> So, there may well be a bug. If you could get a test to show the problem
>>>>> that would be ideal.
>>>>>
>>>>> -JZ
>>>>>
>>>>>
>>>>> From: Brian Phillips [email protected]
>>>>> Reply: [email protected] [email protected]
>>>>> Date: March 25, 2014 at 2:38:40 PM
>>>>> To: [email protected] [email protected]
>>>>> Subject: Curator barriers missing watch events
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I’ve been integrating curator into my project and have recently run into
>>>>>> an issue I just can’t seem to make sense of.
>>>>>>
>>>>>> I’m running two JVMs on the same host machine, each with their own
>>>>>> curator connection. At the beginning of my program I’m using the
>>>>>> DistributedDoubleBarrier recipe, and once again at the end of my
>>>>>> program. A bunch of work is done in-between, including zookeeper
>>>>>> set/get/watches of other nodes.
>>>>>>
>>>>>> I’m finding that with the first double barrier, everyone always makes it
>>>>>> through. With the job-end barrier, sometimes everyone gets through, but
>>>>>> more often than not one of the programs hangs in enter's wait(), and never
>>>>>> gets the watch event for the ready path which notifies it to proceed. If
>>>>>> I look in zookeeper, I can see that the ready path is actually set in
>>>>>> there.
>>>>>>
>>>>>> It would seem that the watch for one of the programs just never
>>>>>> triggers.
>>>>>>
>>>>>> To simplify debugging, I’ve set both double barriers to only ever call
>>>>>> enter() and not leave(). Both barriers have their own separate path.
>>>>>>
>>>>>> Also, the program never shuts down or disconnects from zookeeper. It
>>>>>> just sleeps infinitely after it gets out of the final barrier.
>>>>>>
>>>>>> Any idea on how to debug this issue? I don’t mind hacking up
>>>>>> zookeeper/curator code to insert my own debugging statements if it comes
>>>>>> to that.
>>>>>>
>>>>>> _Brian
