I am using the no arg version! What's the bug?

_B

> On Mar 25, 2014, at 6:23 PM, "Jordan Zimmerman" <[email protected]> 
> wrote:
> 
> Which version of enter() are you using? I see a potential bug when the no arg 
> version of enter() is used.
> 
> 
> From: Brian Phillips [email protected]
> Reply: Brian Phillips [email protected]
> Date: March 25, 2014 at 4:19:36 PM
> To: Jordan Zimmerman [email protected]
> Subject:  Re: Curator barriers missing watch events 
> 
>> Good idea, but yes I am. The connection state doesn’t change while I’m 
>> executing the barrier code. It seems to be some kind of race condition I 
>> think, as sometimes it works and sometimes it doesn’t. I’ve looked through 
>> the recipe code and it looks good as far as I can tell though. I’m 
>> practically pulling my hair out at this point.
>> 
>> I may try a non-curator zookeeper only barrier tomorrow. See if that works. 
>> Or I may start trying to debug the zookeeper client, see if it’s actually 
>> getting the watches but not delivering them.
>> 
>> _B
>> 
>>> On Mar 25, 2014, at 4:54 PM, Jordan Zimmerman <[email protected]> 
>>> wrote:
>>> 
>>> Are you setting a ConnectionStateListener? If the connection gets SUSPENDED 
>>> or LOST then you’d need to reinitialize your barrier.
>>> 
>>> -JZ
>>> 
>>> 
>>> From: Brian Phillips [email protected]
>>> Reply: [email protected] [email protected]
>>> Date: March 25, 2014 at 2:51:42 PM
>>> To: [email protected] [email protected]
>>> Subject:  Re: Curator barriers missing watch events 
>>> 
>>>> I have tried writing a test program which launches two programs in the 
>>>> same manner, each makes a connection then loops over barriers with a 
>>>> Thread.sleep(random) in-between. This runs indefinitely and everything 
>>>> works out fine.
>>>> 
>>>> I have also tried writing my own barrier, which uses a SharedCount, where 
>>>> each guy tries to increment it until it hits a memberQty. This too misses 
>>>> watch events and does not work properly.
>>>> 
>>>> It’s almost as if something else that I’ve done during the running of my 
>>>> program has broken zookeeper’s watch events somehow. Is there any good way 
>>>> to debug watch events in general? I’ve tried to look at the DEBUG output 
>>>> for my zookeeper server log, but it looks the same for the working vs 
>>>> non-working barriers...
>>>> 
>>>> _B
>>>> 
>>>>> On Mar 25, 2014, at 3:42 PM, Jordan Zimmerman 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>> Unfortunately, the barrier recipes aren’t widely used (from what I know). 
>>>>> So, there may well be a bug. If you could get a test to show the problem 
>>>>> that would be ideal.
>>>>> 
>>>>> -JZ
>>>>> 
>>>>> 
>>>>> From: Brian Phillips [email protected]
>>>>> Reply: [email protected] [email protected]
>>>>> Date: March 25, 2014 at 2:38:40 PM
>>>>> To: [email protected] [email protected]
>>>>> Subject:  Curator barriers missing watch events 
>>>>> 
>>>>>> Hi guys, 
>>>>>> 
>>>>>> I’ve been integrating curator into my project and have recently run into 
>>>>>> an issue I just can’t seem to make sense of. 
>>>>>> 
>>>>>> I’m running two JVMs on the same host machine, each with their own 
>>>>>> curator connection. At the beginning of my program I’m using the 
>>>>>> DistributedDoubleBarrier recipe, and once again at the end of my 
>>>>>> program. A bunch of work is done in-between, including zookeeper 
>>>>>> set/get/watches of other nodes. 
>>>>>> 
>>>>>> I’m finding that at the first double barrier, everyone always makes it 
>>>>>> through. At the job-end barrier, sometimes everyone gets through, but more 
>>>>>> often than not one of the programs hangs in enter's wait(), and never 
>>>>>> gets the watch event for the ready path which notifies it to proceed. If 
>>>>>> I look in zookeeper, I can see that the ready path is actually set in 
>>>>>> there. 
>>>>>> 
>>>>>> It would seem that the watch for one of the programs just never 
>>>>>> triggers. 
>>>>>> 
>>>>>> To simplify debugging, I’ve set both double barriers to only ever call 
>>>>>> enter() and not leave(). Both barriers have their own separate path. 
>>>>>> 
>>>>>> Also, the program never shuts down or disconnects from zookeeper. It 
>>>>>> just sleeps infinitely after it gets out of the final barrier. 
>>>>>> 
>>>>>> Any idea on how to debug this issue? I don’t mind hacking up 
>>>>>> zookeeper/curator code to insert my own debugging statements if it comes 
>>>>>> to that. 
>>>>>> 
>>>>>> _Brian
>> 
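
The test program described earlier in the thread (two participants that loop over barriers with a random sleep in between) can be approximated inside a single JVM. The sketch below is only an analogue: it uses the JDK's CyclicBarrier as a stand-in for DistributedDoubleBarrier, threads as a stand-in for the two JVMs, and the class and method names are hypothetical. The one idea it is meant to show is the bounded wait, so that a participant that never gets released reports a timeout instead of hanging forever.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness; CyclicBarrier stands in for the distributed barrier.
public class BarrierLoopSketch {
    // Runs `members` threads through `rounds` barrier crossings and
    // returns the total number of crossings that completed.
    static int run(int members, int rounds) {
        CyclicBarrier barrier = new CyclicBarrier(members);
        AtomicInteger crossings = new AtomicInteger();
        Thread[] workers = new Thread[members];
        for (int i = 0; i < members; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        // Random pause between barriers, as in the test program.
                        Thread.sleep(ThreadLocalRandom.current().nextInt(20));
                        // Bounded wait: a stuck member throws instead of hanging.
                        barrier.await(5, TimeUnit.SECONDS);
                        crossings.incrementAndGet();
                    }
                } catch (InterruptedException | BrokenBarrierException | TimeoutException e) {
                    System.err.println(Thread.currentThread().getName() + " stuck: " + e);
                }
            }, "member-" + i);
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return crossings.get();
    }

    public static void main(String[] args) {
        System.out.println("crossings = " + run(2, 50));
    }
}
```

With the real recipe, the comparable move would be the timed enter(maxWait, unit) overload rather than the no-arg enter(), so that a hang can at least be logged with the barrier path and the time it occurred.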

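The hand-rolled counting barrier mentioned in the thread (each participant increments a shared count until it reaches memberQty) has a single-JVM analogue too. This is a sketch under stated assumptions, not Curator's implementation: a plain int guarded by a monitor stands in for SharedCount, and the class name is hypothetical. It shows the shape of the wait: the condition is re-checked in a loop after every wake-up and the wait is bounded by a deadline, so a lost notification surfaces as a false return rather than a permanent hang.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical in-process stand-in for a SharedCount-based barrier.
public class CountingBarrierSketch {
    private final int memberQty;
    private int arrived = 0;

    CountingBarrierSketch(int memberQty) {
        this.memberQty = memberQty;
    }

    // Returns true once all members have arrived, false on timeout or interrupt.
    synchronized boolean enter(long maxWait, TimeUnit unit) {
        arrived++;
        notifyAll(); // wake members that arrived earlier
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        // Re-check the condition after every wake-up; never trust the wake-up alone.
        while (arrived < memberQty) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // wait(0) would block forever, so bail out here
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CountingBarrierSketch barrier = new CountingBarrierSketch(2);
        new Thread(() -> barrier.enter(5, TimeUnit.SECONDS)).start();
        System.out.println("released = " + barrier.enter(5, TimeUnit.SECONDS));
    }
}
```

In the distributed case the equivalent of the loop's re-check would be reading the node again after each watch fires, or after a timeout, instead of relying on the watch event as the only signal.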