So I'm still working on this issue. I grabbed a zookeeper-only barrier 
implementation from here:


http://zookeeper.apache.org/doc/r3.3.3/zookeeperTutorial.html


This barrier makes its own zookeeper connection, separate from the curator 
connection that my program uses. When I put this barrier into my program, 
everything works as it should, and nobody gets stuck on the barriers. I then 
modified the barrier to use curator's connection, passing in 
CuratorFramework.getZookeeperClient().getZooKeeper() instead of connecting 
separately. Once I did this, it broke exactly as it did before when using the 
curator barrier.


This seems to indicate to me that something else I've done in the program has 
'broken' the zookeeper session associated with my curator connection, to the 
point where some watch events no longer work.


I'm going to embark on the arduous process of trying to figure out what I'm 
doing that's breaking my session's watches. Watches not working properly is 
disturbing, and will certainly prevent other parts of my program from 
functioning correctly, probably in less obvious ways.


_Brian
  _____  

From: Brian Phillips [mailto:[email protected]]
To: [email protected] [mailto:[email protected]]
Sent: Tue, 25 Mar 2014 20:39:31 -0500
Subject: Re: Curator barriers missing watch events


Yes, there are two barrier sessions, but with different barrier instances and 
different barrier paths. ):

Sent from my iPhone

On Mar 25, 2014, at 8:34 PM, "Jordan Zimmerman" <[email protected]> 
wrote:



Are you saying there are two barrier sessions? The first one works, but the 
second doesn’t? Are you re-using the same path? I wonder if there are znodes 
left in the path or something. Before running the second barrier session, 
double check that the path is empty (do a getChildren on it). If it’s not empty 
that could be the problem.


-JZ 


 

From: Brian Phillips [email protected]
Reply: [email protected] [email protected]
Date: March 25, 2014 at 6:10:46 PM
To: [email protected] [email protected]
Subject:  Re: Curator barriers missing watch events 

 


                
  

  I’ve tried, but it seems to be timing-specific. It’s in a rather large, 
complicated program, where the first barrier always works but the one at the 
end of the program usually gets stuck. I’ve spent all day trying to make sense 
of it, as my project really needs it to work.

  I’d like to be able to figure out if the zookeeper server is actually 
sending my clients the watch events.

  _B    

  On Mar 25, 2014, at 6:53 PM, "Jordan Zimmerman" <[email protected]>  
wrote:
  
    
  
  There’s no way you can distill your usage into a test?  
  
  
  -JZ  
  
  
    

  From: Brian Phillips [email protected]
  Reply: [email protected] [email protected]
  Date: March 25, 2014 at 5:51:37 PM
  To: [email protected] [email protected]
  Subject: Re: Curator barriers missing watch events
  
    
  
  
Hmm, I made that change, but it didn't seem to help. The first program made it 
to the barrier enter, then the second program entered, exited, and the first 
program never left the barrier.

The second program got a node created event, but the first program never got 
any event from its watcher.

I appreciate the help! Must be something else.

  _B  

  On Mar 25, 2014, at 6:28 PM, "Jordan Zimmerman" <[email protected]>  
wrote:
  
    
  
  Look at line 313 and line 331. The no-arg version of enter() causes 
internalEnter() to call wait() even though the watcher may have already 
notified. I believe line 331 should be:
  
  
  else if ( !hasBeenNotified.get() )  
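
To illustrate the race being described: if the watcher fires before the waiting thread reaches wait(), an unconditional wait() can block forever, because the notification has already come and gone. The following is only a minimal sketch of the guarded pattern using plain Java monitors. It is not Curator's source; the class GuardedWaitSketch and its method names are made up for illustration, and only the hasBeenNotified flag borrows its name from the line above.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the barrier's wait logic; NOT Curator's actual code.
final class GuardedWaitSketch {
    private final Object lock = new Object();
    private final AtomicBoolean hasBeenNotified = new AtomicBoolean(false);

    // Plays the role of the watcher callback: record the event, then wake waiters.
    void onWatchEvent() {
        synchronized (lock) {
            hasBeenNotified.set(true);
            lock.notifyAll();
        }
    }

    // Guarded wait: the flag is checked before every wait(), so an event that
    // arrived earlier is not lost. Returns true if the event was seen,
    // false on timeout or interruption.
    boolean await(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (lock) {
            while (!hasBeenNotified.get()) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false;
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        GuardedWaitSketch barrier = new GuardedWaitSketch();
        barrier.onWatchEvent();                    // the event arrives first
        System.out.println(barrier.await(5_000));  // the wait must not block on it
    }
}
```

Without the flag check, the wait in main would sit out its full timeout (or hang, with an untimed wait()) even though the event had already been delivered.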
  
  
  -JZ  
  
  
    

  From: Brian Phillips [email protected]
  Reply: [email protected] [email protected]
  Date: March 25, 2014 at 5:25:48 PM
  To: [email protected] [email protected]
  Subject: Re: Curator barriers missing watch events
  
    
  
  
I am using the no arg version! What's the bug?
  
  _B  

  On Mar 25, 2014, at 6:23 PM, "Jordan Zimmerman" <[email protected]>  
wrote:
  
    
  
  Which version of enter() are you using? I see a potential bug when the no 
arg version of enter() is used.
  
  
    

  From: Brian Phillips [email protected]
  Reply: Brian Phillips [email protected]
  Date: March 25, 2014 at 4:19:36 PM
  To: Jordan Zimmerman [email protected]
  Subject: Re: Curator barriers missing watch events
  
    
  
  
Good idea, but yes I am. The connection state doesn’t change while I’m 
executing the barrier code. It seems to be some kind of race condition, I 
think, as sometimes it works and sometimes it doesn’t. I’ve looked through the 
recipe code and it looks good as far as I can tell though. I’m practically 
pulling my hair out at this point.

I may try a non-curator, zookeeper-only barrier tomorrow and see if that works. 
Or I may start trying to debug the zookeeper client, to see if it’s actually 
getting the watches but not delivering them.

  
_B  
  
  
On Mar 25, 2014, at 4:54 PM, Jordan Zimmerman  <[email protected]>  
wrote:  
    
  
  Are you setting a ConnectionStateListener? If the connection gets SUSPENDED 
or LOST then you’d need to reinitialize your barrier.
  
  
  -JZ  
  
  
    

  From: Brian Phillips [email protected]
  Reply: [email protected] [email protected]
  Date: March 25, 2014 at 2:51:42 PM
  To: [email protected] [email protected]
  Subject: Re: Curator barriers missing watch events
  
    
  
  
I have tried writing a test program which launches two programs in the same 
manner; each makes a connection, then loops over barriers with a 
Thread.sleep(random) in between. This runs indefinitely and everything works 
out fine.
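
For readers who want to try the same kind of loop, here is a rough in-process sketch of its shape. It is an assumption-laden analogue, not the actual test program: threads stand in for the two JVMs, java.util.concurrent.CyclicBarrier stands in for the curator barrier, and the class name BarrierLoopSketch, the method run, and all of its parameters are hypothetical. A timeout on each crossing turns a stuck barrier into a counted failure instead of a hang.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress loop; threads and CyclicBarrier replace the JVMs and
// the distributed barrier used in the real program.
final class BarrierLoopSketch {

    // Sends `parties` workers through `rounds` barrier crossings with a random
    // sleep before each one. Returns how many workers gave up because a
    // crossing timed out, broke, or was interrupted.
    static int run(int parties, int rounds, long maxSleepMs, long timeoutMs) {
        CyclicBarrier barrier = new CyclicBarrier(parties);
        AtomicInteger failures = new AtomicInteger();
        Thread[] workers = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            workers[i] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    try {
                        // Random delay so the workers reach the barrier at different times.
                        Thread.sleep(ThreadLocalRandom.current().nextLong(maxSleepMs + 1));
                        barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
                    } catch (TimeoutException | BrokenBarrierException e) {
                        failures.incrementAndGet();
                        return;
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        failures.incrementAndGet();
                        return;
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("workers that got stuck: " + run(2, 20, 10, 5_000));
    }
}
```

A loop like this only exercises the barrier in isolation, which matches the observation that the problem does not show up until the rest of the program is involved.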

  
I have also tried writing my own barrier, which uses a SharedCount, where each 
guy tries to increment it until it hits a memberQty. This too misses watch 
events and does not work properly.
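
The counting idea can be sketched in-process as below. This is only an analogue under stated assumptions, not the barrier from the message: an AtomicInteger with a compare-and-set retry loop stands in for the shared counter, a CountDownLatch stands in for the watch notification, and CountingBarrierSketch, enter, and the timeout parameter are hypothetical names. Only memberQty is taken from the text above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-process analogue of a counter-based barrier.
final class CountingBarrierSketch {
    private final AtomicInteger count = new AtomicInteger(0);   // stand-in for the shared counter
    private final CountDownLatch full = new CountDownLatch(1);  // stand-in for the watch notification
    private final int memberQty;

    CountingBarrierSketch(int memberQty) {
        this.memberQty = memberQty;
    }

    // Registers one member, then waits until memberQty members have entered.
    // Returns false on timeout or interruption.
    boolean enter(long timeoutMs) {
        while (true) {
            int seen = count.get();
            // Retry if another member changed the counter in between.
            if (count.compareAndSet(seen, seen + 1)) {
                if (seen + 1 >= memberQty) {
                    full.countDown();  // last member in releases everyone
                }
                break;
            }
        }
        try {
            return full.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountingBarrierSketch barrier = new CountingBarrierSketch(2);
        new Thread(() -> barrier.enter(5_000)).start();
        System.out.println(barrier.enter(5_000));
    }
}
```

One difference matters for the real thing: the latch stays open once released, whereas zookeeper watches are one-time triggers. A zookeeper-backed version therefore has to re-read the counter after registering its watch, or a change that lands in between goes unnoticed.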

  
It’s almost as if something else that I’ve done during the running of my 
program has broken zookeeper’s watch events somehow. Is there any good way to 
debug watch events in general? I’ve tried to look at the DEBUG output for my 
zookeeper server log, but it looks the same for the working vs non-working 
barriers...

  
_B  
  
  
On Mar 25, 2014, at 3:42 PM, Jordan Zimmerman  <[email protected]>  
wrote:  
    
  
  Unfortunately, the barrier recipes aren’t widely used (from what I know). 
So, there may well be a bug. If you could get a test to show the problem that 
would be ideal.
  
  
  -JZ  
  
  
    

  From: Brian Phillips [email protected]
  Reply: [email protected] [email protected]
  Date: March 25, 2014 at 2:38:40 PM
  To: [email protected] [email protected]
  Subject: Curator barriers missing watch events
  
  Hi guys, 
  
  I’ve been integrating curator into my project and have recently run into an 
issue I just can’t seem to make sense of. 
  
  I’m running two JVMs on the same host machine, each with its own curator 
connection. At the beginning of my program I’m using the 
DistributedDoubleBarrier recipe, and once again at the end of my program. A 
bunch of work is done in between, including zookeeper set/get/watches of other 
nodes. 
  
  I’m finding that at the first double barrier, everyone always makes it 
through. At the job-end barrier, sometimes everyone gets through, but more often 
than not one of the programs hangs in enter's wait() and never gets the watch 
event for the ready path which notifies it to proceed. If I look in zookeeper, 
I can see that the ready path is actually set in there. 
  
  It would seem that the watch for one of the programs just never triggers. 
  
  To simplify debugging, I’ve set both double barriers to only ever call 
enter() and not leave(). Both barriers have their own separate path. 
  
  Also, the program never shuts down or disconnects from zookeeper. It just 
sleeps indefinitely after it gets out of the final barrier. 
  
  Any idea on how to debug this issue? I don’t mind hacking up 
zookeeper/curator code to insert my own debugging statements if it comes to 
that. 
  
  _Brian
                                    
