I set up a single ZooKeeper instance using the binaries distributed with 
Ubuntu 12.04. I downloaded the 3.3.5 source and compiled the C-based locking 
recipe. I built this into a program of mine and ran into a problem, so I have 
some questions.

  If I wanted to create 1000 locks, would I set up the locks as follows?
  /lock/0
  /lock/1
  ...
  /lock/999

  Is this correct?
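
  To make the layout concrete, here is a minimal sketch of how I would build 
those names. The helper lock_path() is hypothetical (it is not part of the 
recipe), and I am assuming each resulting path is used as the parent node of 
one independent lock.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of the ZooKeeper lock recipe:
 * writes "/lock/<index>" into buf.
 * Returns 0 on success, -1 if the path does not fit in len bytes. */
static int lock_path(char *buf, size_t len, unsigned index)
{
    int n = snprintf(buf, len, "/lock/%u", index);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

  Calling lock_path(buf, sizeof buf, i) for i from 0 to 999 gives the 1000 
paths listed above.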

  I was running an example with two clients competing for one lock, both running 
on the same machine as the ZooKeeper instance. I found that 
zkr_lock_lock() would often fail to acquire the lock, so I put the call in a loop 
with 1000 retries. That makes it work most of the time, but occasionally 
there is still a failure at zoo_lock.c:301:

                // cannot watch my predecessor i am giving up
                // we need to be able to watch the predecessor
                // since if we do not become a leader the others
                // will keep waiting
[301]           if (ret != ZOK) {
                    free_String_vector(vector);


  I added a printf to see what ret was, and it was ZNONODE. Looking at the 
code above this spot, get_children is called, the results are sorted, and 
zoo_wexists is called later. Could the state change between these two calls, 
for example if the predecessor node is deleted after get_children returns? 
I added a check so that if the result is ZNONODE, it does a goto back to just 
above where get_children is called, so the algorithm runs again.
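
  To illustrate the race I think I am seeing, here is a small self-contained 
sketch. It is not the recipe code and makes no ZooKeeper calls: the SIM_* codes, 
the callback types, and the fake_* functions are all made up for the 
illustration. It shows the list-then-watch pair being re-run while the watch 
reports that the predecessor is gone, with a cap on attempts so it cannot spin 
forever (my goto has no such cap).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in result codes for this sketch only (not taken from zookeeper.h). */
enum { SIM_OK = 0, SIM_NONODE = -101 };

/* Hypothetical callbacks standing in for the two server round trips. */
typedef int (*list_fn)(void *ctx, int *predecessor);  /* list children, pick predecessor */
typedef int (*watch_fn)(void *ctx, int predecessor);  /* set a watch on the predecessor */

/* Re-run the list/watch pair while the predecessor vanishes in between.
 * Stops after max_attempts. Stores the number of attempts made in
 * *attempts_used (if non-NULL) and returns the last result code. */
static int watch_predecessor(list_fn list, watch_fn watch, void *ctx,
                             int max_attempts, int *attempts_used)
{
    int ret = SIM_NONODE;
    int attempt;

    assert(list != NULL && watch != NULL);
    for (attempt = 1; attempt <= max_attempts; attempt++) {
        int pred = -1;
        ret = list(ctx, &pred);
        if (ret != SIM_OK)
            break;                 /* listing failed: give up */
        if (pred < 0)
            break;                 /* no predecessor: nothing to watch */
        ret = watch(ctx, pred);
        if (ret != SIM_NONODE)
            break;                 /* watch set, or a different error */
        /* predecessor disappeared between list and watch: list again */
    }
    if (attempts_used != NULL)
        *attempts_used = attempt > max_attempts ? max_attempts : attempt;
    return ret;
}

/* Fake server state: the predecessor disappears `vanish` times
 * before a watch finally succeeds. */
struct fake { int vanish; };

static int fake_list(void *ctx, int *pred)
{
    (void)ctx;
    *pred = 7;                     /* arbitrary predecessor id */
    return SIM_OK;
}

static int fake_watch(void *ctx, int pred)
{
    struct fake *f = ctx;
    (void)pred;
    if (f->vanish > 0) {
        f->vanish--;
        return SIM_NONODE;
    }
    return SIM_OK;
}
```

  With a struct fake whose vanish field is 2, watch_predecessor(fake_list, 
fake_watch, &f, 5, &used) succeeds on the third attempt. With vanish set higher 
than max_attempts it returns SIM_NONODE, which is the case where the caller 
would still have to give up.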

  That change seems to make the code work all the time now, but I'm not sure 
if it is correct. I've included the diff below. Is it expected that 
zkr_lock_lock will fail periodically, since it only tries to acquire the lock 4 
times?

thanks for any help,
kevin

--- zoo_lock.c.orig     2012-06-15 00:37:53.880508812 -0500
+++ zoo_lock.c  2012-06-15 00:41:41.304518262 -0500
@@ -273,6 +273,7 @@ static int zkr_lock_operation(zkr_lock_m
             mutex->id = getName(retbuf);
         }
         
+tryagain:
         if (mutex->id != NULL) {
             ret = ZCONNECTIONLOSS;
             ret = retry_getchildren(zh, path, vector, ts, retry);
@@ -299,7 +300,9 @@ static int zkr_lock_operation(zkr_lock_m
                 // will keep waiting
                 if (ret != ZOK) {
                     free_String_vector(vector);
+                    if (ret == ZNONODE) goto tryagain;
                     LOG_WARN(("unable to watch my predecessor"));
+                    printf("zret = %d\n", ret);
                     ret = zkr_lock_unlock(mutex);
                     while (ret == 0) {
                         //we have to give up our leadership
