Mahadev, any insight on this?

On Fri, Jun 15, 2012 at 9:25 AM, Kevin Harms <[email protected]> wrote:
>
>  I set up a single ZooKeeper instance using the binaries distributed with
> Ubuntu 12.04. I downloaded the 3.3.5 source and compiled the C-based locking
> recipe. I built this into a program of mine and ran into a problem, so I have
> some questions.
>
>  If I wanted to create 1000 locks, would I set up the locks as follows?
>  /lock/0
>  /lock/1
>  ...
>  /lock/999
>
>  Is this correct?
>
>  I was running an example with two clients competing for one lock, both on
> the same machine as the ZooKeeper instance. I found that zkr_lock_lock()
> would often fail to acquire the lock, so I put it in a loop with 1000
> retries. That seems to make it work most of the time, but at other times
> there is still a failure at zoo_lock.c:301:
>
>                // cannot watch my predecessor i am giving up
>                // we need to be able to watch the predecessor
>                // since if we do not become a leader the others
>                // will keep waiting
> [301]           if (ret != ZOK) {
>                    free_String_vector(vector);
>
>
>  I added a printf to see what ret was, and it was ZNONODE. Looking at the
> code above this spot, get_children is called, the results are sorted, and
> zoo_wexists is called later. It seems plausible that the state could change
> between these two calls. I added a statement so that if the result is
> ZNONODE, it does a goto back to just above the get_children call and runs
> the algorithm again.
>
>  That change seems to make the code work all the time now, but I'm not sure
> whether it is correct. I've included the diff below. Is it expected that
> zkr_lock_lock will fail periodically, since it only tries to acquire the
> lock 4 times?
>
> Thanks for any help,
> kevin
>
> --- zoo_lock.c.orig     2012-06-15 00:37:53.880508812 -0500
> +++ zoo_lock.c  2012-06-15 00:41:41.304518262 -0500
> @@ -273,6 +273,7 @@ static int zkr_lock_operation(zkr_lock_m
>             mutex->id = getName(retbuf);
>         }
>
> +tryagain:
>         if (mutex->id != NULL) {
>             ret = ZCONNECTIONLOSS;
>             ret = retry_getchildren(zh, path, vector, ts, retry);
> @@ -299,7 +300,9 @@ static int zkr_lock_operation(zkr_lock_m
>                 // will keep waiting
>                 if (ret != ZOK) {
>                     free_String_vector(vector);
> +                    if (ret == ZNONODE) goto tryagain;
>                     LOG_WARN(("unable to watch my predecessor"));
> +                    printf("zret = %d\n", ret);
>                     ret = zkr_lock_unlock(mutex);
>                     while (ret == 0) {
>                         //we have to give up our leadership
>
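To make the race described above concrete, here is a rough, self-contained sketch of the retry idea behind the patch. It is not code from zoo_lock.c and does not call the ZooKeeper client: the sim_* names, the in-memory child list, and the try_lock helper are all hypothetical stand-ins, and the retry is bounded by max_attempts rather than an unconditional goto, which is a design choice of this sketch and not something the patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for the return codes discussed in the thread. */
enum { SIM_OK = 0, SIM_NONODE = -101 };

#define MAX_NODES 8

/* Hypothetical in-memory model of the children under one lock path. */
typedef struct {
    const char *names[MAX_NODES];
    int count;
    const char *vanish_on_watch; /* child deleted just before the next watch call */
} sim_lock_dir;

static int cmp_names(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

static void sim_remove(sim_lock_dir *d, const char *name) {
    for (int i = 0; i < d->count; i++) {
        if (strcmp(d->names[i], name) == 0) {
            d->names[i] = d->names[--d->count];
            return;
        }
    }
}

/* Mimics an exists-with-watch call; another client may delete a node first. */
static int sim_watch_exists(sim_lock_dir *d, const char *name) {
    if (d->vanish_on_watch != NULL) {
        sim_remove(d, d->vanish_on_watch);
        d->vanish_on_watch = NULL;
    }
    for (int i = 0; i < d->count; i++) {
        if (strcmp(d->names[i], name) == 0) return SIM_OK;
    }
    return SIM_NONODE;
}

/* Returns 1 if `me` owns the lock, 0 if it is now watching a live
 * predecessor, and -1 if it gave up after max_attempts re-listings. */
static int try_lock(sim_lock_dir *d, const char *me, int max_attempts,
                    int *attempts_used) {
    assert(max_attempts > 0 && d->count <= MAX_NODES);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        const char *snap[MAX_NODES];
        const char *pred = NULL;
        int n = d->count;

        *attempts_used = attempt;
        /* Step 1: snapshot the children and sort them. */
        memcpy(snap, d->names, sizeof(snap[0]) * (size_t)n);
        qsort(snap, (size_t)n, sizeof(snap[0]), cmp_names);
        /* Step 2: the predecessor is the largest name below our own. */
        for (int i = 0; i < n; i++) {
            if (strcmp(snap[i], me) < 0) pred = snap[i];
        }
        if (pred == NULL) return 1; /* lowest node: we hold the lock */
        /* Step 3: watch the predecessor; it may have vanished since step 1. */
        if (sim_watch_exists(d, pred) == SIM_OK) return 0;
        /* SIM_NONODE: the snapshot is stale, so list the children again. */
    }
    return -1;
}
```

In this model, a predecessor that disappears between the listing and the watch only costs one more pass through the loop, while a caller that keeps hitting SIM_NONODE still gets a definite -1 instead of spinning indefinitely.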
