Mahadev, any insight on this?
On Fri, Jun 15, 2012 at 9:25 AM, Kevin Harms <[email protected]> wrote:
>
> I set up a single ZooKeeper instance using the binaries distributed with
> Ubuntu 12.04. I downloaded the 3.3.5 source and compiled the C-based locking
> recipe. I built this into a program of mine and ran into a problem. So I had
> some questions.
>
> If I wanted to create 1000 locks, do I set up the locks as follows?
> /lock/0
> /lock/1
> ...
> /lock/999
>
> Is this correct?
>
> I was running an example with two clients competing for 1 lock, running on
> the same machine the ZooKeeper instance was running on. I found that
> zkr_lock_lock() would often fail to acquire the lock, so I put that in a loop
> with 1000 retries. That seems to make it work most of the time, but other
> times there would still be a failure at zoo_lock.c:301
>
>       // cannot watch my predecessor i am giving up
>       // we need to be able to watch the predecessor
>       // since if we do not become a leader the others
>       // will keep waiting
> [301] if (ret != ZOK) {
>           free_String_vector(vector);
>
>
> I put in a printf to see what ret was, and it was ZNONODE. Now looking at the
> code above this spot, get_children is called and then it sorts the results
> and later calls zoo_wexists. It seems reasonable that the state could change
> between these two calls? I added a statement so that if the result was
> ZNONODE, it does a goto back to above where get_children is called, so it
> runs the algorithm again.
>
> That change seems to make the code work all the time now, but I'm not sure
> if that change is correct. I've included the diff below. So is it expected
> that zkr_lock_lock will fail periodically since it only tries to acquire the
> lock 4 times?
>
> thanks for any help,
> kevin
>
> --- zoo_lock.c.orig 2012-06-15 00:37:53.880508812 -0500
> +++ zoo_lock.c 2012-06-15 00:41:41.304518262 -0500
> @@ -273,6 +273,7 @@ static int zkr_lock_operation(zkr_lock_m
>          mutex->id = getName(retbuf);
>      }
>
> +tryagain:
>      if (mutex->id != NULL) {
>          ret = ZCONNECTIONLOSS;
>          ret = retry_getchildren(zh, path, vector, ts, retry);
> @@ -299,7 +300,9 @@ static int zkr_lock_operation(zkr_lock_m
>          // will keep waiting
>          if (ret != ZOK) {
>              free_String_vector(vector);
> +            if (ret == ZNONODE) goto tryagain;
>              LOG_WARN(("unable to watch my predecessor"));
> +            printf("zret = %d\n", ret);
>              ret = zkr_lock_unlock(mutex);
>              while (ret == 0) {
>                  //we have to give up our leadership
>
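
For what it's worth, the retry in the diff above looks consistent with the lock recipe as I remember it from the ZooKeeper recipes documentation: if the exists() call on the predecessor reports that the node is gone, the client goes back to the get-children step instead of giving up. Below is a minimal, self-contained sketch of that loop. It does not touch the ZooKeeper client library at all: lock_dir, mock_watch and try_lock are hypothetical stand-ins invented for illustration, and SK_OK / SK_NONODE only imitate ZOK / ZNONODE.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in return codes (assumption: modelled on ZOK / ZNONODE). */
enum { SK_OK = 0, SK_NONODE = -101 };

#define MAX_CHILDREN 8

/* Hypothetical in-memory model of the children under one lock node. */
typedef struct {
    int seq[MAX_CHILDREN]; /* sequence suffixes of the live children */
    int count;             /* number of live children */
    int vanish_on_watch;   /* child that disappears when first watched, or -1 */
} lock_dir;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static void remove_seq(lock_dir *d, int seq)
{
    for (int i = 0; i < d->count; i++) {
        if (d->seq[i] == seq) {
            d->seq[i] = d->seq[d->count - 1];
            d->count--;
            return;
        }
    }
}

/* Mock of "exists with a watch": fails with SK_NONODE if the node is gone. */
static int mock_watch(lock_dir *d, int seq)
{
    if (seq == d->vanish_on_watch) {
        /* Simulate the predecessor releasing between the list and the watch. */
        remove_seq(d, seq);
        d->vanish_on_watch = -1;
        return SK_NONODE;
    }
    for (int i = 0; i < d->count; i++)
        if (d->seq[i] == seq)
            return SK_OK;
    return SK_NONODE;
}

/* Returns 1 if my_seq now owns the lock, 0 if it is waiting on *watched.
 * *passes counts how many times the children were listed. */
static int try_lock(lock_dir *d, int my_seq, int *watched, int *passes)
{
    *passes = 0;
    for (;;) {
        int snapshot[MAX_CHILDREN];
        int n = d->count;
        assert(n <= MAX_CHILDREN);

        /* Step 1: list the children and sort them by sequence number. */
        memcpy(snapshot, d->seq, (size_t)n * sizeof(int));
        qsort(snapshot, (size_t)n, sizeof(int), cmp_int);
        ++*passes;

        /* Step 2: find the closest child with a lower sequence number. */
        int pred = -1;
        for (int i = 0; i < n; i++)
            if (snapshot[i] < my_seq)
                pred = snapshot[i];
        if (pred < 0) {
            *watched = -1;
            return 1; /* lowest sequence number: we hold the lock */
        }

        /* Step 3: watch the predecessor; if it vanished, list again. */
        if (mock_watch(d, pred) == SK_OK) {
            *watched = pred;
            return 0;
        }
    }
}
```

With children {3, 1, 2} and node 2 set to vanish, try_lock(&d, 3, ...) lists the children twice and ends up watching node 1 rather than failing. A real implementation would presumably also want to bound the number of passes and handle errors other than the missing-node case, which this sketch ignores.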
