Hi Will,

We have done something similar with a custom realtime distributed queue. It's basically a queue divided into channels, with even hashing on push and a single consumer thread per channel. We catch all disconnect exceptions and simply call worker.stop() on the worker that is actually reading data from the queue. The worker is a Runnable that is submitted to the thread pool, and each time the pool runs it, it first checks whether it should run.
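
A minimal sketch of the worker pattern described above, in case it helps. Only worker.stop() and the fact that the worker is a Runnable come from the description; the class name ChannelWorker, the start() method, the counter, and the in-memory BlockingQueue standing in for a channel are assumptions made for illustration, not the actual implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker: one instance per channel, resubmitted to a thread pool.
class ChannelWorker implements Runnable {
    // Stand-in for one channel of the distributed queue (assumption).
    private final BlockingQueue<String> channel;
    // Cleared by the disconnect handler, set again once the lock is re-established.
    private final AtomicBoolean enabled = new AtomicBoolean(true);
    private final AtomicInteger processed = new AtomicInteger();

    ChannelWorker(BlockingQueue<String> channel) {
        this.channel = channel;
    }

    // Called from the code that catches disconnect exceptions.
    void stop() {
        enabled.set(false);
    }

    // Called once the connection has normalized and the lock is held again.
    void start() {
        enabled.set(true);
    }

    int processedCount() {
        return processed.get();
    }

    @Override
    public void run() {
        // Check on every invocation: do not consume if we are not sure
        // we still hold the lock for this channel.
        if (!enabled.get()) {
            return;
        }
        String item = channel.poll(); // non-blocking; null if the channel is empty
        if (item != null) {
            processed.incrementAndGet(); // real work on the item would go here
        }
    }
}
```

Note that the flag is only checked at the top of run(), so an item already being processed when stop() is called will still complete. This narrows the window in which two consumers are active but does not close it.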

This occasionally results in workers that pause and then restart while our ZK connections normalize, but it prevents us from consuming when we aren't sure we hold the lock. As you said, our consumer has to check whether it should be running on every iteration of the loop, but I have not found any other way around this.

Todd

On Wed, 2011-07-20 at 17:03 -0400, Will Johnson wrote:
> The Lock recipe has a overview description of "Fully distributed locks that
> are globally synchronous, meaning at any snapshot in time no two clients
> think they hold the same lock." We've implemented this pattern but we've
> run into an issue handling zookeeper errors that seem to violate the
> semantics of 'no two clients think they have the lock.' for example:
>
> Thread1.Client1.lock();
> Thread2.Client2.lock();
>
> // client1 gets the lock so he starts some work
> Thread1.client1.doWork();
>
> // but now i get a session timeout
> // in the worst case it's because the doWork() method caused a full GC that
> took > sessionTimeout
> // my client then has to reconnect with a new session ID
> Thread1.client1.reconnect();
>
> But now my question is, how have people handled this case to notify
> Thread1.client1 that he is no longer holding the lock? Without a lot of
> pedantic calls to Thread1.client1.doIStillHaveTheLock() inside the doWork()
> method it seems like 2 clients both think they have the lock. Even if you
> make repeated calls to check the state of your lock you still have small
> windows of time where 2 clients are in the lock. i could interrupt Thread1
> when reconnecting but if you're using the lock for multithreaded
> synchronization that won't help.
>
> I realize the limitations of zookeeper in this case but i also hope someone
> else has solved this problem intelligently before.
>
> - will
