On 08/05/2010 06:31 PM, Jonathan Holloway wrote:
I'm looking at using Zookeeper for distributed sequence number generation.
What's the best way to do this currently? Is there a particular recipe
available for this?
My ideas so far involve:
a) Creating a node with PERSISTENT_SEQUENTIAL then deleting it - this gives
me the monotonically increasing number, but the sequence number isn't
b) Storing the sequence number in the data portion of a persistent node -
then updating this (using the version number - aka optimistic locking). The
problem with this is that under high load I'm assuming there'll be a lot of
contention and hence failures with regards to updates.
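
For illustration, option (b) amounts to a compare-and-set retry loop. The
sketch below is Python and uses a hypothetical in-memory VersionedNode class
as a stand-in for a znode; it is not the ZooKeeper client API. With a real
client the read would be a getData call and the conditional write a setData
call that passes the expected version.

```python
import threading


class BadVersionError(Exception):
    """Raised when a write carries a stale version (hypothetical stand-in)."""


class VersionedNode:
    """In-memory stand-in for a znode: data plus a version bumped on each write."""

    def __init__(self, data=0):
        self._lock = threading.Lock()
        self._data = data
        self._version = 0

    def get(self):
        # Return the current data together with the version it was read at.
        with self._lock:
            return self._data, self._version

    def set(self, data, expected_version):
        # Conditional write: succeeds only if nobody has written since the read.
        with self._lock:
            if expected_version != self._version:
                raise BadVersionError(expected_version)
            self._data = data
            self._version += 1


def next_sequence_number(node):
    """Read, increment, and conditionally write back; retry on a lost race."""
    while True:
        current, version = node.get()
        try:
            node.set(current + 1, version)
            return current + 1
        except BadVersionError:
            continue  # another client updated the node first; re-read and retry
```

Every lost race costs an extra read and write, which is where the contention
under high load comes from.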
What are your thoughts on the above?
I just ran into this exact situation, and handled it like so:
I wrote a library that uses option (b) as you described it above. Only
instead of requesting a single sequence number, you request a block of
them at a time from Zookeeper, and then locally use them up one by one
from the block you retrieved. Retrieving by block (e.g., 10000 IDs at a
time) means one Zookeeper update per block rather than one per ID, which
all but eliminates the contention issue.
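
A minimal sketch of the block idea, in Python. This is not the library
mentioned above: VersionedNode is a hypothetical in-memory stand-in for a
znode updated with a version check, and BlockIdAllocator and its method
names are made up for illustration.

```python
import threading


class BadVersionError(Exception):
    """Raised when a write carries a stale version (hypothetical stand-in)."""


class VersionedNode:
    """In-memory stand-in for a znode: data plus a version bumped on each write."""

    def __init__(self, data=0):
        self._lock = threading.Lock()
        self._data = data
        self._version = 0

    def get(self):
        with self._lock:
            return self._data, self._version

    def set(self, data, expected_version):
        with self._lock:
            if expected_version != self._version:
                raise BadVersionError(expected_version)
            self._data = data
            self._version += 1


class BlockIdAllocator:
    """Reserves IDs from the shared node a block at a time, then serves them locally."""

    def __init__(self, node, block_size=10000):
        self._node = node
        self._block_size = block_size
        self._next = 0  # next ID to hand out from the local block
        self._end = 0   # exclusive end of the local block (empty at start)
        self._lock = threading.Lock()

    def _reserve_block(self):
        # Optimistic update: advance the shared counter by one whole block.
        while True:
            start, version = self._node.get()
            try:
                self._node.set(start + self._block_size, version)
                return start, start + self._block_size
            except BadVersionError:
                continue  # lost the race; re-read and retry

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                # Local block exhausted: this is the only path that touches the node.
                self._next, self._end = self._reserve_block()
            value = self._next
            self._next += 1
            return value
```

With a block size of 10000, the shared node is written once per 10000 IDs
handed out, so clients rarely collide on the version check.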
Then, if you're finished assigning IDs from that block, but still have
a bunch of IDs left in the block, the library has another function to
"push back" the unused IDs. They'll then get pulled again in the next
We don't actually have this code running in production yet, so I can't
vouch for how well it works. But the design was reviewed and given the
thumbs up by the core developers on the team, and the implementation
passes all my unit tests.
HTH. Feel free to email back with specific questions if you'd like more