Re: Sequence Number Generation With Zookeeper

2010-08-11 Thread Adam Rosien
What happens during a network partition and different clients are
incrementing different counters, and then the partition goes away?
Won't (potentially) the same sequence value be given out to two
clients?

.. Adam

On Thu, Aug 5, 2010 at 5:38 PM, Jonathan Holloway
jonathan.hollo...@gmail.com wrote:
 Hi Ted,

 Thanks for the comments.

 I might have overlooked something here, but is it also possible to do the
 following:

 1. Create a PERSISTENT node
 2. Have multiple clients set the data on the node, e.g.  Stat stat =
 zookeeper.setData(SEQUENCE, ArrayUtils.EMPTY_BYTE_ARRAY, -1);
 3. Use the version number from stat.getVersion() as the sequence (obviously
 I'm limited to Integer.MAX_VALUE)

 Are there any weird race conditions involved here which would mean that a
 client would receive the wrong Stat object back?

 Many thanks again,
 Jon.

 On 5 August 2010 16:09, Ted Dunning ted.dunn...@gmail.com wrote:

 (b)

 BUT:

 Sequential numbering is a special case of now.  In large diameters, now
 gets very expensive.  This is a special case of that assertion.  If there
 is
 a way to get away from this presumption of the need for sequential
 numbering, you will be miles better off.

 HOWEVER:

 ZK can do better than you suggest.  Incrementing a counter does involve
 potential contention, but you will very likely be able to get to pretty
 high
 rates before the optimistic locking begins to fail.  If you code your
 update
 with a few tries at full speed followed by some form of retry back-off, you
 should get pretty close to the best possible performance.

 You might also try building a lock with an ephemeral file before updating
 the counter.  I would expect that this will be slower than the back-off
 option if only because involves more transactions in ZK.  IF you wanted to
 get too complicated for your own good, you could have a secondary strategy
 flag that is only sampled by all clients every few seconds and is updated
 whenever a client needs to back-off more than say 5 steps.  If this flag
 has
 been updated recently, then clients should switch to the locking protocol.
  You might even have several locks so that you don't exclude all other
 updaters, merely thin them out a bit.  This flagged strategy would run as
 fast as optimistic locking as long as optimistic locking is fast and then
 would limit the total number of transactions needed under very high load.

 On Thu, Aug 5, 2010 at 3:31 PM, Jonathan Holloway 
 jonathan.hollo...@gmail.com wrote:

  My so far involve:
  a) Creating a node with PERSISTENT_SEQUENTIAL then deleting it - this
 gives
  me the monotonically increasing number, but the sequence number isn't
  contiguous
  b) Storing the sequence number in the data portion of a persistent node -
  then updating this (using the version number - aka optimistic locking).
   The
  problem with this is that under high load I'm assuming there'll be a lot
 of
  contention and hence failures with regards to updates.
 




Re: Sequence Number Generation With Zookeeper

2010-08-11 Thread Adam Rosien
Ah thanks, I forgot the majority-commit property because I also
forgot that all servers know what the cluster should look like, rather
than act adaptively (which wouldn't make sense after all).

.. Adam

On Wed, Aug 11, 2010 at 3:23 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 Can't happen.

 In a network partition, the side without a quorum can't update the file
 version.

 On Wed, Aug 11, 2010 at 3:11 PM, Adam Rosien a...@rosien.net wrote:

 What happens during a network partition and different clients are
 incrementing different counters, and then the partition goes away?
 Won't (potentially) the same sequence value be given out to two
 clients?

 .. Adam

 On Thu, Aug 5, 2010 at 5:38 PM, Jonathan Holloway
 jonathan.hollo...@gmail.com wrote:
  Hi Ted,
 
  Thanks for the comments.
 
  I might have overlooked something here, but is it also possible to do the
  following:
 
  1. Create a PERSISTENT node
  2. Have multiple clients set the data on the node, e.g.  Stat stat =
  zookeeper.setData(SEQUENCE, ArrayUtils.EMPTY_BYTE_ARRAY, -1);
  3. Use the version number from stat.getVersion() as the sequence
 (obviously
  I'm limited to Integer.MAX_VALUE)
 
  Are there any weird race conditions involved here which would mean that a
  client would receive the wrong Stat object back?
 
  Many thanks again,
  Jon.
 
  On 5 August 2010 16:09, Ted Dunning ted.dunn...@gmail.com wrote:
 
  (b)
 
  BUT:
 
  Sequential numbering is a special case of now.  In large diameters,
 now
  gets very expensive.  This is a special case of that assertion.  If
 there
  is
  a way to get away from this presumption of the need for sequential
  numbering, you will be miles better off.
 
  HOWEVER:
 
  ZK can do better than you suggest.  Incrementing a counter does involve
  potential contention, but you will very likely be able to get to pretty
  high
  rates before the optimistic locking begins to fail.  If you code your
  update
  with a few tries at full speed followed by some form of retry back-off,
 you
  should get pretty close to the best possible performance.
 
  You might also try building a lock with an ephemeral file before
 updating
  the counter.  I would expect that this will be slower than the back-off
  option if only because involves more transactions in ZK.  IF you wanted
 to
  get too complicated for your own good, you could have a secondary
 strategy
  flag that is only sampled by all clients every few seconds and is
 updated
  whenever a client needs to back-off more than say 5 steps.  If this flag
  has
  been updated recently, then clients should switch to the locking
 protocol.
   You might even have several locks so that you don't exclude all other
  updaters, merely thin them out a bit.  This flagged strategy would run
 as
  fast as optimistic locking as long as optimistic locking is fast and
 then
  would limit the total number of transactions needed under very high
 load.
 
  On Thu, Aug 5, 2010 at 3:31 PM, Jonathan Holloway 
  jonathan.hollo...@gmail.com wrote:
 
   My so far involve:
   a) Creating a node with PERSISTENT_SEQUENTIAL then deleting it - this
  gives
   me the monotonically increasing number, but the sequence number isn't
   contiguous
   b) Storing the sequence number in the data portion of a persistent
 node -
   then updating this (using the version number - aka optimistic
 locking).
    The
   problem with this is that under high load I'm assuming there'll be a
 lot
  of
   contention and hence failures with regards to updates.
  
 
 




Re: Sequence Number Generation With Zookeeper

2010-08-10 Thread David Rosenstrauch
Good news!  I got approval to release this code!  (Man, I love working 
for a startup!!!)  :-)


So anyone know:  what's the next step?  Do I need to obtain commit 
privileges?  Or do I deliver the code to someone who has commit privs 
who shepherds this for me?


Also, what (if anything) do I need to tweak in the code to make it 
release-ready.  (e.g., Change package names?  Slap an Apache license on 
it?  etc.)


Thanks,

DR

On 08/06/2010 10:39 PM, David Rosenstrauch wrote:

I'll run it by my boss next week.

DR

On 08/06/2010 07:30 PM, Mahadev Konar wrote:

Hi David,
I think it would be really useful. It would be very helpful for someone
looking for geenrating unique tokens/generations ids ( I can think of
plenty
of applications for this).

Please do consider contributing it back to the community!

Thanks
mahadev


On 8/6/10 7:10 AM, David Rosenstrauchdar...@darose.net wrote:


Perhaps. I'd have to ask my boss for permission to release the code.

Is this something that would be interesting/useful to other people? If
so, I can ask about it.

DR

On 08/05/2010 11:02 PM, Jonathan Holloway wrote:

Hi David,

We did discuss potentially doing this as well. It would be nice to
get some
recipes for Zookeeper done for this area, if people think it's
useful. Were
you thinking of submitting this back as a recipe, if not then I could
potentially work on such a recipe instead.

Many thanks,
Jon.



I just ran into this exact situation, and handled it like so:

I wrote a library that uses the option (b) you described above. Only
instead of requesting a single sequence number, you request a block
of them
at a time from Zookeeper, and then locally use them up one by one
from the
block you retrieved. Retrieving by block (e.g., by blocks of 1
at a
time) eliminates the contention issue.

Then, if you're finished assigning ID's from that block, but still
have a
bunch of ID's left in the block, the library has another function
to push
back the unused ID's. They'll then get pulled again in the next block
retrieval.

We don't actually have this code running in production yet, so I can't
vouch for how well it works. But the design was reviewed and given the
thumbs up by the core developers on the team, and the
implementation passes
all my unit tests.

HTH. Feel free to email back with specific questions if you'd like
more
details.

DR




Re: Sequence Number Generation With Zookeeper

2010-08-10 Thread Patrick Hunt

Great!

Basic details are here (create a jira, attach a patch, click submit 
and someone will review and help you get it into a state which we can 
commit). Probably you'd put your code into src/recipes or src/contrib 
(recipes sounds reasonable).

http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

Patrick

On 08/10/2010 09:59 AM, David Rosenstrauch wrote:

Good news!  I got approval to release this code!  (Man, I love working
for a startup!!!) :-)

So anyone know: what's the next step? Do I need to obtain commit
privileges? Or do I deliver the code to someone who has commit privs who
shepherds this for me?

Also, what (if anything) do I need to tweak in the code to make it
release-ready. (e.g., Change package names? Slap an Apache license on
it? etc.)

Thanks,

DR

On 08/06/2010 10:39 PM, David Rosenstrauch wrote:

I'll run it by my boss next week.

DR

On 08/06/2010 07:30 PM, Mahadev Konar wrote:

Hi David,
I think it would be really useful. It would be very helpful for someone
looking for geenrating unique tokens/generations ids ( I can think of
plenty
of applications for this).

Please do consider contributing it back to the community!

Thanks
mahadev


On 8/6/10 7:10 AM, David Rosenstrauchdar...@darose.net wrote:


Perhaps. I'd have to ask my boss for permission to release the code.

Is this something that would be interesting/useful to other people? If
so, I can ask about it.

DR

On 08/05/2010 11:02 PM, Jonathan Holloway wrote:

Hi David,

We did discuss potentially doing this as well. It would be nice to
get some
recipes for Zookeeper done for this area, if people think it's
useful. Were
you thinking of submitting this back as a recipe, if not then I could
potentially work on such a recipe instead.

Many thanks,
Jon.



I just ran into this exact situation, and handled it like so:

I wrote a library that uses the option (b) you described above. Only
instead of requesting a single sequence number, you request a block
of them
at a time from Zookeeper, and then locally use them up one by one
from the
block you retrieved. Retrieving by block (e.g., by blocks of 1
at a
time) eliminates the contention issue.

Then, if you're finished assigning ID's from that block, but still
have a
bunch of ID's left in the block, the library has another function
to push
back the unused ID's. They'll then get pulled again in the next
block
retrieval.

We don't actually have this code running in production yet, so I
can't
vouch for how well it works. But the design was reviewed and given
the
thumbs up by the core developers on the team, and the
implementation passes
all my unit tests.

HTH. Feel free to email back with specific questions if you'd like
more
details.

DR




Re: Sequence Number Generation With Zookeeper

2010-08-10 Thread David Rosenstrauch
OK, will do, as soon as time permits.  It'll take me a little while to 
do the needed tweaks.  (Plus I'm under some pretty heavy deadline on 
some other work right now.)  Will email back once I've got this done.


DR

On 08/10/2010 01:10 PM, Patrick Hunt wrote:

Great!

Basic details are here (create a jira, attach a patch, click submit
and someone will review and help you get it into a state which we can
commit). Probably you'd put your code into src/recipes or src/contrib
(recipes sounds reasonable).
http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

Patrick

On 08/10/2010 09:59 AM, David Rosenstrauch wrote:

Good news! I got approval to release this code! (Man, I love working
for a startup!!!) :-)

So anyone know: what's the next step? Do I need to obtain commit
privileges? Or do I deliver the code to someone who has commit privs who
shepherds this for me?

Also, what (if anything) do I need to tweak in the code to make it
release-ready. (e.g., Change package names? Slap an Apache license on
it? etc.)

Thanks,

DR


Re: Sequence Number Generation With Zookeeper

2010-08-07 Thread Me
Hi all,

we have something implementing the optimistic concurrency approach to
sequence generation that we've been running in production for some time now.
We don't see a huge amount of contention over the sequence counters as the
nature of our app lends itself well to partitioned keys. Initially, we coded
up the simplest thing we thought could work and deployed it, figuring that
we'd have plenty of scope for improvement once we saw it running with real
load. However, to date its been ticking over so well we've not really had
cause to spend any further effort on it.

There's plenty of scope for improvement though, two of the things we had
thought we would need to do sooner rather than later are implement an
exponential backoff scheme (like Ted describes) when there is contention
over a given counter, and to add a more performant network interface than
HTTP. Like I say though, this just hasn't been a high enough priority for us
yet.

Anyway, we've been meaning to open source this for a while now, and prompted
by this thread, I just spent an afternoon tidying up a little and pushing to
github. Its at http://github.com/talisplatform/H1  and any feedback would be
gratefully received.

Cheers,
Sam

On 7 August 2010 03:40, Ted Dunning ted.dunn...@gmail.com wrote:

 Tell him that we will all look over your code so he gets immediate free
 consulting.

 On Fri, Aug 6, 2010 at 7:39 PM, David Rosenstrauch dar...@darose.net
 wrote:

  I'll run it by my boss next week.
 
  DR
 
 
  On 08/06/2010 07:30 PM, Mahadev Konar wrote:
 
  Hi David,
   I think it would be really useful. It would be very helpful for someone
  looking for geenrating unique tokens/generations ids ( I can think of
  plenty
  of applications for this).
 
  Please do consider contributing it back to the community!
 
  Thanks
  mahadev
 
 
  On 8/6/10 7:10 AM, David Rosenstrauchdar...@darose.net  wrote:
 
   Perhaps.  I'd have to ask my boss for permission to release the code.
 
  Is this something that would be interesting/useful to other people?  If
  so, I can ask about it.
 
  DR
 
  On 08/05/2010 11:02 PM, Jonathan Holloway wrote:
 
  Hi David,
 
  We did discuss potentially doing this as well.  It would be nice to
 get
  some
  recipes for Zookeeper done for this area, if people think it's useful.
   Were
  you thinking of submitting this back as a recipe, if not then I could
  potentially work on such a recipe instead.
 
  Many thanks,
  Jon.
 
 
   I just ran into this exact situation, and handled it like so:
 
  I wrote a library that uses the option (b) you described above.  Only
  instead of requesting a single sequence number, you request a block
 of
  them
  at a time from Zookeeper, and then locally use them up one by one
 from
  the
  block you retrieved.  Retrieving by block (e.g., by blocks of 1
 at
  a
  time) eliminates the contention issue.
 
  Then, if you're finished assigning ID's from that block, but still
 have
  a
  bunch of ID's left in the block, the library has another function to
  push
  back the unused ID's.  They'll then get pulled again in the next
 block
  retrieval.
 
  We don't actually have this code running in production yet, so I
 can't
  vouch for how well it works.  But the design was reviewed and given
 the
  thumbs up by the core developers on the team, and the implementation
  passes
  all my unit tests.
 
  HTH.  Feel free to email back with specific questions if you'd like
  more
  details.
 
  DR
 
 



Re: Sequence Number Generation With Zookeeper

2010-08-06 Thread David Rosenstrauch

Perhaps.  I'd have to ask my boss for permission to release the code.

Is this something that would be interesting/useful to other people?  If 
so, I can ask about it.


DR

On 08/05/2010 11:02 PM, Jonathan Holloway wrote:

Hi David,

We did discuss potentially doing this as well.  It would be nice to get some
recipes for Zookeeper done for this area, if people think it's useful.  Were
you thinking of submitting this back as a recipe, if not then I could
potentially work on such a recipe instead.

Many thanks,
Jon.



I just ran into this exact situation, and handled it like so:

I wrote a library that uses the option (b) you described above.  Only
instead of requesting a single sequence number, you request a block of them
at a time from Zookeeper, and then locally use them up one by one from the
block you retrieved.  Retrieving by block (e.g., by blocks of 1 at a
time) eliminates the contention issue.

Then, if you're finished assigning ID's from that block, but still have a
bunch of ID's left in the block, the library has another function to push
back the unused ID's.  They'll then get pulled again in the next block
retrieval.

We don't actually have this code running in production yet, so I can't
vouch for how well it works.  But the design was reviewed and given the
thumbs up by the core developers on the team, and the implementation passes
all my unit tests.

HTH.  Feel free to email back with specific questions if you'd like more
details.

DR






Re: Sequence Number Generation With Zookeeper

2010-08-06 Thread Mahadev Konar
Hi David,
 I think it would be really useful. It would be very helpful for someone
looking for geenrating unique tokens/generations ids ( I can think of plenty
of applications for this).

Please do consider contributing it back to the community!

Thanks
mahadev


On 8/6/10 7:10 AM, David Rosenstrauch dar...@darose.net wrote:

 Perhaps.  I'd have to ask my boss for permission to release the code.
 
 Is this something that would be interesting/useful to other people?  If
 so, I can ask about it.
 
 DR
 
 On 08/05/2010 11:02 PM, Jonathan Holloway wrote:
 Hi David,
 
 We did discuss potentially doing this as well.  It would be nice to get some
 recipes for Zookeeper done for this area, if people think it's useful.  Were
 you thinking of submitting this back as a recipe, if not then I could
 potentially work on such a recipe instead.
 
 Many thanks,
 Jon.
 
 
 I just ran into this exact situation, and handled it like so:
 
 I wrote a library that uses the option (b) you described above.  Only
 instead of requesting a single sequence number, you request a block of them
 at a time from Zookeeper, and then locally use them up one by one from the
 block you retrieved.  Retrieving by block (e.g., by blocks of 1 at a
 time) eliminates the contention issue.
 
 Then, if you're finished assigning ID's from that block, but still have a
 bunch of ID's left in the block, the library has another function to push
 back the unused ID's.  They'll then get pulled again in the next block
 retrieval.
 
 We don't actually have this code running in production yet, so I can't
 vouch for how well it works.  But the design was reviewed and given the
 thumbs up by the core developers on the team, and the implementation passes
 all my unit tests.
 
 HTH.  Feel free to email back with specific questions if you'd like more
 details.
 
 DR
 
 
 



Sequence Number Generation With Zookeeper

2010-08-05 Thread Jonathan Holloway
Hi all,

I'm looking at using Zookeeper for distributed sequence number generation.
 What's the best way to do this currently?  Is there a particular recipe
available for this?

My so far involve:
a) Creating a node with PERSISTENT_SEQUENTIAL then deleting it - this gives
me the monotonically increasing number, but the sequence number isn't
contiguous
b) Storing the sequence number in the data portion of a persistent node -
then updating this (using the version number - aka optimistic locking).  The
problem with this is that under high load I'm assuming there'll be a lot of
contention and hence failures with regards to updates.

What are your thoughts on the above?

Many thanks,
Jon.


Re: Sequence Number Generation With Zookeeper

2010-08-05 Thread Ted Dunning
(b)

BUT:

Sequential numbering is a special case of now.  In large diameters, now
gets very expensive.  This is a special case of that assertion.  If there is
a way to get away from this presumption of the need for sequential
numbering, you will be miles better off.

HOWEVER:

ZK can do better than you suggest.  Incrementing a counter does involve
potential contention, but you will very likely be able to get to pretty high
rates before the optimistic locking begins to fail.  If you code your update
with a few tries at full speed followed by some form of retry back-off, you
should get pretty close to the best possible performance.

You might also try building a lock with an ephemeral file before updating
the counter.  I would expect that this will be slower than the back-off
option if only because involves more transactions in ZK.  IF you wanted to
get too complicated for your own good, you could have a secondary strategy
flag that is only sampled by all clients every few seconds and is updated
whenever a client needs to back-off more than say 5 steps.  If this flag has
been updated recently, then clients should switch to the locking protocol.
 You might even have several locks so that you don't exclude all other
updaters, merely thin them out a bit.  This flagged strategy would run as
fast as optimistic locking as long as optimistic locking is fast and then
would limit the total number of transactions needed under very high load.

On Thu, Aug 5, 2010 at 3:31 PM, Jonathan Holloway 
jonathan.hollo...@gmail.com wrote:

 My so far involve:
 a) Creating a node with PERSISTENT_SEQUENTIAL then deleting it - this gives
 me the monotonically increasing number, but the sequence number isn't
 contiguous
 b) Storing the sequence number in the data portion of a persistent node -
 then updating this (using the version number - aka optimistic locking).
  The
 problem with this is that under high load I'm assuming there'll be a lot of
 contention and hence failures with regards to updates.



Re: Sequence Number Generation With Zookeeper

2010-08-05 Thread Jonathan Holloway
Hi Ted,

Thanks for the comments.

I might have overlooked something here, but is it also possible to do the
following:

1. Create a PERSISTENT node
2. Have multiple clients set the data on the node, e.g.  Stat stat =
zookeeper.setData(SEQUENCE, ArrayUtils.EMPTY_BYTE_ARRAY, -1);
3. Use the version number from stat.getVersion() as the sequence (obviously
I'm limited to Integer.MAX_VALUE)

Are there any weird race conditions involved here which would mean that a
client would receive the wrong Stat object back?

Many thanks again,
Jon.

On 5 August 2010 16:09, Ted Dunning ted.dunn...@gmail.com wrote:

 (b)

 BUT:

 Sequential numbering is a special case of now.  In large diameters, now
 gets very expensive.  This is a special case of that assertion.  If there
 is
 a way to get away from this presumption of the need for sequential
 numbering, you will be miles better off.

 HOWEVER:

 ZK can do better than you suggest.  Incrementing a counter does involve
 potential contention, but you will very likely be able to get to pretty
 high
 rates before the optimistic locking begins to fail.  If you code your
 update
 with a few tries at full speed followed by some form of retry back-off, you
 should get pretty close to the best possible performance.

 You might also try building a lock with an ephemeral file before updating
 the counter.  I would expect that this will be slower than the back-off
 option if only because involves more transactions in ZK.  IF you wanted to
 get too complicated for your own good, you could have a secondary strategy
 flag that is only sampled by all clients every few seconds and is updated
 whenever a client needs to back-off more than say 5 steps.  If this flag
 has
 been updated recently, then clients should switch to the locking protocol.
  You might even have several locks so that you don't exclude all other
 updaters, merely thin them out a bit.  This flagged strategy would run as
 fast as optimistic locking as long as optimistic locking is fast and then
 would limit the total number of transactions needed under very high load.

 On Thu, Aug 5, 2010 at 3:31 PM, Jonathan Holloway 
 jonathan.hollo...@gmail.com wrote:

  My so far involve:
  a) Creating a node with PERSISTENT_SEQUENTIAL then deleting it - this
 gives
  me the monotonically increasing number, but the sequence number isn't
  contiguous
  b) Storing the sequence number in the data portion of a persistent node -
  then updating this (using the version number - aka optimistic locking).
   The
  problem with this is that under high load I'm assuming there'll be a lot
 of
  contention and hence failures with regards to updates.
 



Re: Sequence Number Generation With Zookeeper

2010-08-05 Thread Ted Dunning
Sounds right to me.  Much simpler as well.

On Thu, Aug 5, 2010 at 5:38 PM, Jonathan Holloway 
jonathan.hollo...@gmail.com wrote:

 Hi Ted,

 Thanks for the comments.

 I might have overlooked something here, but is it also possible to do the
 following:

 1. Create a PERSISTENT node
 2. Have multiple clients set the data on the node, e.g.  Stat stat =
 zookeeper.setData(SEQUENCE, ArrayUtils.EMPTY_BYTE_ARRAY, -1);
 3. Use the version number from stat.getVersion() as the sequence (obviously
 I'm limited to Integer.MAX_VALUE)

 Are there any weird race conditions involved here which would mean that a
 client would receive the wrong Stat object back?

 Many thanks again,
 Jon.

 On 5 August 2010 16:09, Ted Dunning ted.dunn...@gmail.com wrote:

  (b)
 
  BUT:
 
  Sequential numbering is a special case of now.  In large diameters, now
  gets very expensive.  This is a special case of that assertion.  If there
  is
  a way to get away from this presumption of the need for sequential
  numbering, you will be miles better off.
 
  HOWEVER:
 
  ZK can do better than you suggest.  Incrementing a counter does involve
  potential contention, but you will very likely be able to get to pretty
  high
  rates before the optimistic locking begins to fail.  If you code your
  update
  with a few tries at full speed followed by some form of retry back-off,
 you
  should get pretty close to the best possible performance.
 
  You might also try building a lock with an ephemeral file before updating
  the counter.  I would expect that this will be slower than the back-off
  option if only because involves more transactions in ZK.  IF you wanted
 to
  get too complicated for your own good, you could have a secondary
 strategy
  flag that is only sampled by all clients every few seconds and is updated
  whenever a client needs to back-off more than say 5 steps.  If this flag
  has
  been updated recently, then clients should switch to the locking
 protocol.
   You might even have several locks so that you don't exclude all other
  updaters, merely thin them out a bit.  This flagged strategy would run as
  fast as optimistic locking as long as optimistic locking is fast and then
  would limit the total number of transactions needed under very high load.
 
  On Thu, Aug 5, 2010 at 3:31 PM, Jonathan Holloway 
  jonathan.hollo...@gmail.com wrote:
 
   My so far involve:
   a) Creating a node with PERSISTENT_SEQUENTIAL then deleting it - this
  gives
   me the monotonically increasing number, but the sequence number isn't
   contiguous
   b) Storing the sequence number in the data portion of a persistent node
 -
   then updating this (using the version number - aka optimistic locking).
The
   problem with this is that under high load I'm assuming there'll be a
 lot
  of
   contention and hence failures with regards to updates.
  
 



Re: Sequence Number Generation With Zookeeper

2010-08-05 Thread David Rosenstrauch

On 08/05/2010 06:31 PM, Jonathan Holloway wrote:

Hi all,

I'm looking at using Zookeeper for distributed sequence number generation.
  What's the best way to do this currently?  Is there a particular recipe
available for this?

My so far involve:
a) Creating a node with PERSISTENT_SEQUENTIAL then deleting it - this gives
me the monotonically increasing number, but the sequence number isn't
contiguous
b) Storing the sequence number in the data portion of a persistent node -
then updating this (using the version number - aka optimistic locking).  The
problem with this is that under high load I'm assuming there'll be a lot of
contention and hence failures with regards to updates.

What are your thoughts on the above?

Many thanks,
Jon.


I just ran into this exact situation, and handled it like so:

I wrote a library that uses the option (b) you described above.  Only 
instead of requesting a single sequence number, you request a block of 
them at a time from Zookeeper, and then locally use them up one by one 
from the block you retrieved.  Retrieving by block (e.g., by blocks of 
1 at a time) eliminates the contention issue.


Then, if you're finished assigning ID's from that block, but still have 
a bunch of ID's left in the block, the library has another function to 
push back the unused ID's.  They'll then get pulled again in the next 
block retrieval.


We don't actually have this code running in production yet, so I can't 
vouch for how well it works.  But the design was reviewed and given the 
thumbs up by the core developers on the team, and the implementation 
passes all my unit tests.


HTH.  Feel free to email back with specific questions if you'd like more 
details.


DR


Re: Sequence Number Generation With Zookeeper

2010-08-05 Thread Jonathan Holloway
Hi David,

We did discuss potentially doing this as well.  It would be nice to get some
recipes for Zookeeper done for this area, if people think it's useful.  Were
you thinking of submitting this back as a recipe, if not then I could
potentially work on such a recipe instead.

Many thanks,
Jon.


 I just ran into this exact situation, and handled it like so:

 I wrote a library that uses the option (b) you described above.  Only
 instead of requesting a single sequence number, you request a block of them
 at a time from Zookeeper, and then locally use them up one by one from the
 block you retrieved.  Retrieving by block (e.g., by blocks of 1 at a
 time) eliminates the contention issue.

 Then, if you're finished assigning ID's from that block, but still have a
 bunch of ID's left in the block, the library has another function to push
 back the unused ID's.  They'll then get pulled again in the next block
 retrieval.

 We don't actually have this code running in production yet, so I can't
 vouch for how well it works.  But the design was reviewed and given the
 thumbs up by the core developers on the team, and the implementation passes
 all my unit tests.

 HTH.  Feel free to email back with specific questions if you'd like more
 details.

 DR