Re: Unique Id Generation

2009-04-24 Thread Satish
Thank you, Ted, for your solution. I think I will implement this one too,
along with the one Mahadev suggested. The ids will be long-lived; they
are used to uniquely identify documents in our system.  In my simplistic
solution, I was creating an ephemeral sequential node, grabbing the
sequence id off it, and then immediately deleting it.
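
For reference, here is a minimal sketch of reading the counter out of a sequential node's name. It assumes the 10-digit zero-padded suffix that ZooKeeper appends to sequential nodes; the class name and the /ids/id- path are made up for illustration. The int return type is the limitation that prompted this thread.

```java
/** Hypothetical helper: extracts the counter from a sequential node name. */
final class SequenceSuffix {
    // Sequential node names end in a counter formatted as %010d.
    private static final int SUFFIX_LENGTH = 10;

    static int parse(String nodePath) {
        if (nodePath.length() < SUFFIX_LENGTH) {
            throw new IllegalArgumentException("no sequence suffix in: " + nodePath);
        }
        // Take the last ten characters and parse them as a decimal int.
        String digits = nodePath.substring(nodePath.length() - SUFFIX_LENGTH);
        return Integer.parseInt(digits);
    }
}
```

For example, SequenceSuffix.parse("/ids/id-0000000042") yields 42.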



On Apr 23, 2009, at 6:28 PM, Ted Dunning ted.dunn...@gmail.com wrote:

I don't think you meant ephemeral nodes because it isn't very likely that
you would have more than a billion sessions attached to a single zookeeper
cluster.  If you simply want to have a guaranteed unique value among all
live owners of these id's, then ephemeral sequential nodes are fine and
integers are also probably fine.

If you want longer-lived uniqueness, then you need something stronger.  If
your required rate of generating these values is relatively low, then you
can keep the current maximum (long) value of an id in a file on Zookeeper.

When you need to generate a new id, do this:

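   // Assumes zk is a connected org.apache.zookeeper.ZooKeeper handle and that
   // the znode at path `state` already holds an 8-byte long.
   public long nextId(String state) throws InterruptedException, KeeperException {
       Stat s = new Stat();
       boolean committed = false;
       long id = 0;
       while (!committed) {
           // read the current maximum id along with the znode version
           ByteBuffer buf = ByteBuffer.wrap(zk.getData(state, false, s));
           id = buf.getLong();
           id++;
           // rewind(), not reset(): no mark is set on a freshly wrapped
           // buffer, so reset() would throw InvalidMarkException
           buf.rewind();
           buf.putLong(id);
           try {
               // conditional write: succeeds only if the version is unchanged
               zk.setData(state, buf.array(), s.getVersion());
               committed = true;
           } catch (KeeperException.BadVersionException e) {
               // another client updated the znode first; re-read and retry
               committed = false;
           } catch (InterruptedException e) {
               // at this point, we don't know that our update happened.  Since
               // it is not a problem to redo this and because this situation
               // should be extremely rare, we will just pretend the update failed.
               committed = false;
           }
       }
       return id;
   }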

This gives you long id's that will not repeat.  It will be somewhat limited
in the number of id's you can create per second, especially if you have
thousands of nodes all asking for id's.  You could increase that rate by
randomly selecting one of *n* files each of which corresponds to a
different disjoint region in the id space.  Even so, with a good zookeeper
cluster, you should be able to generate 10,000 id's per second or more.

Another way to substantially increase the id generation rate would be to
allocate 1000 id's per call to zookeeper.  You give up on consecutive id's
with that approach, but you should be able to generate millions of
guaranteed unique id's per second and these id's should be pretty dense if
each process generates lots of id's.

On Thu, Apr 23, 2009 at 4:52 PM, Satish Bhatti cthd2...@gmail.com wrote:

We currently use a database sequence to generate unique ids for use by our
application.  I was thinking about using ZooKeeper instead so I can get rid
of the database.  My plan was to use the sequential id from ephemeral nodes,
but looking at the code it appears that this is an int, not a long.  Is
there any other straightforward way to generate ids using ZooKeeper?
Thanks,

Satish



Re: Unique Id Generation

2009-04-24 Thread Benjamin Reed
i'm not exactly clear how you use these ids, but one source of unique
ids that are longs is the zxid. if you create a znode, every time you
write to it, you will get a unique zxid in the mzxid member of the stat
structure. (you get the stat structure back in the response to the setData.)


ben

Mahadev Konar wrote:

Hi Satish,
Most of the sequences (versions of nodes) and the sequence flags are ints.
We do have plans to move them to long.
But in your case I imagine you can split a long into two 32-bit halves:

Parent (which is an int) - child (which is an int)
Once you run out of ephemeral children, you should:
create a node Parent + 1,
remove the old parent,
and then start creating ephemeral children under the new parent.

(So parent (32 bits) and child (32 bits) together form a long.)

I don't think this should be very hard to implement. There is nothing in
ZooKeeper (out of the box) currently that would help you out.

Mahadev
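
The parent/child split described above can be sketched with plain bit operations. This is only an illustration: the class and method names are hypothetical, and it covers just the packing of the two 32-bit sequence numbers, not the creation or rollover of the nodes.

```java
/** Hypothetical helper: packs a parent and a child sequence number into one long. */
final class SplitId {
    /** Parent goes in the high 32 bits, child in the low 32 bits. */
    static long combine(int parent, int child) {
        // Mask the child so a negative int does not sign-extend into the parent bits.
        return ((long) parent << 32) | (child & 0xFFFFFFFFL);
    }

    /** Recovers the parent sequence number from the high 32 bits. */
    static int parentOf(long id) {
        return (int) (id >>> 32);
    }

    /** Recovers the child sequence number from the low 32 bits. */
    static int childOf(long id) {
        return (int) id;
    }
}
```

With parent 1 and child 2, combine returns 4294967298 (0x0000000100000002).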
 



Re: Unique Id Generation

2009-04-24 Thread Satish Bhatti
Hello Ben,
Basically the ids are document Ids.  We will eventually have several billion
documents in our system, and each has a unique long id.  Currently we are
using a database sequence to generate these longs.  Having eliminated other
uses of the database, we didn't want to keep it around just to generate ids.
 That is why I am looking to use ZooKeeper to generate them instead.

Satish

On Fri, Apr 24, 2009 at 8:27 AM, Benjamin Reed br...@yahoo-inc.com wrote:

 i'm not exactly clear how you use these ids, but one source of unique ids
 that are longs is the zxid. if you create a znode, every time you write to
 it, you will get a unique zxid in the mzxid member of the stat structure.
 (you get the stat structure back in the response to the setData.)

 ben




Re: Unique Id Generation

2009-04-24 Thread Satish Bhatti
Hello Ted,
Your approach appears to be the fastest, so I think I will go with it.  By
the way, it should be buf.rewind() not buf.reset().

Satish




Re: Unique Id Generation

2009-04-24 Thread Satish Bhatti
A follow up to this:  I implemented method (b), and ran a test that
generated 100K of ids.  This generated 1.3G worth of transaction logs.
 Question:  when can these be safely deleted?  How does one know which ones
may be deleted?  Or do they need to exist forever?

On Fri, Apr 24, 2009 at 9:52 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 Of the methods proposed,

 a) recursive sequential files

 b) latest state file(s) that is updated using a pseudo transaction to give
 a range of numbers to allocate

 c) just probe zxid

 You should be pretty good with any of them.  With (a), you have to be
 careful to avoid race conditions when you get to the end of the range for
 the sub-level.  With (b), you get guaranteed results, although the
 highest throughput versions might have gaps (shouldn't bother you).  The
 code for this is more complex than the other implementations.  With (c),
 you could have potentially large gaps in the sequence, but with 64 bits
 that shouldn't be a big deal.  Code for that version would be the simplest
 of any of them.

 On Fri, Apr 24, 2009 at 8:56 AM, Satish Bhatti cthd2...@gmail.com wrote:

  Hello Ben,
  Basically the ids are document Ids.  We will eventually have several
  billion documents in our system, and each has a unique long id.  Currently
  we are using a database sequence to generate these longs.  Having
  eliminated other uses of the database, we didn't want to keep it around
  just to generate ids.  That is why I am looking to use ZooKeeper to
  generate them instead.
 
 



Re: Unique Id Generation

2009-04-24 Thread Ted Dunning
I would expect Ben's method to be slightly faster, but they should be
comparable.

And, of course you are correct about rewind.  Such are the perils of writing
code in the email program.

On Fri, Apr 24, 2009 at 10:01 AM, Satish Bhatti cthd2...@gmail.com wrote:

 ... Your approach appears to be the fastest, so I think I will go with it.
 By the way, it should be buf.rewind() not buf.reset().




Re: Unique Id Generation

2009-04-23 Thread Ted Dunning
I don't think you meant ephemeral nodes because it isn't very likely that
you would have more than a billion sessions attached to a single zookeeper
cluster.  If you simply want to have a guaranteed unique value among all
live owners of these id's, then ephemeral sequential nodes are fine and
integers are also probably fine.

If you want longer-lived uniqueness, then you need something stronger.  If
your required rate of generating these values is relatively low, then you
can keep the current maximum (long) value of an id in a file on Zookeeper.
When you need to generate a new id, do this:

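// Assumes zk is a connected org.apache.zookeeper.ZooKeeper handle and that
// the znode at path `state` already holds an 8-byte long.
public long nextId(String state) throws InterruptedException, KeeperException {
    Stat s = new Stat();
    boolean committed = false;
    long id = 0;
    while (!committed) {
        // read the current maximum id along with the znode version
        ByteBuffer buf = ByteBuffer.wrap(zk.getData(state, false, s));
        id = buf.getLong();
        id++;
        // rewind(), not reset(): no mark is set on a freshly wrapped
        // buffer, so reset() would throw InvalidMarkException
        buf.rewind();
        buf.putLong(id);
        try {
            // conditional write: succeeds only if the version is unchanged
            zk.setData(state, buf.array(), s.getVersion());
            committed = true;
        } catch (KeeperException.BadVersionException e) {
            // another client updated the znode first; re-read and retry
            committed = false;
        } catch (InterruptedException e) {
            // at this point, we don't know that our update happened.  Since
            // it is not a problem to redo this and because this situation
            // should be extremely rare, we will just pretend the update failed.
            committed = false;
        }
    }
    return id;
}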

This gives you long id's that will not repeat.  It will be somewhat limited
in the number of id's you can create per second, especially if you have
thousands of nodes all asking for id's.  You could increase that rate by
randomly selecting one of *n* files each of which corresponds to a
different disjoint region in the id space.  Even so, with a good zookeeper
cluster, you should be able to generate 10,000 id's per second or more.
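
A rough sketch of the *n*-files idea, under stated assumptions: each AtomicLong below is only an in-memory stand-in for one counter file (in ZooKeeper each would be a znode updated with the versioned setData loop above), and the id space is cut into n equal regions of size Long.MAX_VALUE / n. The class and method names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: n independent counters, each owning a disjoint id region. */
final class ShardedIdGenerator {
    private final AtomicLong[] shards; // in-memory stand-ins for the n counter files
    private final long regionSize;     // number of ids available to each shard

    ShardedIdGenerator(int n) {
        shards = new AtomicLong[n];
        for (int i = 0; i < n; i++) {
            shards[i] = new AtomicLong();
        }
        regionSize = Long.MAX_VALUE / n;
    }

    /** Picks a shard at random so that contention is spread over all n counters. */
    long nextId() {
        return nextIdFrom(ThreadLocalRandom.current().nextInt(shards.length));
    }

    /** Shard i hands out ids in (i * regionSize, (i + 1) * regionSize]. */
    long nextIdFrom(int shard) {
        long offset = shards[shard].incrementAndGet();
        if (offset > regionSize) {
            throw new IllegalStateException("region " + shard + " is exhausted");
        }
        return shard * regionSize + offset;
    }
}
```

Ids from different shards can never collide because the regions do not overlap, at the cost of ids no longer being globally increasing.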

Another way to substantially increase the id generation rate would be to
allocate 1000 id's per call to zookeeper.  You give up on consecutive id's
with that approach, but you should be able to generate millions of
guaranteed unique id's per second and these id's should be pretty dense if
each process generates lots of id's.
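
The block-allocation idea can be sketched as follows. To keep the example self-contained, VersionedCounter is a hypothetical in-memory stand-in for the znode (read() plays the role of getData with a Stat, and writeIfVersion() the role of setData with an expected version); it is not a ZooKeeper API. The block size of 1000 comes from the text; everything else is an assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory stand-in for a znode holding a long and a version. */
final class VersionedCounter {
    static final class Snapshot {
        final long value;
        final int version;
        Snapshot(long value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> state =
            new AtomicReference<>(new Snapshot(0L, 0));

    Snapshot read() {
        return state.get();
    }

    /** Succeeds only if the stored version still equals expectedVersion. */
    boolean writeIfVersion(long newValue, int expectedVersion) {
        Snapshot cur = state.get();
        return cur.version == expectedVersion
                && state.compareAndSet(cur, new Snapshot(newValue, cur.version + 1));
    }
}

/** Hands out ids from a locally reserved block; touches the shared counter once per block. */
final class BlockIdAllocator {
    private final VersionedCounter counter;
    private final long blockSize;
    private long next = 1;  // next id to hand out
    private long limit = 0; // last id of the current block; 0 forces a first reservation

    BlockIdAllocator(VersionedCounter counter, long blockSize) {
        this.counter = counter;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next > limit) {
            reserveBlock();
        }
        return next++;
    }

    private void reserveBlock() {
        while (true) {
            VersionedCounter.Snapshot s = counter.read();
            long newMax = s.value + blockSize;
            if (counter.writeIfVersion(newMax, s.version)) {
                next = s.value + 1;
                limit = newMax;
                return;
            }
            // another client reserved a block first; re-read and retry
        }
    }
}
```

Two allocators sharing one counter with a block size of 1000 hand out 1, 2, ... and 1001, 1002, ... respectively. Ids left over in a block when a process exits are never issued, which is where the gaps come from.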

On Thu, Apr 23, 2009 at 4:52 PM, Satish Bhatti cthd2...@gmail.com wrote:

 We currently use a database sequence to generate unique ids for use by our
 application.  I was thinking about using ZooKeeper instead so I can get rid
 of the database.  My plan was to use the sequential id from ephemeral
 nodes,
 but looking at the code it appears that this is an int, not a long.  Is
 there any other straightforward way to generate ids using ZooKeeper?
 Thanks,

 Satish



Re: Unique Id Generation

2009-04-23 Thread Satish
Thanks, Mahadev, that's a simple and elegant solution.  I feel pretty
dumb not thinking of it myself! :(  It should be very straightforward
to implement too.  We were using a database to store blobs and to
generate ids. I replaced the blob storage with Hadoop HDFS and the ids
with ZooKeeper.  Nice work, you Hadoop guys! :)



On Apr 23, 2009, at 5:26 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

Hi Satish,
Most of the sequences (versions of nodes) and the sequence flags are ints.
We do have plans to move them to long.
But in your case I imagine you can split a long into two 32-bit halves:

Parent (which is an int) - child (which is an int)
Once you run out of ephemeral children, you should:
create a node Parent + 1,
remove the old parent,
and then start creating ephemeral children under the new parent.

(So parent (32 bits) and child (32 bits) together form a long.)

I don't think this should be very hard to implement. There is nothing in
ZooKeeper (out of the box) currently that would help you out.

Mahadev
