Re: Unique Id Generation

2009-04-24 Thread Ted Dunning
I would expect Ben's method to be slightly faster, but they should be comparable. And, of course you are correct about rewind. Such are the perils of writing code in the email program. On Fri, Apr 24, 2009 at 10:01 AM, Satish Bhatti wrote: > ... Your approach appears to be the fastest, so I th

Re: Unique Id Generation

2009-04-24 Thread Mahadev Konar
Hi Satish, take a look at http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperAdmin.html#sc_maintenance This can be run as a cron job and will get rid of old unwanted logs and snapshots. mahadev On 4/24/09 10:18 AM, "Satish Bhatti" wrote: > A follow up to this: I implemented method (b

Re: Unique Id Generation

2009-04-24 Thread Satish Bhatti
A follow-up to this: I implemented method (b), and ran a test that generated 100K ids. This generated 1.3G worth of transaction logs. Question: when can these be safely deleted? How does one know which ones may be deleted? Or do they need to exist forever? On Fri, Apr 24, 2009 at 9:52 AM,

Re: Unique Id Generation

2009-04-24 Thread Satish Bhatti
Hello Ted, Your approach appears to be the fastest, so I think I will go with it. By the way, it should be buf.rewind() not buf.reset(). Satish On Thu, Apr 23, 2009 at 6:28 PM, Ted Dunning wrote: > I don't think you meant ephemeral nodes because it isn't very likely that > you would have more
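The rewind()/reset() distinction noted above can be illustrated with a small sketch. Ted's original code is not shown in this thread, so the class and method names below are hypothetical; the sketch only assumes the id is stored as an 8-byte long in a java.nio.ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

/** Hypothetical helper: converts a long id to and from the 8 bytes stored in a node. */
final class LongCodec {
    private LongCodec() {}

    /** Encodes the value as 8 big-endian bytes (ByteBuffer's default byte order). */
    static byte[] encode(long value) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(value);
        return buf.array();
    }

    /** Decodes 8 bytes back into a long. */
    static long decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.put(data);   // position is now at the end of the buffer
        buf.rewind();    // move the position back to 0 so getLong() reads from the start
        return buf.getLong();
    }

    /** Shows why reset() is the wrong call here: no mark was ever set. */
    static boolean resetWithoutMarkFails() {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(1L);
        try {
            buf.reset(); // reset() returns to a previously set mark, not to position 0
            return false;
        } catch (InvalidMarkException e) {
            return true;
        }
    }
}
```

reset() only returns to a position saved earlier with mark(), so on a buffer with no mark it throws rather than rewinding.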

Re: Unique Id Generation

2009-04-24 Thread Ted Dunning
Of the methods proposed, a) recursive sequential files b) latest state file(s) that is updated using a pseudo transaction to give a range of numbers to allocate c) just probe zxid You should be pretty good with any of them. With (a), you have to be careful to avoid race conditions when you get
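A rough sketch of the idea behind method (b), under stated assumptions: each client reserves a block of ids with a conditional, version-checked update and then hands them out locally. VersionedCounter is a hypothetical in-memory stand-in for the state node, not ZooKeeper code; with a real cluster, a setData call carrying the expected version would presumably play the role of writeIfVersion. The class names and block size are made up for illustration.

```java
/** Hypothetical in-memory stand-in for a node that stores the next unallocated id. */
final class VersionedCounter {
    private long value = 0L;
    private int version = 0;

    /** Returns {value, version} as one consistent snapshot. */
    synchronized long[] read() {
        return new long[] { value, version };
    }

    /** Writes only if nobody else has written since the snapshot was taken. */
    synchronized boolean writeIfVersion(long newValue, int expectedVersion) {
        if (version != expectedVersion) {
            return false;
        }
        value = newValue;
        version++;
        return true;
    }
}

/** Hands out ids locally from a reserved block, going back to the counter only when the block is used up. */
final class BlockIdAllocator {
    private final VersionedCounter counter;
    private final int blockSize;
    private long next = 0L;   // next id to hand out
    private long limit = 0L;  // exclusive end of the current block; next == limit means empty

    BlockIdAllocator(VersionedCounter counter, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.counter = counter;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next == limit) {
            reserveBlock();
        }
        return next++;
    }

    /** Retries the read-then-conditional-write until this client wins a block. */
    private void reserveBlock() {
        while (true) {
            long[] snapshot = counter.read();
            long start = snapshot[0];
            int seenVersion = (int) snapshot[1];
            if (counter.writeIfVersion(start + blockSize, seenVersion)) {
                next = start;
                limit = start + blockSize;
                return;
            }
        }
    }
}
```

With a block size of, say, 1000, the shared state is written once per 1000 ids instead of once per id. The cost is that ids left in a block are lost if a client exits before using them.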

Re: Unique Id Generation

2009-04-24 Thread Satish Bhatti
Hello Ben, Basically the ids are document Ids. We will eventually have several billion documents in our system, and each has a unique long id. Currently we are using a database sequence to generate these longs. Having eliminated other uses of the database, we didn't want to keep it around just t

Re: Unique Id Generation

2009-04-24 Thread Benjamin Reed
i'm not exactly clear how you use these ideas, but one source of unique ids that are longs is the zxid. if you create a znode, every time you write to it, you will get a unique zxid in the mzxid member of the stat structure. (you get the stat structure back in the response to the setData.) ben

Re: Unique Id Generation

2009-04-23 Thread Satish
Thank you Ted for your solution. I think I will implement this one too, along with the one Mahadev suggested. The ids will be long-lived; they are used to uniquely identify documents in our system. The way I was using the ephemeral ids in my simplistic solution was to create a node and then

Re: Unique Id Generation

2009-04-23 Thread Satish
Thanks Mahadev, that's a simple and elegant solution. I feel pretty dumb not thinking of it myself! :( It should be very straightforward to implement too. We were using a database to store blobs and to generate ids. I replaced the blob storage with Hadoop HDFS and the ids with ZooKeeper.

Re: Unique Id Generation

2009-04-23 Thread Ted Dunning
I don't think you meant ephemeral nodes because it isn't very likely that you would have more than a billion sessions attached to a single zookeeper cluster. If you simply want to have a guaranteed unique value among all live owners of these id's, then ephemeral sequential nodes are fine and integ

Re: Unique Id Generation

2009-04-23 Thread Mahadev Konar
Hi Satish, Most of the sequences (versions of nodes) and the sequence flags are ints. We do have plans to move them to long. But in your case I can imagine you could split a long into two 32-bit halves: parent (which is an int) -> child (which is an int). Now, after you run out of child ephemerals, then you should c
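The parent/child split described above comes down to packing two ints into one long. A minimal sketch follows, assuming the parent sequence number goes in the high 32 bits and the child sequence number in the low 32 bits; the class and method names are hypothetical and nothing here touches ZooKeeper itself.

```java
/** Hypothetical helper: packs a parent and a child 32-bit sequence number into one 64-bit id. */
final class CompositeId {
    private CompositeId() {}

    /** Parent goes in the high 32 bits, child in the low 32 bits. */
    static long combine(int parent, int child) {
        // Mask the child so a negative int does not sign-extend into the parent bits.
        return ((long) parent << 32) | (child & 0xFFFFFFFFL);
    }

    /** Recovers the parent sequence number from the high 32 bits. */
    static int parentOf(long id) {
        return (int) (id >>> 32);
    }

    /** Recovers the child sequence number from the low 32 bits. */
    static int childOf(long id) {
        return (int) id;
    }
}
```

As long as both counters stay non-negative, ids produced under a later parent sort after every id produced under an earlier one.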

Unique Id Generation

2009-04-23 Thread Satish Bhatti
We currently use a database sequence to generate unique ids for use by our application. I was thinking about using ZooKeeper instead so I can get rid of the database. My plan was to use the sequential id from ephemeral nodes, but looking at the code it appears that this is an int, not a long. Is