Re: Unique Id Generation
Thank you, Ted, for your solution. I think I will implement this one too, along with the one Mahadev suggested. The ids will be long-lived; they are used to uniquely identify documents in our system. In my simplistic solution, I was using the ephemeral ids by creating a node, grabbing the sequence id off it, and then immediately deleting the node.

On Apr 23, 2009, at 6:28 PM, Ted Dunning ted.dunn...@gmail.com wrote:

I don't think you meant ephemeral nodes, because it isn't very likely that you would have more than a billion sessions attached to a single ZooKeeper cluster. If you simply want a guaranteed unique value among all live owners of these ids, then ephemeral sequential nodes are fine, and integers are also probably fine. If you want longer-lived uniqueness, then you need something stronger.

If your required rate of generating these values is relatively low, then you can keep the current maximum (long) value of an id in a file on ZooKeeper. When you need to generate a new id, do this:

    import java.nio.ByteBuffer;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.data.Stat;

    // Assumes a field zk holding a connected org.apache.zookeeper.ZooKeeper.
    public long nextId(String state) throws InterruptedException, KeeperException {
        Stat s = new Stat();
        boolean committed = false;
        long id = 0;
        while (!committed) {
            ByteBuffer buf = ByteBuffer.wrap(zk.getData(state, false, s));
            id = buf.getLong();
            id++;
            // The original post called buf.reset() here; rewind() is the
            // correct call, as noted later in this thread.
            buf.rewind();
            buf.putLong(id);
            try {
                // Succeeds only if nobody else has written since our read.
                zk.setData(state, buf.array(), s.getVersion());
                committed = true;
            } catch (KeeperException.BadVersionException e) {
                // Another client updated the node first; re-read and retry.
                committed = false;
            } catch (InterruptedException e) {
                // at this point, we don't know that our update happened. Since it is
                // not a problem to redo this and because this situation should be extremely
                // rare, we will just pretend the update failed.
                committed = false;
            }
        }
        return id;
    }

This gives you long ids that will not repeat. It will be somewhat limited in the number of ids you can create per second, especially if you have thousands of nodes all asking for ids. You could increase that rate by randomly selecting one of *n* files, each of which corresponds to a different disjoint region in the id space. Even so, with a good ZooKeeper cluster, you should be able to generate 10,000 ids per second or more.

Another way to substantially increase the id generation rate would be to allocate 1000 ids per call to ZooKeeper. You give up on consecutive ids with that approach, but you should be able to generate millions of guaranteed unique ids per second, and these ids should be pretty dense if each process generates lots of ids.

On Thu, Apr 23, 2009 at 4:52 PM, Satish Bhatti cthd2...@gmail.com wrote:

We currently use a database sequence to generate unique ids for use by our application. I was thinking about using ZooKeeper instead so I can get rid of the database. My plan was to use the sequential id from ephemeral nodes, but looking at the code it appears that this is an int, not a long. Is there any other straightforward way to generate ids using ZooKeeper?

Thanks,
Satish
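
The suggestion in this message to allocate 1000 ids per call can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the thread: CounterStore, InMemoryCounterStore, and BlockIdAllocator are hypothetical names, and the in-memory store merely stands in for the counter file on ZooKeeper (a real store would advance the stored maximum with a versioned read-and-write retry loop like the one in nextId).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical abstraction over the shared counter (the "file on ZooKeeper"). */
interface CounterStore {
    /** Atomically adds blockSize to the stored maximum and returns the previous maximum. */
    long reserve(long blockSize);
}

/** In-memory stand-in for demonstration only; it is not backed by ZooKeeper. */
class InMemoryCounterStore implements CounterStore {
    private final AtomicLong max = new AtomicLong(0);

    @Override
    public long reserve(long blockSize) {
        return max.getAndAdd(blockSize);
    }
}

/** Hands out ids from a locally cached block; contacts the store once per block. */
class BlockIdAllocator {
    private final CounterStore store;
    private final long blockSize;
    private long next;   // next id to hand out
    private long limit;  // first id past the end of the current block

    BlockIdAllocator(CounterStore store, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.store = store;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next == limit) {                 // first call, or block exhausted
            long base = store.reserve(blockSize);
            next = base + 1;                 // ids start just above the old maximum
            limit = base + blockSize + 1;
        }
        return next++;
    }
}
```

Each process would hold its own BlockIdAllocator. Ids left unused in a block when a process exits are never reissued, which is the loss of consecutive ids that the message mentions.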
Re: Unique Id Generation
I'm not exactly clear how you use these ids, but one source of unique ids that are longs is the zxid. If you create a znode, every time you write to it you will get a unique zxid in the mzxid member of the Stat structure. (You get the Stat structure back in the response to setData.)

ben

Mahadev Konar wrote:

Hi Satish,

Most of the sequences (versions of nodes) and the sequence flags are ints. We do have plans to move them to long. But in your case, I imagine you can split a long into two 32-bit halves:

- Parent (which is an int)
- Child (which is an int)

When you run out of ephemeral children, create a node Parent + 1, remove the old parent, and then start creating ephemeral children under the new one. The parent (32 bits) and the child (32 bits) together form a long. I don't think this should be very hard to implement. There is nothing in ZooKeeper (out of the box) currently that would help you out.

Mahadev

On 4/23/09 4:52 PM, Satish Bhatti cthd2...@gmail.com wrote:

... Is there any other straightforward way to generate ids using ZooKeeper?
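
Mahadev's idea of forming a long from a 32-bit parent and a 32-bit child comes down to bit packing, which might look like the sketch below. ParentChildId and its methods are hypothetical names chosen for illustration; how the two sequence numbers are obtained from ZooKeeper is left out.

```java
/** Packs two 32-bit sequence numbers into a single 64-bit id. */
final class ParentChildId {
    private ParentChildId() {}

    /** Parent occupies the high 32 bits, child the low 32 bits. */
    static long pack(int parent, int child) {
        // Mask the child so a negative int does not sign-extend into the parent bits.
        return ((long) parent << 32) | (child & 0xFFFFFFFFL);
    }

    /** Recovers the parent sequence number from a packed id. */
    static int parent(long id) {
        return (int) (id >>> 32);
    }

    /** Recovers the child sequence number from a packed id. */
    static int child(long id) {
        return (int) id;
    }
}
```

With this layout, ids created under a later parent compare greater than ids created under an earlier one, provided the parent number stays non-negative.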
Re: Unique Id Generation
Hello Ben,

Basically, the ids are document ids. We will eventually have several billion documents in our system, and each has a unique long id. Currently we are using a database sequence to generate these longs. Having eliminated other uses of the database, we didn't want to keep it around just to generate ids. That is why I am looking to use ZooKeeper to generate them instead.

Satish

On Fri, Apr 24, 2009 at 8:27 AM, Benjamin Reed br...@yahoo-inc.com wrote:

... one source of unique ids that are longs is the zxid. ...
Re: Unique Id Generation
Hello Ted,

Your approach appears to be the fastest, so I think I will go with it. By the way, it should be buf.rewind() not buf.reset().

Satish

On Thu, Apr 23, 2009 at 6:28 PM, Ted Dunning ted.dunn...@gmail.com wrote:

... id = buf.getLong(); id++; buf.reset(); buf.putLong(id); ...
Re: Unique Id Generation
A follow-up to this: I implemented method (b) and ran a test that generated 100K ids. This generated 1.3G worth of transaction logs. Question: when can these be safely deleted? How does one know which ones may be deleted? Or do they need to exist forever?

On Fri, Apr 24, 2009 at 9:52 AM, Ted Dunning ted.dunn...@gmail.com wrote:

Of the methods proposed,

a) recursive sequential files
b) latest state file(s) that is updated using a pseudo transaction to give a range of numbers to allocate
c) just probe zxid

you should be pretty good with any of them.

With (a), you have to be careful to avoid race conditions when you get to the end of the range for the sub-level.

With (b), you get guaranteed results, although the highest-throughput versions might have gaps (which shouldn't bother you). The code for this is more complex than the other implementations.

With (c), you could have potentially large gaps in the sequence, but with 64 bits that shouldn't be a big deal. Code for that version would be the simplest of any of them.

On Fri, Apr 24, 2009 at 8:56 AM, Satish Bhatti cthd2...@gmail.com wrote:

... That is why I am looking to use ZooKeeper to generate them instead.
Re: Unique Id Generation
I would expect Ben's method to be slightly faster, but they should be comparable. And, of course you are correct about rewind. Such are the perils of writing code in the email program. On Fri, Apr 24, 2009 at 10:01 AM, Satish Bhatti cthd2...@gmail.com wrote: ... Your approach appears to be the fastest, so I think I will go with it. By the way, it should be buf.rewind() not buf.reset().
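
The difference between the two ByteBuffer calls discussed here can be shown with a short sketch. RewindVsReset and its method names are hypothetical; the java.nio behavior it relies on is standard: rewind() sets the position back to zero, while reset() returns to a previously set mark and throws InvalidMarkException when no mark exists.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

class RewindVsReset {
    /** Reads the long at the start of data, increments it, and writes it back in place. */
    static long increment(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long id = buf.getLong() + 1;  // reading advances the position to 8
        buf.rewind();                 // back to position 0 so putLong overwrites the old value
        buf.putLong(id);
        return id;
    }

    /** Returns true if reset() fails on a buffer on which mark() was never called. */
    static boolean resetThrowsWithoutMark() {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.getLong();
        try {
            buf.reset();
            return false;
        } catch (InvalidMarkException e) {
            return true;
        }
    }
}
```

Because wrap() shares the caller's array, the incremented value is visible in data after increment returns, which is what buf.array() relies on in the posted code.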
Re: Multiple ZooKeeper client instances
Hi Satish,

A ZooKeeper client usually has a very small memory and CPU footprint. The multithreaded version of the ZooKeeper client creates an internal thread to do the I/O and callbacks. I would suggest sharing the same ZooKeeper client across the objects so that your client process has fewer threads.

mahadev

On 4/24/09 2:37 PM, Satish Bhatti cthd2...@gmail.com wrote:

If my application has several objects that are using ZooKeeper for entirely unrelated reasons, is it recommended to create one ZooKeeper client instance and share it, or to create one per object? Do the ZooKeeper client instances have a lot of overhead? I am thinking that having one instance per object will lead to simpler code in terms of handling session expirations.

Satish