Re: counter with zookeeper

David Rosenstrauch Thu, 02 Dec 2010 07:25:28 -0800

We're using ZK to implement something similar. We have a need for a Hadoop job to assign new ID's a) without hitting a database, and b) ensuring that the ID's assigned are unique (i.e., that the numerous simultaneous tasks in the Hadoop job don't contend with each other and/or corrupt the "next ID value"). So we wrote a small library on top of ZK to do this, and it's working out quite nicely. See: http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201008.mbox/%[email protected]%3e for details.

I had been planning to release this as open source to the community (see: http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201008.mbox/%[email protected]%3e) - and still am. Just haven't quite gotten around to cleaning it up for release yet.
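
To make the general idea concrete, here is a minimal sketch of one way to hand out unique IDs to many simultaneous tasks without a lock. It is not the library mentioned above, and it does not talk to ZooKeeper at all: `VersionedCell` is a hypothetical in-memory stand-in for a znode that holds the "next ID value" together with a version number, and `reserve_block` and `run_tasks` are illustrative names. The technique shown is an optimistic, version-checked update (read the value and its version, then write back only if the version is unchanged, otherwise retry), with each task reserving a whole block of IDs per write so that it rarely has to touch the shared counter. Against a real ensemble, the conditional write would presumably map onto ZooKeeper's `setData(path, data, version)`, which rejects the update when the expected version no longer matches.

```python
import threading


class VersionedCell:
    """Hypothetical in-memory stand-in for a znode: a value plus a version."""

    def __init__(self, value=0):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def get(self):
        """Return the current (value, version) pair."""
        with self._lock:
            return self._value, self._version

    def set_if_version(self, new_value, expected_version):
        """Write only if nobody has written since expected_version was read."""
        with self._lock:
            if self._version != expected_version:
                return False  # lost the race; caller should re-read and retry
            self._value = new_value
            self._version += 1
            return True


def reserve_block(cell, block_size):
    """Reserve block_size consecutive IDs, retrying whenever a race is lost."""
    while True:
        next_id, version = cell.get()
        if cell.set_if_version(next_id + block_size, version):
            return range(next_id, next_id + block_size)


def run_tasks(num_tasks=8, blocks_per_task=50, block_size=100):
    """Simulate concurrent tasks, each reserving several blocks of IDs."""
    cell = VersionedCell()
    assigned = []
    assigned_lock = threading.Lock()

    def task():
        for _ in range(blocks_per_task):
            block = reserve_block(cell, block_size)
            with assigned_lock:
                assigned.extend(block)

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assigned


if __name__ == "__main__":
    ids = run_tasks()
    # Every ID should appear exactly once across all tasks.
    print(len(ids), len(set(ids)))
```

The block size is the tuning knob here: a larger block means fewer round trips to the shared counter per task, at the cost of leaving unused gaps in the ID space whenever a task finishes before using up its block.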


DR

On 12/02/2010 09:29 AM, Claudio Martella wrote:

Hi,

I'm trying to implement a String->Long dictionary, as I'm doing text
processing in M/R and would like to speed things up.
In order to implement the mapping, I need to access a high speed atomic
counter that allows me to pick the latest used Long, increment it and
use it for the latest-discovered new word to put in the dictionary.

At first I thought about using a regular sequential znode and using the
sequence number as the counter value, but I realize the sequence number
is an int, while I'd like a long. Is that correct? I'm referring to
Stat.getVersion() in the API.

In case this strategy is unfeasible, the second possibility is to use a
WriteLock on "/counter" to control access to the payload of the znode,
where I'd put the counter value, or access to a special row in
Cassandra, where I'd put the counter value. The Cassandra option is
probably the best possibility, as I'm storing my dictionary there
anyway, but I'd like to hear from you about latency and performance for
these options in ZK.


Thanks

Claudio
