We're using ZK to implement something similar. We need a Hadoop job
to assign new IDs a) without hitting a database, and b) while
ensuring that the IDs assigned are unique (i.e., that the numerous
simultaneous tasks in the Hadoop job don't contend with each other
and/or corrupt the "next ID value"). So we wrote a small library on top
of ZK to do this, and it's working out quite nicely. See:
http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201008.mbox/%[email protected]%3e
for details.
I had been planning to release this as open source to the community
(see:
http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201008.mbox/%[email protected]%3e)
- and still am. Just haven't quite gotten around to cleaning it up for
release yet.
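
For illustration only (this is not the library mentioned above), here is
a minimal sketch of one common way to hand out unique long IDs without a
lock: keep the "next ID value" in a single versioned record and reserve
IDs a block at a time with an optimistic compare-and-set, retrying when
another task got there first. With ZK the read would be
getData(path, watch, stat) and the write setData(path, data, version),
which fails with KeeperException.BadVersionException if the version has
changed. The names VersionedCounter, InMemoryCounter and BlockIdAllocator
are hypothetical, and the in-memory counter only stands in for a znode so
that the sketch runs on its own.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a znode holding a long: a value plus a version. */
interface VersionedCounter {
    /** Returns {value, version}, much as getData() fills in a Stat. */
    long[] read();

    /** Writes newValue only if the version still matches, like setData(path, data, version). */
    boolean writeIfVersion(long newValue, long expectedVersion);
}

/** In-memory implementation (an assumption) so the sketch runs without a ZK ensemble. */
class InMemoryCounter implements VersionedCounter {
    private final AtomicReference<long[]> state;

    InMemoryCounter(long initial) {
        state = new AtomicReference<>(new long[] {initial, 0L});
    }

    public long[] read() {
        return state.get().clone();
    }

    public boolean writeIfVersion(long newValue, long expectedVersion) {
        long[] current = state.get();
        if (current[1] != expectedVersion) {
            return false; // someone else wrote first
        }
        return state.compareAndSet(current, new long[] {newValue, expectedVersion + 1});
    }
}

/** Hands out unique longs, reserving them from the shared counter one block at a time. */
class BlockIdAllocator {
    private final VersionedCounter counter;
    private final long blockSize;
    private long next;  // next ID to hand out locally
    private long limit; // exclusive end of the currently reserved block

    BlockIdAllocator(VersionedCounter counter, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.counter = counter;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next == limit) {
            reserveBlock(); // local block is empty or exhausted
        }
        return next++;
    }

    private void reserveBlock() {
        while (true) {
            long[] seen = counter.read();
            long start = seen[0];
            // Claim [start, start + blockSize) only if nobody changed the counter meanwhile.
            if (counter.writeIfVersion(start + blockSize, seen[1])) {
                next = start;
                limit = start + blockSize;
                return;
            }
            // Lost the race to another task: re-read and try again.
        }
    }
}
```

Each task would create its own BlockIdAllocator over the shared counter
and call nextId() per new record. A larger block size means fewer round
trips to the shared counter, at the cost of gaps in the ID space when a
task exits without using up its block.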
DR
On 12/02/2010 09:29 AM, Claudio Martella wrote:
Hi,
I'm trying to implement a String->Long dictionary, as I'm doing text
processing in M/R and would like to speed things up.
In order to implement the mapping, I need to access a high speed atomic
counter that allows me to pick the latest used Long, increment it and
use it for the latest-discovered new word to put in the dictionary.
At first I thought about using a regular sequential znode and using the
sequence number as the counter value, but I realize the sequence number
is an int, while I'd like a long. Is that correct? I'm referring to
Stat.getVersion() in the API.
In case this strategy is infeasible, the second possibility is to use a
WriteLock on "/counter" to control access to the payload of the znode,
where I'd put the counter value, or to a special row in Cassandra,
where I'd put the counter value. The Cassandra option is
probably the best possibility, as I'm storing my dictionary there
anyway, but I'd like to hear from you about latency and performance for
these options in ZK.
Thanks
Claudio