We're thinking along the same lines. Specifically, I was thinking of using a hash ring to minimize disruptions to the key space when nodes come and go. Either that or micro-sharding would be nice, and I'm curious how this has gone for anyone else using ZooKeeper. I should mention that this is basically an alternative to distributed locks - both achieve the same thing: protecting against race conditions.
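For concreteness, a hash ring along these lines is what I mean - just the standard sorted-map construction, nothing ZooKeeper-specific yet, and the MD5 hash and virtual-node count are arbitrary placeholder choices:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

/** Minimal consistent-hash ring; each node owns several virtual points. */
public class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    public synchronized void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public synchronized void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** Route a key to the first node clockwise from its hash position. */
    public synchronized String nodeFor(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            // Fold the first 8 digest bytes into a long for the ring position.
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

With, say, 100 virtual points per node, removing one node only reassigns the keys that node owned; the rest stay put.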
Josh

On Thu, Jan 5, 2012 at 12:50 AM, Ted Dunning <[email protected]> wrote:
> Jordan, I don't think that leader election does what Josh wants.
>
> I don't think that consistent hashing is particularly good for that either
> because the loss of one node causes the sequential state for lots of
> entities to move even among nodes that did not fail.
>
> What I would recommend is a variant of micro-sharding. The key space is
> divided into many micro-shards. Then nodes that are alive claim the
> micro-shards using ephemerals and proceed as Josh described. On loss of a
> node, the shards that node was handling should be claimed by the remaining
> nodes. When a new node appears or new work appears, it is helpful to
> direct nodes to effect a hand-off of traffic.
>
> In my experience, the best way to implement shard balancing is with an
> external master instance much in the style of hbase or katta. This
> external master can be exceedingly simple and only needs to wake up on
> various events like loss of a node or change in the set of live shards. It
> can also wake up at intervals if desired to backstop the normal
> notifications or to allow small changes for certain kinds of balancing.
> Typically, this only requires a few hundred lines of code.
>
> This external master can, of course, be run on multiple nodes and which
> master is in current control can be adjudicated with yet another leader
> election.
>
> You can view this as a package of many leader elections. Or as discretized
> consistent hashing. The distinctions are a bit subtle but are very
> important. These include:
>
> - there is a clean division of control between the master which determines
> who serves what and the nodes that do the serving
>
> - there is no herd effect because the master drives the assignments
>
> - node loss causes the minimum amount of change of assignments since no
> assignments to surviving nodes are disturbed. This is a major win.
>
> - balancing is pretty good because there are many shards compared to the
> number of nodes.
>
> - the balancing strategy is highly pluggable.
>
> This pattern would make a nice addition to Curator, actually. It comes up
> repeatedly in different contexts.
>
> On Thu, Jan 5, 2012 at 12:11 AM, Jordan Zimmerman <[email protected]> wrote:
>
> > OK - so this is two options for doing the same thing. You use a Leader
> > Election algorithm to make sure that only one node in the cluster is
> > operating on a work unit. Curator has an implementation (it's really just
> > a distributed lock with a slightly different API).
> >
> > -JZ
> >
> > On 1/5/12 12:04 AM, "Josh Stone" <[email protected]> wrote:
> >
> > >Thanks for the response. Comments below:
> > >
> > >On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman
> > ><[email protected]> wrote:
> > >
> > >> Hi Josh,
> > >>
> > >> >Second use case: Distributed locking
> > >> This is one of the most common uses of ZooKeeper. There are many
> > >> implementations - one included with the ZK distro. Also, there is
> > >> Curator: https://github.com/Netflix/curator
> > >>
> > >> >First use case: Distributing work to a cluster of nodes
> > >> This sounds feasible. If you give more details I and others on this
> > >> list can help more.
> > >>
> > >
> > >Sure. I basically want to handle race conditions where two commands that
> > >operate on the same data are received by my cluster of nodes,
> > >concurrently. One approach is to lock on the data that is affected by
> > >the command (distributed lock). Another approach is to make sure that
> > >all of the commands that operate on any set of data are routed to the
> > >same node, where they can be processed serially using local
> > >synchronization. Consistent hashing is an algorithm that can be used to
> > >select a node to handle a message (where the inputs are the key to hash
> > >and the number of nodes in the cluster).
> > >
> > >There are various implementations for this floating around. I'm just
> > >interested to know how this is working for anyone else.
> > >
> > >Josh
> > >
> > >> -JZ
> > >>
> > >> ________________________________________
> > >> From: Josh Stone [[email protected]]
> > >> Sent: Wednesday, January 04, 2012 8:09 PM
> > >> To: [email protected]
> > >> Subject: Use cases for ZooKeeper
> > >>
> > >> I have a few use cases that I'm wondering if ZooKeeper would be
> > >> suitable for and would appreciate some feedback.
> > >>
> > >> First use case: Distributing work to a cluster of nodes using
> > >> consistent hashing to ensure that messages of some type are
> > >> consistently handled by the same node. I haven't been able to find
> > >> any info about ZooKeeper + consistent hashing. Is anyone using it for
> > >> this? A concern here would be how to redistribute work as nodes come
> > >> and go from the cluster.
> > >>
> > >> Second use case: Distributed locking. I noticed that there's a recipe
> > >> for this on the ZooKeeper wiki. Is anyone doing this? Any issues? One
> > >> concern would be how to handle orphaned locks if a node that obtained
> > >> a lock goes down.
> > >>
> > >> Third use case: Fault tolerance. If we utilized ZooKeeper to
> > >> distribute messages to workers, can it be made to handle a node going
> > >> down by re-distributing the work to another node (perhaps messages
> > >> that are not ack'ed within a timeout are resent)?
> > >>
> > >> Cheers,
> > >> Josh
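For reference, the ephemeral "claim a micro-shard" step Ted describes might look roughly like this with the plain ZooKeeper Java client. This is only a sketch: the /shard-owners parent path and the node id are invented names, the persistent parent znode is assumed to already exist, and the master-driven hand-off and rebalancing are left out entirely.

import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch: each live node tries to claim unowned micro-shards by creating
// ephemeral znodes. If the node's session dies, its claims vanish and the
// shards become claimable by the remaining nodes.
public class ShardClaimer {
    private final ZooKeeper zk;
    private final String nodeId;

    public ShardClaimer(ZooKeeper zk, String nodeId) {
        this.zk = zk;
        this.nodeId = nodeId;
    }

    // Returns true if this node now owns the shard, false if another node does.
    // Assumes the persistent parent path /shard-owners was created at startup.
    public boolean tryClaim(int shardId) throws KeeperException, InterruptedException {
        String path = "/shard-owners/" + shardId;
        try {
            zk.create(path, nodeId.getBytes(StandardCharsets.UTF_8),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;
        } catch (KeeperException.NodeExistsException e) {
            return false;
        }
    }
}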

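And for the distributed-lock route Jordan points at, Curator's InterProcessMutex recipe keeps things small. Again just a sketch, assuming an ensemble at localhost:2181 and current Apache Curator package names; the lock znode is ephemeral, so a holder that crashes loses its session and the lock is not orphaned.

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One lock path per piece of data being protected.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/entity-42");

        // Bounded wait so a contended lock doesn't hang the caller forever.
        if (lock.acquire(5, TimeUnit.SECONDS)) {
            try {
                // ... operate on the data guarded by this lock ...
            } finally {
                lock.release();
            }
        }

        client.close();
    }
}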