Sam, I think the approach Ted described should have a response time of under a few seconds, and it is probably the more reasonable one for scaling up.
Thanks
mahadev

On 12/16/10 10:17 PM, "Samuel Rash" <[email protected]> wrote:

> Can these approaches respond in under a few seconds? If a traffic
> source remains unclaimed for even a short while, we have a problem.
>
> Also, a host may "shed" traffic manually by releasing a subset of its
> paths. In this way, having all the other hosts watch only its location
> does guard against the herd when it dies, but how do they know when it
> releases 50/625 traffic buckets?
>
> I agree we might be able to make a more intelligent design that trades
> latency for watch efficiency, but the idea was that we'd use the
> simplest approach that gave us the lowest latency *if* the throughput
> of watches from zookeeper was sufficient (and it seems like it is from
> Mahadev's link).
>
> Thx,
> -sr
>
> On 12/16/10 9:58 PM, "Ted Dunning" <[email protected]> wrote:
>
>> This really sounds like it might be refactored a bit to decrease the
>> number of notifications and reads.
>>
>> In particular, it sounds like you have two problems.
>>
>> The first is that the 40 hosts need to claim various traffic sources,
>> one per traffic source, many sources per host. This is well solved by
>> the standard winner-takes-all file create idiom.
>>
>> The second problem is that other hosts need to know when traffic
>> sources need claiming.
>>
>> I think you might consider an approach to the second problem which has
>> each host posting a single ephemeral file containing a list of all of
>> the sources it has claimed. Whenever a host claims a new source, it
>> can update this file. When a host dies or exits, all the others will
>> wake due to having a watch on the directory containing these
>> ephemerals, will read the remaining host/source lists, and will
>> determine which sources are insufficiently covered. There will need to
>> be some care taken about race conditions here, but I think they all go
>> the right way.
>>
>> This means that a host dying will cause 40 notifications followed by
>> 1600 reads and at most 40 attempts at file creates. You might even be
>> able to avoid the 1600 reads by having each of the source directories
>> be watched by several of the 40 hosts. Then a host dying would cause
>> just a few notifications and a few file creates.
>>
>> A background process on each node could occasionally scan the source
>> lists for each host to make sure nothing drops through the cracks.
>>
>> This seems much more moderate than what you describe.
>>
>> On Thu, Dec 16, 2010 at 8:23 PM, Samuel Rash <[email protected]> wrote:
>>
>>> Yea--one host going down should trigger 24k watches. Each host then
>>> looks at its load and determines which paths to acquire (they
>>> represent traffic flow). This could result in, at worst, 24k create()
>>> attempts immediately after.
>>>
>>> I'll read the docs--thanks.
>>>
>>> -sr
>>>
>>> On 12/16/10 8:06 PM, "Mahadev Konar" <[email protected]> wrote:
>>>
>>>> Hi Sam,
>>>> Just a clarification, will a host going down fire 625 * 39 watches?
>>>> That is ~24,000 watches per host going down.
>>>>
>>>> You can take a look at
>>>> http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview for
>>>> more about watches, latencies, and hardware requirements. Please do
>>>> take a look, and if it doesn't answer your questions, we should add
>>>> more documentation.
>>>>
>>>> Thanks
>>>> Mahadev
>>>>
>>>> On 12/16/10 7:42 PM, "Samuel Rash" <[email protected]> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I am looking to run about 40 zookeeper clients with the following
>>>> watch properties:
>>>>
>>>> 1. Up to 25,000 paths that every host has a watch on (each path has
>>>> one child, and the watch is for that child, an ephemeral node, being
>>>> removed)
>>>> 2. An individual host "owns" 625 of these paths in this example; one
>>>> host going down will fire 625 watches to each of the other 39 hosts
>>>>
>>>> Is there any limit on the rate at which these watches can be sent
>>>> off? What's the right size cluster? (3? 5?) Does it need to be
>>>> dedicated hw?
>>>>
>>>> Thanks,
>>>> Sam
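For illustration only, here is a minimal sketch (in Java, against the
standard ZooKeeper client API) of the winner-takes-all create idiom Ted
mentions. The /claims/<bucket> layout, the hostId payload, and the
BucketClaimer class name are assumptions made up for the example, not
anything specified in the thread.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical names: BucketClaimer and /claims/<bucket>. The hostId bytes
// simply record which host won, which is handy for debugging.
public class BucketClaimer {
    private final ZooKeeper zk;
    private final String hostId;

    public BucketClaimer(ZooKeeper zk, String hostId) {
        this.zk = zk;
        this.hostId = hostId;
    }

    // Try to claim one traffic bucket. The first host whose create()
    // succeeds owns it; everyone else gets NodeExistsException. Because the
    // claim node is ephemeral, it vanishes automatically if this host's
    // session dies, and the bucket becomes claimable again.
    public boolean tryClaim(String bucket)
            throws KeeperException, InterruptedException {
        try {
            zk.create("/claims/" + bucket, hostId.getBytes(),
                      Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;    // we won the race for this bucket
        } catch (KeeperException.NodeExistsException e) {
            return false;   // someone else already owns it
        }
    }
}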

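Along the same lines, a rough sketch of the consolidated scheme Ted
outlines: each host publishes one ephemeral znode under a parent such as
/hosts whose data is the list of buckets it has claimed, and every host
keeps a single children watch on that parent instead of ~25,000
per-bucket watches. When a host dies or sheds buckets by rewriting its
list, the survivors re-read the remaining lists, work out what is
uncovered, and race to claim it with the create idiom above. The /hosts
path, the comma-separated encoding, and the rebalancing policy are all
assumptions for the sketch; real code would also handle the race
conditions and per-host load limits discussed in the thread.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical names: HostListWatcher and /hosts. Single-threaded sketch;
// real code would synchronize access to myBuckets and handle session expiry.
public class HostListWatcher implements Watcher {
    private final ZooKeeper zk;
    private final String hostId;
    private final Set<String> allBuckets;              // full set of traffic buckets
    private final Set<String> myBuckets = new HashSet<>();
    private final BucketClaimer claimer;               // from the sketch above

    public HostListWatcher(ZooKeeper zk, String hostId, Set<String> allBuckets) {
        this.zk = zk;
        this.hostId = hostId;
        this.allBuckets = allBuckets;
        this.claimer = new BucketClaimer(zk, hostId);
    }

    // Publish (or republish) this host's claimed-bucket list as one ephemeral node.
    public void publishList() throws KeeperException, InterruptedException {
        byte[] data = String.join(",", myBuckets).getBytes(StandardCharsets.UTF_8);
        String path = "/hosts/" + hostId;
        if (zk.exists(path, false) == null) {
            zk.create(path, data, Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        } else {
            zk.setData(path, data, -1);                // -1 = any version
        }
    }

    // One children watch on /hosts replaces the per-bucket watches.
    public void watchHosts() throws KeeperException, InterruptedException {
        rebalance(zk.getChildren("/hosts", this));
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
            try {
                watchHosts();                          // re-arm the watch, recompute coverage
            } catch (Exception e) {
                // real code would retry and handle connection loss
            }
        }
    }

    // Read every surviving host's list and try to claim whatever is uncovered.
    private void rebalance(List<String> hosts)
            throws KeeperException, InterruptedException {
        Set<String> covered = new HashSet<>();
        for (String host : hosts) {
            byte[] data = zk.getData("/hosts/" + host, false, null);
            covered.addAll(Arrays.asList(
                new String(data, StandardCharsets.UTF_8).split(",")));
        }
        List<String> uncovered = new ArrayList<>(allBuckets);
        uncovered.removeAll(covered);
        boolean claimedAny = false;
        for (String bucket : uncovered) {              // a real host would cap this by load
            if (claimer.tryClaim(bucket)) {
                myBuckets.add(bucket);
                claimedAny = true;
            }
        }
        if (claimedAny) {
            publishList();                             // tell the others what we now own
        }
    }
}

With this layout a host death costs each survivor one children
notification, one small read per surviving list, and some create
attempts, which lines up with the 40 notifications and ~1600 reads Ted
estimates in the thread.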