FW: performance of watches

Samuel Rash Fri, 17 Dec 2010 08:13:00 -0800

Inline comments

>---------- Forwarded message ----------
>From: Ted Dunning <[email protected]>
>Date: Fri, Dec 17, 2010 at 12:36 AM
>Subject: Re: performance of watches
>To: [email protected]
>
>
>The limitation will be how quickly the crash of a processor can be
>detected.  Typically this takes at least a few seconds and, depending on
>how bad false positives are for you and how much your program does things
>like garbage collection, it might take a few tens of seconds.  Normal
>program exit should be detected almost instantly, as should load shedding.

Great to hear--that's what we were hoping for.
>
>On Thu, Dec 16, 2010 at 10:17 PM, Samuel Rash <[email protected]> wrote:
>
>> Can these approaches respond in under a few seconds? If a traffic source
>> remains unclaimed for even a short while, we have a problem.
>>
>
>It can be made so.  This isn't a Zookeeper problem so much as a problem in
>detecting failures in general in the presence of poor real-time guarantees
>due to all manner of issues.

Understood.  We just want a host that dies, and hence becomes unreachable,
to lose its claims on any buckets within a few seconds.  If this case can
be handled, that will be great.

>
>
>> Also, a host may "shed" traffic manually by releasing a subset of its
>> paths.  Having all the other hosts watch only its location does protect
>> against the herd when it dies, but how do they know when it releases
>> 50/625 traffic buckets?
>>
>
>In the design I proposed, the load shedding would be indicated by the
>disappearance of the "I am handling this source" file.  That would
>immediately notify the hosts waiting to handle that source, who would all
>try to claim the source.

Sorry, I'm still missing something.  I thought each host created a file
that listed the buckets it was handling.  In this way, each host watches
the other hosts' bucket files, and if a host dies, a single notification
is sent to each of the others: 39 notifications rather than 625 * 39.
However, what I don't understand is how a host that gets that notification
figures out which buckets are free for acquisition.  The full 25,000-bucket
set is not known by all clients and can grow and shrink dynamically.  If
the watch for a host's sources file fires, each other host still needs to
read 25,000 paths to figure out which were freed up.
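
To make the numbers above concrete, here is the arithmetic as a small
sketch.  It assumes 40 hosts with 625 buckets each (which matches the 39
"other" hosts and the 25,000 total); the function name is only for
illustration.

```python
def notification_counts(hosts, buckets_per_host):
    """Notifications sent when one host dies, under two watch layouts.

    Returns (per_host_file, per_bucket_path):
      per_host_file   - every other host watches one file per host
      per_bucket_path - every other host watches every bucket path
    """
    others = hosts - 1  # hosts that survive and get notified
    per_host_file = others
    per_bucket_path = buckets_per_host * others
    return per_host_file, per_bucket_path


# Assumed cluster shape: 40 hosts * 625 buckets = 25,000 buckets in total.
per_host_file, per_bucket_path = notification_counts(40, 625)
print(per_host_file, per_bucket_path)
```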

>
>Response time in this scenario would be a few milliseconds.
>
>It should work, but you have a quadratic scaling factor in your design.
>That is almost always bad before too long.  That quadratic term can be
>avoided pretty easily.

I'd love to avoid this, but the basic idea is that the event of concern is
a change in a host-to-bucket assignment.  Every other host needs to be
notified either of each such event or when a 'batch' has occurred.  If it's
batched, there has to be a way to get the list of actual assignments that
went away.  The question I have is whether this information comes from the
path that a fired watch reports, or from subsequent reads.  Is there some
salient (or blunt) point that I'm just not understanding?
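
For the batched case, one way I could imagine recovering "what went away"
is sketched below.  This is only an assumption on my part, not something
proposed in this thread: each host keeps the last list it read for every
other host and, when a watch fires, re-reads that one list and diffs it
against the cached copy.  The function and variable names are hypothetical.

```python
def freed_buckets(cached, current):
    """Buckets that were in the cached snapshot but are gone from the new one."""
    return set(cached) - set(current)


# Hypothetical example: a host held buckets 1-5, then shed buckets 2 and 4.
before = {1, 2, 3, 4, 5}
after = {1, 3, 5}
released = freed_buckets(before, after)
print(sorted(released))

# If the host died and its list vanished entirely, everything it held is free.
released_on_death = freed_buckets(before, set())
```

With this layout the cost of a notification is one read of the changed
host's list rather than a scan of every bucket path, at the price of each
host caching the other hosts' lists.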

Thanks so much for the help.

-sr
