Hi Mike
I also faced the same issue. There is a test patch in ZOOKEEPER-2570 which can be
used to quickly check the performance gains of each modification. Hope it is


On Thu, Oct 13, 2016 at 1:27 AM, Mike Solomon <ms...@dropbox.com> wrote:

> I've been performance testing 3.5.2 and hit an interesting unavailability
> issue.
> When the server is very busy (64k connections, 16k writes per
> second) the leader can get busy enough that connections get throttled.
> Enough throttling causes sessions to expire. As sessions expire, the
> CPU consumption rises and the quorum is effectively unavailable.
> Interestingly, if you shut down all the clients, the quorum won't heal
> for nearly 10 minutes.
> The issue is that the outstandingChanges queue has 250k items in it
> and the closeSession code scans this linearly under a lock. Replacing
> the linear scan with a hash table lookup improves this, but likely the
> real solution is some backpressure on clients as a result of an
> oversized outstandingChanges queue.
> Here is a sample fix:
> https://github.com/msolo/zookeeper/commit/75da352d506c2e3b0001d28acc058c422b3c8f0c
> This results in the quorum healing about 30 seconds after the clients
> disconnect.
> Is there a way to prevent runaway growth in this queue? I'm wondering
> if changing the definition of "throttling" to take into account the
> size of this queue might help mitigate this. The end goal is that some
> stable amount of traffic is reached asymptotically without suffering a
> collapse.
> Thanks,
> -Mike
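
For anyone following the thread, here is a minimal sketch of the idea Mike describes: keep a hash index alongside the queue, so that closing a session looks up its pending paths directly instead of scanning every queued item under the lock. This is not the code from the linked commit and not ZooKeeper's actual classes; `PendingChanges`, `Change`, and all method names are hypothetical and only illustrate the technique.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a queue of pending changes (not a ZooKeeper class). */
class PendingChanges {
    /** One queued change: a znode path and the session that owns it. */
    static final class Change {
        final String path;
        final long ownerSessionId;

        Change(String path, long ownerSessionId) {
            this.path = path;
            this.ownerSessionId = ownerSessionId;
        }
    }

    private final Deque<Change> queue = new ArrayDeque<>();
    // Secondary index: session id -> paths with a queued change for that session.
    private final Map<Long, Set<String>> pathsBySession = new HashMap<>();

    /** Appends a change and records it in the per-session index. */
    synchronized void add(Change c) {
        queue.addLast(c);
        pathsBySession
            .computeIfAbsent(c.ownerSessionId, id -> new HashSet<>())
            .add(c.path);
    }

    /** Baseline: walks the whole queue, so cost grows with the queue length. */
    synchronized Set<String> pathsForSessionByScan(long sessionId) {
        Set<String> out = new HashSet<>();
        for (Change c : queue) {
            if (c.ownerSessionId == sessionId) {
                out.add(c.path);
            }
        }
        return out;
    }

    /** Indexed: one hash lookup, cost depends only on this session's entries. */
    synchronized Set<String> pathsForSessionByIndex(long sessionId) {
        Set<String> paths = pathsBySession.get(sessionId);
        return paths == null ? new HashSet<>() : new HashSet<>(paths);
    }

    /** Current queue length; a throttling policy could consult this value. */
    synchronized int size() {
        return queue.size();
    }
}
```

The sketch leaves out removal from the queue. A real version would have to update the index whenever an item is dequeued, otherwise the two views drift apart.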
