Re: Understanding Load on Zookeeper Box

Patrick Hunt Thu, 24 May 2012 17:31:50 -0700

On Thu, May 24, 2012 at 3:42 PM, Matthew Ward <[email protected]> wrote:
> I have a couple theories and questions I was hoping to clear up (all java 
> based 3.3.4):
> 1) I have been trying to troubleshoot the reason for high system wait time on 
> one of our zookeeper instances. The theory I have is that setting watches 
> increases the system wait load. Does this theory sound accurate?


The two most common causes of high latency are GC/swapping and high
disk utilization on the transaction log (WAL). Check for that first.

Have you seen this page?
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting

Given that you mention AWS in q2, that might also be related. Remember
that you're not accessing the disk(s) directly, so disk issues are even
more likely. The main issue is that we need to fsync the txnlog before
responding to the proposal. (I often use strace on the fsync/fdatasync
calls to track/graph this.)
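
As a rough illustration of the strace approach, here is a minimal sketch
that pulls fsync/fdatasync durations out of a saved trace. It assumes the
trace was captured with strace's -T option (which appends the time spent in
each call as <seconds>); the function names and the choice of summary
statistics are my own and not part of any ZooKeeper tooling.

```python
import re

# Matches completed calls such as
#   "12:00:00.000001 fdatasync(42) = 0 <0.013000>"
# and the "resumed" half of a call that strace split across two lines:
#   "[pid 77] <... fsync resumed> ) = 0 <0.250000>"
# The trailing <seconds> field is only present when strace runs with -T.
SYNC_CALL = re.compile(
    r"(?:fsync|fdatasync)(?:\(\d+\)| resumed>\s*\))\s+=\s+-?\d+.*?<(\d+\.\d+)>\s*$"
)


def sync_latencies(lines):
    """Return the duration in seconds of every fsync/fdatasync call found."""
    durations = []
    for line in lines:
        match = SYNC_CALL.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def summarize(durations):
    """Return count, mean, approximate p99 and max, or None if there is no data."""
    if not durations:
        return None
    ordered = sorted(durations)
    # Nearest-rank style index, clamped to the last element.
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "count": len(ordered),
        "mean": sum(ordered) / len(ordered),
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }
```

A trace for this could be collected with something like
"strace -f -T -tt -e trace=fsync,fdatasync -p <pid> -o sync.trace", then fed
in with summarize(sync_latencies(open("sync.trace"))). Long or spiky sync
times would point at the txnlog device rather than at watches.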

> 2) Question 2 is a follow up to the first... whenever I do a watch and wait 
> for the event, I have an 'insurance policy' (since AWS is fun...) of setting 
> a mutex with a timeout, before retrying the operation and potentially setting 
> another watch. How does zookeeper handle duplicate watches? Am I exacerbating 
> the system wait load issue by setting duplicate watches? Is there a way I 
> should cancel the watch?

A particular session can establish only a single watch on a particular
path, so setting the same watch again does not create a duplicate.
Setting it multiple times has no negative effect (other than a
round-trip read to the server, of course).
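
The "insurance policy" pattern from the question can be sketched roughly as
follows. This is only an illustration under assumptions: register_watch is a
hypothetical caller-supplied function that performs the read which sets the
watch and arranges for the given callback to run when the watch fires. It is
not a real ZooKeeper client API.

```python
import threading


def wait_for_event(register_watch, timeout_s, max_attempts):
    """Set a watch, wait up to timeout_s for it to fire, and retry on timeout.

    register_watch(callback) is a hypothetical hook supplied by the caller.
    Returns the attempt number on which the event arrived, or None if every
    attempt timed out.
    """
    fired = threading.Event()
    for attempt in range(1, max_attempts + 1):
        # Each attempt costs one round-trip read to the server.
        register_watch(fired.set)
        if fired.wait(timeout_s):
            return attempt
    return None
```

Since each retry costs a round trip, a generous timeout keeps the extra read
traffic small.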

Patrick
