Well, that's good: a 300ms max latency means the server is round-tripping
requests pretty quickly. That would lead me to look at the client VMs or
at (intermittent) network problems...
Keep in mind, though, that this is just one of your servers (unless you
are saying you checked all X servers in the cluster and that was the
overall max?). You may discover that one server has issues while the
others are fine, in which case only clients connected to the "bad"
server(s) will experience problems. (And since clients can jump between
servers, that might be contributing to the randomness in the observed
occurrences.)
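
If it helps, here is a rough sketch of checking every server rather than
just one. It sends the "stat" four-letter command to each host and pulls
out the max latency. The function names are made up for illustration, the
default client port 2181 is assumed, and the "Latency min/avg/max:" line
format is what I would expect "stat" to print, so adjust the regex if your
version differs.

```python
import re
import socket


def fetch_stat(host, port=2181, timeout=5.0):
    """Send the four-letter 'stat' command and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_max_latency(stat_text):
    """Return the max latency in ms from 'stat' output, or None if absent."""
    # Assumed line format: "Latency min/avg/max: 0/2/297"
    match = re.search(r"Latency min/avg/max:\s*(\d+)/([\d.]+)/(\d+)", stat_text)
    return int(match.group(3)) if match else None


def max_latency_by_server(hosts, fetch=fetch_stat):
    """Map each host to its reported max latency (None if unreachable)."""
    result = {}
    for host in hosts:
        try:
            result[host] = parse_max_latency(fetch(host))
        except OSError:  # connection refused, timeout, DNS failure, ...
            result[host] = None
    return result
```

You would call it with your own host list, e.g.
max_latency_by_server(["zk1", "zk2", "zk3"]) (hypothetical names), and look
for a server whose number stands out from the rest.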
Good luck and keep us posted. EC2 is very interesting; I'd like to learn
more about the operating environment and in particular the issues
involved with running ZK there.
Ted Dunning wrote:
This hasn't helped yet, but that is just because it was a very large bite of
the apple. Once I digest it, I can tell that it will be very helpful.
I did have a chance to look at the "stat" output and maximum latency was
<300ms. How that connects with what you are saying isn't clear yet, but I
can see how that might not be diagnostic of whether the server side timeout
is sufficiently long.
On Thu, Apr 16, 2009 at 10:57 AM, Patrick Hunt <ph...@apache.org> wrote:
lots of stuff about monitoring ... jmx ... packet loss ... vm latencies ...
... Hope this helps.