The thing that seems odd to me is that the connectivity complaints are coming out of the zk client, right? Why is it failing to get to member 14, and why doesn't it move on to another ensemble member if there is an issue with 14? And if there were a general connectivity issue, I'd think the running hbase cluster would be complaining at about the same time (it's talking to datanodes and masters at this time).
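
One rough way to test the client-to-server path would be to send ZooKeeper's `stat` four-letter command to each ensemble member from the client host and compare the wall-clock round trip with the latency the server itself reports. The sketch below assumes four-letter commands are enabled on the servers and that they listen on the default client port 2181; `probe` and `parse_latency` are names made up for illustration, not part of any ZooKeeper client library.

```python
import re
import socket
import time

# Matches the line printed by the `stat` command, e.g. "Latency min/avg/max: 2/125/345"
LATENCY_RE = re.compile(r"Latency min/avg/max:\s*(\d+)/(\d+)/(\d+)")


def parse_latency(stat_text):
    """Return (min, avg, max) in ms from a `stat` response, or None if absent."""
    m = LATENCY_RE.search(stat_text)
    return tuple(int(g) for g in m.groups()) if m else None


def probe(host, port=2181, timeout=5.0):
    """Send `stat` to one member; return (round-trip seconds, latency tuple)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    elapsed = time.monotonic() - start
    text = b"".join(chunks).decode("utf-8", "replace")
    return elapsed, parse_latency(text)
```

Running `probe` in a loop against each member from the machine hosting the failing client should show whether the round trip to member 14 is much worse than what the server reports, or whether connects to it time out outright.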
(Thanks for the input lads)
St.Ack

On Mon, Feb 22, 2010 at 11:26 AM, Mahadev Konar <maha...@yahoo-inc.com> wrote:
> I also looked at the logs. Ted might have a point. It does look like the
> zookeeper servers are doing fine (though as ted mentions the skew is a
> little concerning, though that might be due to very few packets served by
> the first server). Other than that the latencies of 300 ms at max should not
> cause any timeouts.
> Also, the number of packets received is pretty low - meaning that it wasn't
> serving huge traffic. Is there any way we can check whether the network
> connection from the client to the server is flaky?
>
> Thanks
> mahadev
>
>
> On 2/22/10 10:40 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>
>> Not sure this helps at all, but these times are remarkably asymmetrical. I
>> would expect members of a ZK cluster to have very comparable times.
>>
>> Additionally, 345 ms is nowhere near large enough to cause a session to
>> expire. My take is that ZK doesn't think it caused the timeout.
>>
>> On Mon, Feb 22, 2010 at 10:18 AM, Stack <st...@duboce.net> wrote:
>>
>>> Latency min/avg/max: 2/125/345
>>> ...
>>> Latency min/avg/max: 0/7/81
>>> ...
>>> Latency min/avg/max: 1/1/1
>>>
>>> Thanks for any pointers on how to debug.