Hard to say based on the bits/pieces of the log we have access to. I'd
have to see the full log, preferably from both the server and client, to
gain more insight.
Re the low numbers: this is the received count for the server, which
should only ever increase, never decrease. That it is so low indicates
either that the server recently restarted or that clients are not
attaching to it. It seems like it should be near the other servers, but
again, hard to tell based on the small aperture we have via mail.
Thanks Patrick. See below.
On Tue, Feb 23, 2010 at 1:19 PM, Patrick Hunt <ph...@apache.org> wrote:
Stack, you might look at the following:
1) Why does server 14 have such a low recv count while the other servers
are at 3.7k+ received? Did server 14 fail at some point? Or its network?
This may have caused the timeout seen by the client:
Ok. Will check into this the next time. I did take the dump after
the observed TIMED_OUT, a good while after. Could this be why the
numbers are low?
2010-02-21 18:23:55,583 [main-SendThread] INFO
org.apache.zookeeper.ClientCnxn: Attempting connection to server
org.apache.zookeeper.ClientCnxn: Exception closing session
0x226ed968a270003 to sun.nio.ch.selectionkeyi...@2a50e9a3
java.io.IOException: TIMED OUT
2) Connection timeout is different from session timeout. Connection timeout
is the amount of time we allow for connection establishment (socket open)
until the server accepts the connection. This value is the session timeout
(as requested by the client) divided by the number of hosts in the host
list. This could account for why the timeout (above snippet) occurred after
5 seconds. What timeout value is this client using? 15 seconds?
We ask for a session timeout of 60 seconds -- the hbase default -- and
our tickTime is 3 seconds.
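
To make the arithmetic in point 2 concrete, here is a minimal sketch of the
rule as Patrick describes it (session timeout divided by the number of hosts
in the host list). The class and method names are made up for illustration
and are not ZooKeeper's API, and the host count of 3 is an assumption; the
thread does not say how many hosts are in the client's list.

```java
// Illustrative only: mirrors the rule stated in point 2, not ZooKeeper's code.
final class ConnectTimeoutSketch {

    /** Per-host connect timeout: requested session timeout / number of hosts. */
    static int connectTimeoutMillis(int sessionTimeoutMillis, int hostCount) {
        if (hostCount <= 0) {
            throw new IllegalArgumentException("hostCount must be positive");
        }
        return sessionTimeoutMillis / hostCount;
    }

    public static void main(String[] args) {
        // Assumption: three hosts in the connect string (not stated in the thread).
        int hosts = 3;
        // The 15 second guess from point 2 gives the 5 seconds seen in the log.
        System.out.println(connectTimeoutMillis(15_000, hosts)); // prints 5000
        // The 60 second session timeout we actually request.
        System.out.println(connectTimeoutMillis(60_000, hosts)); // prints 20000
    }
}
```

Under that assumed host count, a 60 second session timeout would allow 20
seconds per connection attempt rather than 5, so the numbers only line up
with the log if the host list is longer or the effective session timeout is
lower than what we requested.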
You are not troubled at all by the exceptions closing sessions above?
Are these just noise?
Thanks for the input,