[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828743#action_12828743
 ] 

Patrick Hunt commented on ZOOKEEPER-662:
----------------------------------------

Qian, looking at your log I see the following:

2010-02-01 05:40:54,488 - INFO  [NIOServerCxn.Factory:8181:nioserverc...@583] - 
Connected to /10.81.14.81:16629 lastZxid 0
2010-02-01 05:40:54,488 - INFO  [NIOServerCxn.Factory:8181:nioserverc...@615] - 
Creating new session 0xc8265060cea96b73
2010-02-01 05:40:54,489 - INFO  [CommitProcessor:0:nioserverc...@964] - 
Finished init of 0xc8265060cea96b73 valid:true
2010-02-01 05:40:59,627 - WARN  [NIOServerCxn.Factory:8181:nioserverc...@494] - 
Exception causing close of session 0xc8265060cea96b73 due to 
java.io.IOException: Read error

Can you tell if this client is crashing or having some issue? I see it 
reconnect every 15 seconds or so with this same error.
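
One way to check the reconnect interval is to pull the timestamps of the
"Exception causing close of session" warnings out of the server log and
print the gaps between them. The following is only a rough sketch: the
function name close_intervals is made up for illustration, and it assumes
the log4j-style timestamp prefix shown in the excerpt above. It remembers
the most recent timestamp so it also works when the message text has been
wrapped onto the following line, as in this email.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the excerpt, e.g. "2010-02-01 05:40:59,627 - "
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - ")


def close_intervals(log_text, marker="Exception causing close of session"):
    """Return the gaps, in seconds, between consecutive lines containing marker.

    Hypothetical helper, not part of ZooKeeper.
    """
    last_ts = None
    events = []
    for line in log_text.splitlines():
        m = TS.match(line)
        if m:
            # Remember the most recent timestamp; the message itself may be
            # on this line or wrapped onto the next one.
            last_ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if marker in line and last_ts is not None:
            events.append(last_ts)
    return [(b - a).total_seconds() for a, b in zip(events, events[1:])]
```

Running it over the attached log file (for example
close_intervals(open("zookeeper.log.2010020105").read())) should show
whether the gaps really cluster around 15 seconds.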

I don't see what you described though - this client is still able (just before 
your restart) to establish a session (create a connection). Were
you able to tell which client (was there one in particular?) the 
CLOSE_WAIT was happening on? Could you tell if it was the
ZK client or the 4letterword client? (Do you have the netstat output available?)
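
If the netstat output is still around, something like the sketch below
could group the CLOSE_WAIT sockets on the client port by remote host, which
would show whether one client in particular is responsible. It is a sketch
only: the function name count_close_wait is made up, and it assumes the
usual Linux "netstat -tan" column layout (proto, recv-q, send-q, local
address, foreign address, state) with IPv4 host:port addresses.

```python
from collections import Counter


def count_close_wait(netstat_output, port=8181):
    """Count CLOSE_WAIT sockets on the given local port, grouped by remote host.

    Hypothetical helper; expects the text printed by "netstat -tan" on Linux.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "CLOSE_WAIT":
            continue
        # Keep only sockets whose local side is the ZooKeeper client port.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        remote_host = foreign.rsplit(":", 1)[0]
        counts[remote_host] += 1
    return counts
```

Calling count_close_wait(text).most_common() on the captured output lists
the remote hosts with the most stuck sockets first.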


> Too many CLOSE_WAIT socket state on a server
> --------------------------------------------
>
>                 Key: ZOOKEEPER-662
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.2.1
>         Environment: Linux 2.6.9
>            Reporter: Qian Ye
>             Fix For: 3.3.0
>
>         Attachments: zookeeper.log.2010020105, zookeeper.log.2010020106
>
>
> I have a zookeeper cluster with 5 servers, zookeeper version 3.2.1. Here is 
> the content of the configuration file, zoo.cfg:
> ======
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial 
> # synchronization phase can take
> initLimit=5
> # The number of ticks that can pass between 
> # sending a request and getting an acknowledgement
> syncLimit=2
> # the directory where the snapshot is stored.
> dataDir=./data/
> # the port at which the clients will connect
> clientPort=8181
> # zookeeper cluster list
> server.100=10.23.253.43:8887:8888
> server.101=10.23.150.29:8887:8888
> server.102=10.23.247.141:8887:8888
> server.200=10.65.20.68:8887:8888
> server.201=10.65.27.21:8887:8888
> =====
> Before the problem happened, server.200 was the leader. Yesterday 
> morning, I found that there were many sockets in the CLOSE_WAIT state on 
> the clientPort (8181); the total was over 120. Because of these 
> CLOSE_WAIT sockets, server.200 could not accept more connections from 
> clients. The only thing I could do in this situation was restart 
> server.200, at about 2010-02-01 06:06:35. The related log is attached to the 
> issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
