[ https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828743#action_12828743 ]
Patrick Hunt commented on ZOOKEEPER-662:
----------------------------------------

Qian, in looking at your log I see the following:

2010-02-01 05:40:54,488 - INFO [NIOServerCxn.Factory:8181:nioserverc...@583] - Connected to /10.81.14.81:16629 lastZxid 0
2010-02-01 05:40:54,488 - INFO [NIOServerCxn.Factory:8181:nioserverc...@615] - Creating new session 0xc8265060cea96b73
2010-02-01 05:40:54,489 - INFO [CommitProcessor:0:nioserverc...@964] - Finished init of 0xc8265060cea96b73 valid:true
2010-02-01 05:40:59,627 - WARN [NIOServerCxn.Factory:8181:nioserverc...@494] - Exception causing close of session 0xc8265060cea96b73 due to java.io.IOException: Read error

Can you tell if this client is crashing or having some issue? I see it reconnect every 15 seconds or so with this same error.

I don't see what you described, though: just before your restart, this client is still able to establish a session (create a connection). Were you able to tell which client (was there one in particular?) the CLOSE_WAIT sockets belonged to? Could you tell if it was the ZK client or the four-letter-word client? (Do you have the netstat output available?)

> Too many CLOSE_WAIT socket state on a server
> --------------------------------------------
>
>                 Key: ZOOKEEPER-662
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.2.1
>         Environment: Linux 2.6.9
>            Reporter: Qian Ye
>             Fix For: 3.3.0
>
>         Attachments: zookeeper.log.2010020105, zookeeper.log.2010020106
>
>
> I have a ZooKeeper cluster with 5 servers, ZooKeeper version 3.2.1. Here is
> the content of the configuration file, zoo.cfg:
> ======
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=5
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=2
> # the directory where the snapshot is stored.
> dataDir=./data/
> # the port at which the clients will connect
> clientPort=8181
> # zookeeper cluster list
> server.100=10.23.253.43:8887:8888
> server.101=10.23.150.29:8887:8888
> server.102=10.23.247.141:8887:8888
> server.200=10.65.20.68:8887:8888
> server.201=10.65.27.21:8887:8888
> =====
> Before the problem happened, server.200 was the leader. Yesterday
> morning, I found there were many sockets in the CLOSE_WAIT state on
> the clientPort (8181), over 120 in total. Because of these
> CLOSE_WAIT sockets, server.200 could not accept more connections from
> clients. The only thing I could do in this situation was restart
> server.200, at about 2010-02-01 06:06:35. The related log is attached to the
> issue.