[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918984#action_12918984 ]
Patrick Hunt edited comment on ZOOKEEPER-885 at 10/7/10 1:34 PM:
-----------------------------------------------------------------

bq. I do have confirmation that a session is established for every client (all 45 of them) before beginning the disk load with dd.

I see, I was just trying to reduce variables. That should be fine then.

I see this in the logs:

2010-10-07 14:49:13,956 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.version=1.6.0_0
2010-10-07 14:49:13,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.vendor=Sun Microsystems Inc.
2010-10-07 14:49:13,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.home=/usr/lib/jvm/java-6-openjdk/jre

I'm not sure many users are running openjdk, also 1.6.0_0 is very old (I have 1.6.0_18 openjdk on my system). You should upgrade to a recent version of openjdk at the least, although I'd highly suggest running with the official (and recent) sun jdk. (again, this is to reduce variables)

Also I noticed this in the server log for 1 server, it seems to be misconfigured, perhaps you can fix that? (normal_3.3.1/192.168.131.12.log)

2010-10-07 14:49:13,979 - FATAL [main:quorumpeerm...@83] - Invalid config, exiting abnormally

bq. Should I enable more verbose logging?

Yes, give that a try, perhaps run with TRACE logging turned on. If you can upload one of those logs I'll take a look.

Right now we have this in the server log:

2010-10-07 14:51:32,961 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@633] - EndOfStreamException: Unable to read additional data from client sessionid 0x22b872ad9ff000c, likely client has closed socket
2010-10-07 14:51:32,962 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1434] - Closed socket connection for client /10.23.4.95:59738 which had sessionid 0x22b872ad9ff000c

This indicates that the client is closing the connection (EOS).
Please capture the logs on your client and upload one of them. Perhaps run that at TRACE level as well. That will give us more insight into why the client is closing its side of the connection (at least from the server's perspective).

Thanks for the help on this!

was (Author: phunt):

bq. I do have confirmation that a session is established for every client (all 45 of them) before beginning the disk load with dd.

I see, I was just trying to reduce variables. That should be fine then.

I see this in the logs:

2010-10-07 14:49:13,956 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.version=1.6.0_0
2010-10-07 14:49:13,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.vendor=Sun Microsystems Inc.
2010-10-07 14:49:13,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.home=/usr/lib/jvm/java-6-openjdk/jre

I'm not sure many users are running openjdk, also 1.6.0_0 is very old (I have 1.6.0_18 openjdk on my system). You should upgrade to a recent version of openjdk at the least, although I'd highly suggest running with the official (and recent) sun jdk. (again, this is to reduce variables)

Also I noticed this in the server log for 1 server, it seems to be misconfigured, perhaps you can fix that? (normal_3.3.1/192.168.131.12.log)

2010-10-07 14:49:13,979 - FATAL [main:quorumpeerm...@83] - Invalid config, exiting abnormally

bq. Should I enable more verbose logging?

Yes, give that a try, perhaps run with TRACE logging turned on. If you can upload one of those logs I'll take a look.
Right now we have this in the server log:

2010-10-07 14:51:32,961 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@633] - EndOfStreamException: Unable to read additional data from client sessionid 0x22b872ad9ff000c, likely client has closed socket
2010-10-07 14:51:32,962 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1434] - Closed socket connection for client /10.23.4.95:59738 which had sessionid 0x22b872ad9ff000c

This indicates that the client is closing the connection (EOS). Please capture the logs on your client and upload one of them. Perhaps run that at DEBUG level as well. That will give us more insight into why the client is closing its side of the connection (at least from the server's perspective).

Thanks for the help on this!

> Zookeeper drops connections under moderate IO load
> --------------------------------------------------
>
> Key: ZOOKEEPER-885
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.2.2
> Environment: Debian (Lenny)
> 1Gb RAM
> swap disabled
> 100Mb heap for zookeeper
> Reporter: Alexandre Hardy
> Priority: Critical
> Attachments: WatcherTest.java, zklogs.tar.gz
>
>
> A zookeeper server under minimum load, with a number of clients watching
> exactly one node, will fail to maintain the connection when the machine is
> subjected to moderate IO load.
> In a specific test example we had three zookeeper servers running on
> dedicated machines with 45 clients connected, watching exactly one node. The
> clients would disconnect after moderate load was added to each of the
> zookeeper servers with the command:
> {noformat}
> dd if=/dev/urandom of=/dev/mapper/nimbula-test
> {noformat}
> The {{dd}} command transferred data at a rate of about 4Mb/s.
> The same thing happens with
> {noformat}
> dd if=/dev/zero of=/dev/mapper/nimbula-test
> {noformat}
> It seems strange that such a moderate load should cause instability in the
> connection.
> Very few other processes were running; the machines were set up to test the
> connection instability we have experienced. Clients performed no other read
> or mutation operations.
> Although the documents state that minimal competing IO load should be present on
> the zookeeper server, it seems reasonable that moderate IO should not cause
> problems in this case.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.