[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-880:
-----------------------------------
    Fix Version/s: 3.4.0
                   3.3.3
           Status: Open  (was: Patch Available)

We really should have a test for this case. Vishal, can you add one? (The more the better.)

> QuorumCnxManager$SendWorker grows without bounds
> ------------------------------------------------
>
>                 Key: ZOOKEEPER-880
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.2.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Vishal K
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>         Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch
>
> We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to the point where the OS runs out of native threads; at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs in a moment.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
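The symptom reported here, SendWorker threads accumulating until the OS runs out of native threads, is easy to quantify from a jstack dump like the one attached to the issue. The sketch below is a hypothetical helper, not part of the issue or the patch: it counts QuorumCnxManager worker threads by scanning stack frames in a dump, and the sample `dump` text is synthetic (modeled on the frames quoted later in this thread). In a healthy 5-server ensemble each server holds one SendWorker/RecvWorker pair per peer, so roughly four of each is the expected ceiling.

```python
import re
from collections import Counter

def count_workers(jstack_text):
    """Count threads per QuorumCnxManager worker type in a jstack dump.

    A leaked worker still shows its run() frame on the stack, so counting
    occurrences of QuorumCnxManager$SendWorker.run / $RecvWorker.run gives
    the live count per worker type.  (Heuristic: assumes the standard
    jstack frame format.)
    """
    counts = Counter()
    for m in re.finditer(r"QuorumCnxManager\$(\w+)\.run", jstack_text):
        counts[m.group(1)] += 1
    return counts

# Synthetic dump excerpt for illustration; real dumps on a leaking server
# show hundreds of SendWorker entries.
dump = """
"Thread-40" prio=10 nid=0x1a2b waiting on condition
   at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
   at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:631)

"Thread-41" prio=10 nid=0x1a2c runnable
   at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:711)

"Thread-44" prio=10 nid=0x1a2f waiting on condition
   at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:631)
"""
print(count_workers(dump))
```

Running this against successive dumps from the same server makes the unbounded growth visible as a monotonically increasing SendWorker count.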
[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-880:
-------------------------------
    Attachment: ZOOKEEPER-880.patch

The root cause of the frequent disconnects still needs to be resolved. In the meantime, I have fixed the problem that was causing every other SendWorker thread to leak. I tested the patch by connecting to port 3888 on one of the servers using telnet:

{noformat}
2010-11-19 14:51:10,081 - INFO  [/10.17.119.101:3888:QuorumCnxManager$Listener@477] - Received connection request /10.16.251.39:2074
2010-11-19 14:51:14,364 - DEBUG [/10.17.119.101:3888:QuorumCnxManager$SendWorker@553] - Address of remote peer: 8103510703875099187
2010-11-19 14:51:19,440 - WARN  [Thread-7:QuorumCnxManager$RecvWorker@726] - Connection broken for id 8103510703875099187, my id = 1, error =
java.io.IOException: Received packet with invalid packet: 218824692
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:711)
2010-11-19 14:51:19,441 - WARN  [Thread-7:QuorumCnxManager$RecvWorker@730] - Interrupting SendWorker
{noformat}

The SendWorker is then killed along with its partner:

{noformat}
2010-11-19 14:51:19,442 - DEBUG [Thread-7:QuorumCnxManager$SendWorker@571] - Calling finish for 8103510703875099187
2010-11-19 14:51:19,443 - DEBUG [Thread-7:QuorumCnxManager$SendWorker@591] - Removing entry from senderWorkerMap sid=8103510703875099187
2010-11-19 14:51:19,443 - WARN  [Thread-6:QuorumCnxManager$SendWorker@643] - Interrupted while waiting for message on queue
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:631)
2010-11-19 14:51:19,456 - DEBUG [Thread-6:QuorumCnxManager$SendWorker@571] - Calling finish for 8103510703875099187
2010-11-19 14:51:19,457 - WARN  [Thread-6:QuorumCnxManager$SendWorker@652] - Send worker leaving thread
{noformat}

Can you see if this fixes the problem?
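The behavior visible in the log excerpts above is a pairing discipline: when a RecvWorker sees a broken connection, it finishes its partner SendWorker, which interrupts the blocked poll(), removes the entry from senderWorkerMap, and lets both threads exit. Without that step, every broken connection strands one SendWorker. Below is a minimal Python sketch of that pattern; the class and field names are modeled loosely on QuorumCnxManager for readability, but this is an illustration of the cleanup idea, not the actual ZooKeEPER code or the attached patch.

```python
import threading
import queue

class SendWorker(threading.Thread):
    """Drains a per-peer message queue; exits once finish() is called."""
    def __init__(self, sid, sender_worker_map):
        super().__init__(daemon=True)
        self.sid = sid
        self.map = sender_worker_map
        self.queue = queue.Queue()
        self.running = True

    def run(self):
        while self.running:
            try:
                # Stands in for ArrayBlockingQueue.poll() in the Java logs.
                msg = self.queue.get(timeout=0.05)
            except queue.Empty:
                continue
            # ...a real implementation would write msg to the peer socket...

    def finish(self):
        # Remove this worker from the map so a reconnect can install a fresh
        # pair, then stop the send loop.  Skipping this step is exactly how
        # a SendWorker thread leaks on every broken connection.
        self.map.pop(self.sid, None)
        self.running = False

class RecvWorker(threading.Thread):
    """Reads from the peer; on a broken connection, finishes its partner."""
    def __init__(self, sock_ok, partner):
        super().__init__(daemon=True)
        self.sock_ok = sock_ok
        self.partner = partner

    def run(self):
        if not self.sock_ok:  # stands in for the IOException in the log
            self.partner.finish()

# Simulate one peer pair where the receive side hits a broken connection.
sender_worker_map = {}
sw = SendWorker(sid=8103510703875099187, sender_worker_map=sender_worker_map)
sender_worker_map[sw.sid] = sw
rw = RecvWorker(sock_ok=False, partner=sw)
sw.start(); rw.start()
rw.join(); sw.join(timeout=1.0)
print(sw.is_alive(), sw.sid in sender_worker_map)  # both should be False
```

The key design point is that cleanup is driven from the receive side, since a dead peer is detected there first; the send side only notices when its blocking wait is interrupted or its running flag drops.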
[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Sigoure updated ZOOKEEPER-880:
-------------------------------------
    Priority: Critical  (was: Major)
[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated ZOOKEEPER-880:
-----------------------------------------
    Attachment: hbase-hadoop-zookeeper-sv4borg9.log.gz
                jstack
                hbase-hadoop-zookeeper-sv4borg12.log.gz

Attaching the logs of the problematic server (sv4borg9), which I restarted this afternoon, and the logs from one of the other servers from that point on. Also attaching the jstack output from the first server.