[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933897#action_12933897 ]

Benoit Sigoure commented on ZOOKEEPER-880:
------------------------------------------

Do we agree that monitoring wasn't causing the issue? As JD said, even after we stopped it, the problem recurred.

> QuorumCnxManager$SendWorker grows without bounds
> ------------------------------------------------
>
>                 Key: ZOOKEEPER-880
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.2.2
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>         Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
> We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to the point where the OS runs out of native threads; at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs in a moment.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
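For anyone wanting to quantify the growth on a live server before it exhausts native threads, here is a minimal JMX sketch. It assumes the ZooKeeper JVM exposes remote JMX (e.g. started with -Dcom.sun.management.jmxremote.port=9999 and auth/SSL disabled for the test); the host, port, and frame-matching heuristic are assumptions for illustration, not part of the attached artifacts:

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SendWorkerWatch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: adjust host/port to the affected server.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://sv4borg9:9999/jmxrmi");
        JMXConnector jmx = JMXConnectorFactory.connect(url);
        MBeanServerConnection conn = jmx.getMBeanServerConnection();
        ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
        while (true) {
            int sendWorkers = 0;
            for (ThreadInfo t : threads.dumpAllThreads(false, false)) {
                // Match by stack frame: any thread executing inside
                // QuorumCnxManager$SendWorker shows the class in its trace,
                // even if the thread itself has a generic name.
                for (StackTraceElement frame : t.getStackTrace()) {
                    if (frame.getClassName().contains("QuorumCnxManager$SendWorker")) {
                        sendWorkers++;
                        break;
                    }
                }
            }
            System.out.printf("%tT total=%d sendWorkers=%d%n",
                    System.currentTimeMillis(), threads.getThreadCount(), sendWorkers);
            Thread.sleep(60000L); // sample once a minute
        }
    }
}
{code}

A steadily climbing sendWorkers count while total tracks it confirms the same picture the attached jstack shows; grepping the jstack dump for QuorumCnxManager$SendWorker gives a one-off version of the same measurement.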
[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Sigoure updated ZOOKEEPER-880:
-------------------------------------

    Priority: Critical  (was: Major)
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931457#action_12931457 ]

Benoit Sigoure commented on ZOOKEEPER-880:
------------------------------------------

Bumping up the severity. This took down one of our clusters again.
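For readers trying to picture how a per-peer sender thread count can grow without bounds, the sketch below shows one plausible leak pattern: installing a new sender for a reconnecting peer without shutting down the old one. All names are invented for illustration; this is not a claim about the actual QuorumCnxManager code in 3.2.2.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: a per-peer sender-thread registry in the style of a
// connection manager. The bug class shown here is replacing the map entry
// for a reconnecting peer WITHOUT stopping the old thread, so every
// reconnect leaks one live thread.
public class LeakyCnxManager {
    private final Map<Long, Sender> senders = new ConcurrentHashMap<Long, Sender>();

    static class Sender extends Thread {
        private final AtomicBoolean running = new AtomicBoolean(true);
        Sender(long sid) { super("Sender-for-peer-" + sid); }
        public void run() {
            while (running.get()) {
                try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
                // ... write queued messages to the peer's socket ...
            }
        }
        void shutdown() { running.set(false); interrupt(); }
    }

    // Buggy variant: the old sender for this peer keeps running forever.
    void onNewConnectionBuggy(long sid) {
        Sender s = new Sender(sid);
        senders.put(sid, s);   // old thread is dropped from the map but never stopped
        s.start();
    }

    // Fixed variant: tear down the previous sender before installing the new one.
    void onNewConnectionFixed(long sid) {
        Sender old = senders.get(sid);
        if (old != null) {
            old.shutdown();    // stop the stale thread tied to the dead socket
        }
        Sender s = new Sender(sid);
        senders.put(sid, s);
        s.start();
    }
}
{code}

The fixed variant is the usual remedy for this bug class: make replacing the map entry and terminating the stale thread a single step, so a flapping connection can never strand more than one live sender per peer.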