[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-22 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-880:
---

    Fix Version/s: 3.4.0
                   3.3.3
           Status: Open  (was: Patch Available)

We really should have a test for this case. Vishal, can you add one? (The more, 
the better.)

 QuorumCnxManager$SendWorker grows without bounds
 

 Key: ZOOKEEPER-880
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Assignee: Vishal K
Priority: Critical
 Fix For: 3.3.3, 3.4.0

 Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
 hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
 TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch


 We're seeing an issue where one server in the ensemble has a steadily growing 
 number of QuorumCnxManager$SendWorker threads, up to the point where the OS runs 
 out of native threads, and at the same time we see a lot of exceptions in the 
 logs. This is on 3.2.2 and our config looks like:
 {noformat}
 tickTime=3000
 dataDir=/somewhere_thats_not_tmp
 clientPort=2181
 initLimit=10
 syncLimit=5
 server.0=sv4borg9:2888:3888
 server.1=sv4borg10:2888:3888
 server.2=sv4borg11:2888:3888
 server.3=sv4borg12:2888:3888
 server.4=sv4borg13:2888:3888
 {noformat}
 The issue is on the first server. I'm going to attach thread dumps and logs 
 in a moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Vishal K (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal K updated ZOOKEEPER-880:
---

Attachment: ZOOKEEPER-880.patch



The root cause of the frequent disconnects still needs to be resolved. In the 
meantime, I have fixed the problem that was causing every other SendWorker thread 
to leak.
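
As a rough illustration of the kind of cleanup involved (this is a hedged sketch 
with invented class and method names, not the code in ZOOKEEPER-880.patch), the 
pattern that avoids orphaning the previous worker when a new connection arrives 
for the same server id looks something like this:

{noformat}
// Illustrative sketch only -- NOT the attached ZOOKEEPER-880.patch. It shows a
// general pattern for replacing a per-peer worker without leaking the old one:
// install the new worker, finish the stale one, and have finish() remove its map
// entry only if the map still points at that exact worker.
import java.util.concurrent.ConcurrentHashMap;

public class WorkerRegistry {
    // Minimal stand-in for QuorumCnxManager$SendWorker, for illustration.
    static class Worker {
        final long sid;
        final WorkerRegistry owner;
        Worker(long sid, WorkerRegistry owner) { this.sid = sid; this.owner = owner; }
        void finish() {
            // a real worker would also interrupt its thread and close its socket here
            owner.workers.remove(sid, this);   // conditional remove: never evicts a newer worker
        }
    }

    final ConcurrentHashMap<Long, Worker> workers = new ConcurrentHashMap<Long, Worker>();

    void register(long sid, Worker w) {
        Worker old = workers.put(sid, w);      // install the replacement
        if (old != null) {
            old.finish();                      // tear down the stale worker instead of orphaning it
        }
    }
}
{noformat}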

I tested the patch by connecting to port 3888 on one of the servers using telnet:

2010-11-19 14:51:10,081 - INFO  [/10.17.119.101:3888:QuorumCnxManager$Listener@477] - Received connection request /10.16.251.39:2074
2010-11-19 14:51:14,364 - DEBUG [/10.17.119.101:3888:QuorumCnxManager$SendWorker@553] - Address of remote peer: 8103510703875099187
2010-11-19 14:51:19,440 - WARN  [Thread-7:QuorumCnxManager$RecvWorker@726] - Connection broken for id 8103510703875099187, my id = 1, error = 
java.io.IOException: Received packet with invalid packet: 218824692
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:711)
2010-11-19 14:51:19,441 - WARN  [Thread-7:QuorumCnxManager$RecvWorker@730] - Interrupting SendWorker   <== SendWorker is getting killed.
2010-11-19 14:51:19,442 - DEBUG [Thread-7:QuorumCnxManager$SendWorker@571] - Calling finish for 8103510703875099187
2010-11-19 14:51:19,443 - DEBUG [Thread-7:QuorumCnxManager$SendWorker@591] - Removing entry from senderWorkerMap sid=8103510703875099187
2010-11-19 14:51:19,443 - WARN  [Thread-6:QuorumCnxManager$SendWorker@643] - Interrupted while waiting for message on queue
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:631)
2010-11-19 14:51:19,456 - DEBUG [Thread-6:QuorumCnxManager$SendWorker@571] - Calling finish for 8103510703875099187
2010-11-19 14:51:19,457 - WARN  [Thread-6:QuorumCnxManager$SendWorker@652] - Send worker leaving thread

Can you see if this fixes the problem?
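
For reference, the telnet session above can be approximated programmatically. The 
sketch below is hypothetical and only assumes the framing suggested by the log: the 
listener reads an 8-byte server id first, and RecvWorker then reads a 4-byte packet 
length, so writing an out-of-range length should trip the same "Received packet 
with invalid packet" check. Host and port are taken from the config in the issue 
description.

{noformat}
// Hypothetical reproduction helper (not part of ZooKeeper or the patch).
// Connects to a peer's election port and writes a fake server id followed by a
// nonsense packet length, mimicking what the telnet session ends up sending.
import java.io.DataOutputStream;
import java.net.Socket;

public class FakeElectionPeer {
    public static void main(String[] args) throws Exception {
        Socket s = new Socket("sv4borg9", 3888);   // election host:port from the issue's config
        DataOutputStream out = new DataOutputStream(s.getOutputStream());
        out.writeLong(8103510703875099187L);       // read by the Listener as the remote peer's sid
        out.writeInt(218824692);                   // bogus length; should trigger the "invalid packet" path
        out.flush();
        Thread.sleep(5000);                        // leave time for the server to log and tear the workers down
        s.close();
    }
}
{noformat}

With the patch applied, the server should log the same teardown sequence as above 
and leave no extra SendWorker threads behind.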

 QuorumCnxManager$SendWorker grows without bounds
 

 Key: ZOOKEEPER-880
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Priority: Critical
 Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
 hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
 TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-12 Thread Benoit Sigoure (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Sigoure updated ZOOKEEPER-880:
-

Priority: Critical  (was: Major)

 QuorumCnxManager$SendWorker grows without bounds
 

 Key: ZOOKEEPER-880
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Priority: Critical
 Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
 hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
 TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-09-27 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated ZOOKEEPER-880:
-

    Attachment: hbase-hadoop-zookeeper-sv4borg9.log.gz
                jstack
                hbase-hadoop-zookeeper-sv4borg12.log.gz

Attaching the logs of the problematic server (sv4borg9), which I restarted this 
afternoon, and the logs from one of the other servers from that point on. Also 
attaching the jstack of the first server.
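
If it helps with triage, the attached jstack can be summarized with something like 
the hypothetical helper below; it simply counts the threads whose stack contains 
QuorumCnxManager$SendWorker.run (one such frame per SendWorker thread), which is a 
quick proxy for how far the leak has progressed.

{noformat}
// Hypothetical helper, not part of ZooKeeper: counts SendWorker threads in a
// jstack text dump by looking for their run() frame.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CountSendWorkers {
    public static void main(String[] args) throws IOException {
        int count = 0;
        BufferedReader in = new BufferedReader(new FileReader(args[0]));  // path to a jstack output file
        for (String line; (line = in.readLine()) != null; ) {
            if (line.contains("QuorumCnxManager$SendWorker.run")) {
                count++;  // each SendWorker thread contributes exactly one run() frame
            }
        }
        in.close();
        System.out.println("SendWorker threads in dump: " + count);
    }
}
{noformat}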

 QuorumCnxManager$SendWorker grows without bounds
 

 Key: ZOOKEEPER-880
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
 Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
 hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.