Vishal K commented on ZOOKEEPER-880:

While debugging https://issues.apache.org/jira/browse/ZOOKEEPER-822 I found 
that senderWorkerMap could have no entry for a server even though RecvWorker 
and SendWorker threads for that server were still running. In my case this was 
seen when the leader died (i.e., during leader election), but I believe it can 
also happen whenever a peer disconnects from another peer. The cause was 
incorrect handling of the addition and removal of senderWorkerMap entries, 
exposed by race conditions in QuorumCnxManager. A patch is available on that 
issue.

I am not sure whether ZOOKEEPER-822 is causing trouble here as well; I just 
wanted to point out the possibility.
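To illustrate the kind of race described above, here is a minimal Java sketch of keeping exactly one live worker per server id in a concurrent map. The class and method names (SendWorker, finish, register, unregister) are hypothetical, loosely modeled on QuorumCnxManager; this is not the actual ZooKeeper code. The idea is that a reconnect must atomically replace and stop the stale worker, and a late removal by an old worker must not evict a newer one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WorkerMapSketch {
    static class SendWorker {
        final long sid;
        volatile boolean running = true;
        SendWorker(long sid) { this.sid = sid; }
        void finish() { running = false; } // in real code: stop thread, close socket
    }

    private final Map<Long, SendWorker> senderWorkerMap = new ConcurrentHashMap<>();

    // On a new connection for server `sid`, atomically swap in the fresh worker
    // and finish any stale one, so no worker keeps running outside the map.
    SendWorker register(long sid) {
        SendWorker fresh = new SendWorker(sid);
        SendWorker stale = senderWorkerMap.put(sid, fresh);
        if (stale != null) {
            stale.finish();
        }
        return fresh;
    }

    // On disconnect, remove the entry only if it still maps to this worker,
    // so an old worker's cleanup never removes a newer worker's entry.
    void unregister(SendWorker w) {
        senderWorkerMap.remove(w.sid, w);
        w.finish();
    }

    int size() { return senderWorkerMap.size(); }

    public static void main(String[] args) {
        WorkerMapSketch m = new WorkerMapSketch();
        SendWorker a = m.register(3);
        SendWorker b = m.register(3);   // reconnect: stale worker is stopped
        System.out.println(a.running);  // false
        System.out.println(m.size());   // 1
        m.unregister(a);                // late cleanup by old worker: no-op on map
        System.out.println(m.size());   // 1
        m.unregister(b);
        System.out.println(m.size());   // 0
    }
}
```

The conditional `remove(key, value)` is what prevents the "map empty but threads still running" inversion: a plain `remove(key)` in the old worker's cleanup path could delete the entry belonging to the replacement worker.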

> QuorumCnxManager$SendWorker grows without bounds
> ------------------------------------------------
>                 Key: ZOOKEEPER-880
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.2.2
>            Reporter: Jean-Daniel Cryans
>         Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
> We're seeing an issue where one server in the ensemble has a steady growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.