[ https://issues.apache.org/jira/browse/ZOOKEEPER-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flavio Paiva Junqueira updated ZOOKEEPER-647:
---------------------------------------------
Attachment: ZOOKEEPER-647.patch
By inspecting the code of QuorumCnxManager, I've been able to find a corner
case that could cause a RecvWorker thread to hang during shutdown. Here is a
summary of how the problem can occur:
1- sendWorkerMap is updated while softHalt is executing (the cnx manager is
being shut down);
2- A send worker that softHalt did not shut down later leaves its main loop
because the shutdown attribute is true;
3- Leaving the loop because shutdown is true does not cause finish() to be
called, and finish() is what kills the sibling recv worker.
I'm proposing a fix that is quite simple. The interleaving needed to trigger
the bug is quite difficult to reproduce, though, so I'm not providing a test.
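To make the three steps above concrete, here is a minimal, self-contained
model of the failure. It is a sketch and not the QuorumCnxManager code or the
attached patch: the class CnxManagerSketch, the finishOnExit flag, and the
helper recvWorkerStops are hypothetical names introduced for illustration, and
the sleeps stand in for blocking socket and queue operations. The flag switches
between the behaviour described in step 3 and one plausible shape of a fix
(calling finish() whenever the send worker leaves its loop).

```java
/** Simplified model of a send/recv worker pair; NOT the real ZooKeeper code. */
class CnxManagerSketch {
    // Stands in for the manager's shutdown attribute.
    volatile boolean shutdown = false;

    class RecvWorker extends Thread {
        volatile boolean running = true;

        @Override
        public void run() {
            // A real recv worker would block reading from a socket.
            while (running) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    // Fall through and re-check the running flag.
                }
            }
        }

        void finish() {
            running = false;
            interrupt();
        }
    }

    class SendWorker extends Thread {
        final RecvWorker recvWorker;
        final boolean finishOnExit; // hypothetical switch: false = bug, true = fix
        volatile boolean running = true;

        SendWorker(RecvWorker recvWorker, boolean finishOnExit) {
            this.recvWorker = recvWorker;
            this.finishOnExit = finishOnExit;
        }

        @Override
        public void run() {
            while (running && !shutdown) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    // Fall through and re-check the loop condition.
                }
            }
            // Step 3: without this call, the sibling recv worker is never told to stop.
            if (finishOnExit) {
                finish();
            }
        }

        void finish() {
            running = false;
            recvWorker.finish();
        }
    }

    /**
     * Starts one worker pair that the halt logic "missed", sets only the
     * shutdown flag, and reports whether the recv worker terminated.
     */
    static boolean recvWorkerStops(boolean finishOnExit) {
        CnxManagerSketch manager = new CnxManagerSketch();
        RecvWorker recv = manager.new RecvWorker();
        SendWorker send = manager.new SendWorker(recv, finishOnExit);
        recv.setDaemon(true); // let the JVM exit even if recv hangs
        send.setDaemon(true);
        recv.start();
        send.start();

        manager.shutdown = true; // step 2: only the flag tells this pair to stop
        try {
            send.join(1000);
            recv.join(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !recv.isAlive();
    }
}
```

In this model, recvWorkerStops(false) leaves the recv worker alive after the
send worker has exited, which mirrors the hang described above, while
recvWorkerStops(true) lets both threads terminate.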
> hudson failure in testLeaderShutdown
> ------------------------------------
>
> Key: ZOOKEEPER-647
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-647
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Reporter: Patrick Hunt
> Assignee: Flavio Paiva Junqueira
> Priority: Critical
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-647.patch
>
>
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/666/testReport/org.apache.zookeeper.test/QuorumTest/testLeaderShutdown/
> junit.framework.AssertionFailedError: QP failed to shutdown in 30 seconds
> at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:293)
> at org.apache.zookeeper.test.QuorumBase.shutdownServers(QuorumBase.java:281)
> at org.apache.zookeeper.test.QuorumBase.tearDown(QuorumBase.java:266)
> at org.apache.zookeeper.test.QuorumTest.tearDown(QuorumTest.java:55)
> Flavio, can you triage this one?