Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9246 )
Change subject: KUDU-2274. Shut down tombstoned replica when replacing it
......................................................................

KUDU-2274. Shut down tombstoned replica when replacing it

Failing to shut down a tombstoned replica after copying it can lead to unfortunate interleavings that leave the replica in an inconsistent state. This actually occurred in a test environment, although it proved very hard to reproduce.

This patch includes several changes in addition to shutting down tombstoned replicas before replacing them:

* Remove the thread-safety properties of the ConsensusMetadata class.
  ConsensusMetadata doesn't need to be thread-safe, even though it is ref-counted, because it is required to be externally synchronized. This patch replaces the mutex with a DFAKE_MUTEX from the thread collision warner utility class in order to easily detect concurrent access caused by buggy external synchronization.
* Also improve destructor state checks in TabletReplica.
* Fix another case of unlocked cmeta access by TSTabletManager.

These fixes were verified by running tombstoned_voting-stress-test with 4 CPU stress threads on the dist-test cluster, first after applying only the ConsensusMetadata thread-safety portion of this patch, and then again with the unlocked-access fix and shutdown portions of this patch.
After removing the cmeta mutex only (186/200 failed):
http://dist-test.cloudera.org/job?job_id=mpercy.1518077234.135005

This full patch (200/200 succeeded):
http://dist-test.cloudera.org/job?job_id=mpercy.1518078690.66599

Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Reviewed-on: http://gerrit.cloudera.org:8080/9246
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tserver/ts_tablet_manager.cc
6 files changed, 87 insertions(+), 155 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Gerrit-Change-Number: 9246
Gerrit-PatchSet: 5
Gerrit-Owner: Mike Percy <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
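The commit message describes replacing the ConsensusMetadata mutex with a DFAKE_MUTEX so that the class relies on external synchronization while still detecting callers that break that contract. The sketch below illustrates that idea in isolation. It is not Kudu's thread_collision_warner implementation: FakeMutex, ScopedFakeLock, and CmetaSketch are hypothetical names, and the sketch assumes a simple atomic "busy" flag that never blocks and only counts overlapping entries, where a real warner would report them loudly.

```cpp
// Hypothetical sketch of a "fake mutex" collision check, in the spirit of
// DFAKE_MUTEX. It never blocks; it only detects overlapping entry into a
// section that the caller promised to synchronize externally.
// Not Kudu's actual code.
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Tracks whether some caller is currently inside the guarded section.
class FakeMutex {
 public:
  // Returns true if the caller claimed the section, false if it was busy.
  bool TryEnter() { return !busy_.exchange(true); }
  void Leave() { busy_.store(false); }
  void RecordCollision() { collisions_.fetch_add(1); }
  int collisions() const { return collisions_.load(); }

 private:
  std::atomic<bool> busy_{false};
  std::atomic<int> collisions_{0};
};

// RAII check: finding the section busy means external synchronization failed.
class ScopedFakeLock {
 public:
  explicit ScopedFakeLock(FakeMutex& m) : m_(m), claimed_(m.TryEnter()) {
    // A real collision warner would fail loudly here; the sketch only counts.
    if (!claimed_) m_.RecordCollision();
  }
  ~ScopedFakeLock() {
    // Only the entry that actually claimed the section may release it.
    if (claimed_) m_.Leave();
  }
  ScopedFakeLock(const ScopedFakeLock&) = delete;
  ScopedFakeLock& operator=(const ScopedFakeLock&) = delete;

 private:
  FakeMutex& m_;
  const bool claimed_;
};

// Hypothetical stand-in for a class like ConsensusMetadata: it holds no real
// lock and relies on its owner to serialize every access.
class CmetaSketch {
 public:
  int64_t current_term() const {
    ScopedFakeLock check(fake_mutex_);
    return current_term_;
  }
  void set_current_term(int64_t term) {
    ScopedFakeLock check(fake_mutex_);
    current_term_ = term;
  }
  int collisions() const { return fake_mutex_.collisions(); }

 private:
  mutable FakeMutex fake_mutex_;
  int64_t current_term_ = 0;
};
```

In use, the owner takes its own real lock (for example a std::mutex) around every call into CmetaSketch, so the fake lock stays silent. A caller that skips the owner's lock is not blocked; instead, whenever its access happens to overlap with another one, the collision is recorded. That is the design choice the commit message points at: a real internal mutex would make each individual call safe while hiding the missing external synchronization, whereas the fake lock costs little and surfaces the bug.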
