Mike Percy has submitted this change and it was merged.
Change subject: KUDU-2274. Shut down tombstoned replica when replacing it
KUDU-2274. Shut down tombstoned replica when replacing it
Failing to shut down a tombstoned replica after copying it can lead to
unfortunate interleavings resulting in the replica ending up in an
inconsistent state. This actually occurred in a test environment,
although it proved very hard to reproduce.
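As a rough illustration of the ordering this fix enforces, the sketch below shuts down the old replica before a new one is installed under the same tablet ID. It is not Kudu code: Replica, ReplicaRegistry, State, and every method name here are hypothetical and exist only to show the shutdown-then-replace sequence.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical lifecycle states; the real TabletReplica has its own state machine.
enum class State { kRunning, kTombstoned, kShutdown };

// Hypothetical replica type, used only for this illustration.
class Replica {
 public:
  explicit Replica(State s) : state_(s) {}
  void Shutdown() {
    std::lock_guard<std::mutex> l(lock_);
    state_ = State::kShutdown;
  }
  State state() const {
    std::lock_guard<std::mutex> l(lock_);
    return state_;
  }

 private:
  mutable std::mutex lock_;
  State state_;
};

// Hypothetical registry mapping tablet IDs to replicas.
class ReplicaRegistry {
 public:
  void Register(const std::string& id, std::shared_ptr<Replica> r) {
    std::lock_guard<std::mutex> l(lock_);
    replicas_[id] = std::move(r);
  }

  std::shared_ptr<Replica> Get(const std::string& id) const {
    std::lock_guard<std::mutex> l(lock_);
    auto it = replicas_.find(id);
    return it == replicas_.end() ? nullptr : it->second;
  }

  // Shuts down the existing replica (if any) before installing the new one,
  // so a thread still holding a reference to the old object sees it as shut
  // down instead of continuing to act on a live-looking replica.
  std::shared_ptr<Replica> Replace(const std::string& id,
                                   std::shared_ptr<Replica> fresh) {
    std::shared_ptr<Replica> old = Get(id);
    if (old) {
      old->Shutdown();
    }
    Register(id, std::move(fresh));
    return old;
  }

 private:
  mutable std::mutex lock_;
  std::map<std::string, std::shared_ptr<Replica>> replicas_;
};
```

Skipping the Shutdown() call in Replace() would leave two objects for the same tablet that both appear usable, which is the kind of overlap the change description warns about.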
This patch includes several changes in addition to shutting down
tombstoned replicas before replacing them:
* Remove the thread safety properties of the ConsensusMetadata class
ConsensusMetadata doesn't need to be thread-safe, even though it is
ref-counted, because it is required to be externally synchronized.
This patch replaces the mutex with a DFAKE_MUTEX from the thread
collision warner utility class in order to easily detect concurrent
access due to buggy external synchronization.
* Also improve destructor state checks in TabletReplica.
* Fix another case of unlocked cmeta access by TSTabletManager.
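To show roughly what a collision-detecting "fake mutex" does, here is a minimal sketch. It is not the DFAKE_MUTEX implementation used by Kudu: CollisionWarner, ScopedCheck, and Metadata are hypothetical names, and this version counts collisions where a debug-only checker would typically fail an assertion. The point is that the guard never blocks; it only notices when two callers are inside a guarded section at once.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a thread collision warner. It provides no mutual
// exclusion; it only records overlapping entries into a guarded section.
class CollisionWarner {
 public:
  // Returns true if the section was free, false if someone else is inside.
  bool Enter() {
    bool expected = false;
    if (!busy_.compare_exchange_strong(expected, true)) {
      collisions_.fetch_add(1);
      return false;
    }
    return true;
  }
  void Leave() { busy_.store(false); }
  int collisions() const { return collisions_.load(); }

 private:
  std::atomic<bool> busy_{false};
  std::atomic<int> collisions_{0};
};

// RAII helper: marks the section busy for the lifetime of the object.
class ScopedCheck {
 public:
  explicit ScopedCheck(CollisionWarner* w) : warner_(w), owned_(w->Enter()) {}
  ~ScopedCheck() {
    if (owned_) warner_->Leave();
  }
  ScopedCheck(const ScopedCheck&) = delete;
  ScopedCheck& operator=(const ScopedCheck&) = delete;

 private:
  CollisionWarner* warner_;
  bool owned_;
};

// Hypothetical externally synchronized class. Callers must hold their own
// lock; the warner only flags callers that forgot to.
class Metadata {
 public:
  void set_term(int64_t t) {
    ScopedCheck check(&warner_);
    term_ = t;
  }
  int64_t term() const {
    ScopedCheck check(&warner_);
    return term_;
  }
  int collisions() const { return warner_.collisions(); }

 private:
  mutable CollisionWarner warner_;
  int64_t term_ = 0;
};
```

Because the check is a single atomic compare-and-swap, it is cheaper than a real mutex and leaves the locking responsibility with the caller, which matches the "externally synchronized" contract described above.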
These fixes were verified by running tombstoned_voting-stress-test with
4 CPU stress threads on the dist-test cluster after applying only the
ConsensusMetadata thread-safety portion of this patch, and then again
with the unlocked access fix and shutdown portions of this patch.
After removing the cmeta mutex only (186/200 failed):
This full patch (200/200 succeeded):
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
6 files changed, 87 insertions(+), 155 deletions(-)
Kudu Jenkins: Verified
Alexey Serbin: Looks good to me, approved
To view, visit http://gerrit.cloudera.org:8080/9246
Gerrit-Owner: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>