Adar Dembo has posted comments on this change. Change subject: consensus: consolidate Raft thread pools ......................................................................
Patch Set 1: I'm having trouble reconciling the SequenceToken lifecycle requirements with the complicated RaftConsensus lifecycle. The issue is that while RaftConsensus::ShutDown() is called in a normal context (typically a service pool thread handling the DeleteTablet() RPC), the object itself may be destroyed later, by its thread pool's worker thread. This happens because the RaftConsensus object is bound and retained by several callbacks that run on the thread pool. When this happens, the process crashes because ~RaftConsensus tries to release its SequenceToken, but the token's slot is still active because it's running this destructor. I'm not sure how to fix this because I still don't fully understand RaftConsensus's locking. Is it sufficient for RaftConsensus::Shutdown() to destroy the SequenceToken and for the various callbacks to check if the token is live before submitting to it? Or does that require additional synchronization? Here's a sample failure: https://gist.github.com/adembo/9a3c8e91807ba41bb4aceef67ec1dc0a with some additional logging. Note how thread 12205 initiates the crash, and the last thing it logged was "Destroying runnable" but not "Runnable destroyed"; this is a sign that the crash was in the worker thread's runnable release code. -- To view, visit http://gerrit.cloudera.org:8080/6946 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: No
