Adar Dembo has posted comments on this change.

Change subject: consensus: consolidate Raft thread pools
......................................................................


Patch Set 1:

I'm having trouble reconciling the SequenceToken lifecycle requirements with 
the complicated RaftConsensus lifecycle.

The issue is that while RaftConsensus::ShutDown() is called in a normal context 
(typically a service pool thread handling the DeleteTablet() RPC), the  object 
itself may be destroyed later, by its thread pool's worker thread. This happens 
because the RaftConsensus object is bound and retained by several callbacks 
that run on the thread pool. When this happens, the process crashes because 
~RaftConsensus tries to release its SequenceToken, but the token's slot is 
still active because it's running this destructor.

I'm not sure how to fix this because I still don't fully understand 
RaftConsensus's locking. Is it sufficient for RaftConsensus::Shutdown() to 
destroy the SequenceToken and for the various callbacks to check if the token 
is live before submitting to it? Or does that require additional 
synchronization?

Here's a sample failure: 
https://gist.github.com/adembo/9a3c8e91807ba41bb4aceef67ec1dc0a with some 
additional logging. Note how thread 12205 initiates the crash, and the last 
thing it logged was "Destroying runnable" but not "Runnable destroyed"; this is 
a sign that the crash was in the worker thread's runnable release code.

-- 
To view, visit http://gerrit.cloudera.org:8080/6946
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: No

Reply via email to