Alexey Serbin has posted comments on this change.

Change subject: [catalog manager] fixed deadlock on catalog shutdown
......................................................................


Patch Set 2:

> (5 comments)
 > 
 > After thinking about this some more, I think this approach is too
 > brittle. My concern is that by allowing leader election callbacks
 > concurrently with tablet peer shutdown, any change to how
 > TabletPeer::Shutdown() cleans up state can lead to invalid memory
 > accesses in the leader election callback.
 > 
 > For example, someone changes TabletPeer::Shutdown() to destroy its
 > transaction tracker. If we're lucky, we'll see a SIGSEGV in a unit
 > test when the leader election callback accesses the transaction
 > tracker. If we're not, we'll get flaky tests with hard-to-debug
 > symptoms.
 >

Exactly. I saw this happen several times while running the test.

 > So, I'm more inclined to keep the shutdown order as-is, and explore
 > shutting down just the consensus state machine early, in order to
 > abort all outstanding transactions. Or, alternatively, explore
 > moving more object cleanup out of TabletPeer::Shutdown() and into
 > ~TabletPeer() (right now Shutdown() also destroys the Consensus and
 > Tablet objects).

I decided to go with the former: shutting down the consensus state machine first.
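
To illustrate the ordering, here is a minimal single-threaded sketch. ToyConsensus and ToyPeer are hypothetical stand-ins, not the actual Kudu classes, and the "tracker" is just a vector. The point is only that once consensus is shut down, no election callback can run against state that the rest of Shutdown() tears down afterwards. A real implementation would also have to wait for callbacks that are already in flight on other threads, which this toy does not model.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the consensus state machine (not Kudu's class).
class ToyConsensus {
 public:
  // Registers the callback invoked on a simulated leader election.
  void SetElectionCallback(std::function<void()> cb) { cb_ = std::move(cb); }

  // After Shutdown(), no further callbacks are delivered.
  void Shutdown() { running_ = false; }

  // Simulates an election result; returns true if the callback ran.
  bool FireElection() {
    if (!running_ || !cb_) return false;
    cb_();
    return true;
  }

 private:
  bool running_ = true;
  std::function<void()> cb_;
};

// Hypothetical stand-in for a tablet peer that owns consensus and a tracker.
class ToyPeer {
 public:
  ToyPeer() : tracker_(std::make_unique<std::vector<std::string>>()) {
    consensus_.SetElectionCallback([this] {
      // The callback touches state that Shutdown() destroys later.
      assert(tracker_ != nullptr);
      tracker_->push_back("election");
      ++callbacks_run_;
    });
  }

  // The callback captures `this`, so copying would leave a dangling pointer.
  ToyPeer(const ToyPeer&) = delete;
  ToyPeer& operator=(const ToyPeer&) = delete;

  void Shutdown() {
    consensus_.Shutdown();  // Step 1: stop election callbacks first.
    tracker_.reset();       // Step 2: only then tear down remaining state.
  }

  bool FireElection() { return consensus_.FireElection(); }
  int callbacks_run() const { return callbacks_run_; }
  bool tracker_alive() const { return tracker_ != nullptr; }

 private:
  ToyConsensus consensus_;
  std::unique_ptr<std::vector<std::string>> tracker_;
  int callbacks_run_ = 0;
};
```

With this ordering, a FireElection() call made after ToyPeer::Shutdown() is dropped by ToyConsensus instead of reaching the destroyed tracker. If the two steps in Shutdown() were swapped, a callback arriving between them would dereference a null tracker.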

-- 
To view, visit http://gerrit.cloudera.org:8080/6134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I10ad66fe33d4696adf2a02a09e2790afa8869583
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: No
