Alexey Serbin has posted comments on this change.

Change subject: [catalog manager] fixed deadlock on catalog shutdown
......................................................................
Patch Set 2:

> (5 comments)
>
> After thinking about this some more, I think this approach is too
> brittle. My concern is that by allowing leader election callbacks
> concurrently with tablet peer shutdown, any change to how
> TabletPeer::Shutdown() cleans up state can lead to invalid memory
> accesses in the leader election callback.
>
> For example, someone changes TabletPeer::Shutdown() to destroy its
> transaction tracker. If we're lucky, we'll see a SIGSEGV in a unit
> test when the leader election callback accesses the transaction
> tracker. If we're not, we'll get flaky tests with hard-to-debug
> symptoms.
>

Exactly. I observed that several times while running the test.

> So, I'm more inclined to keep the shutdown order as-is, and explore
> shutting down just the consensus state machine early, in order to
> abort all outstanding transactions. Or, alternatively, explore
> moving more object cleanup out of TabletPeer::Shutdown() and into
> ~TabletPeer() (right now Shutdown() also destroys the Consensus and
> Tablet objects).

I decided to go with the former: shutting down the consensus state machine first.

-- 
To view, visit http://gerrit.cloudera.org:8080/6134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I10ad66fe33d4696adf2a02a09e2790afa8869583
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: No
