Adar Dembo has posted comments on this change.

Change subject: WIP: [catalog manager] fixed deadlock on catalog shutdown
......................................................................

Patch Set 2:

(5 comments)

After thinking about this some more, I think this approach is too brittle. My concern is that by allowing leader election callbacks to run concurrently with tablet peer shutdown, any change to how TabletPeer::Shutdown() cleans up state can lead to invalid memory accesses in the leader election callback.

For example, suppose someone changes TabletPeer::Shutdown() to destroy its transaction tracker. If we're lucky, we'll see a SIGSEGV in a unit test when the leader election callback accesses the transaction tracker. If we're not, we'll get flaky tests with hard-to-debug symptoms.

So, I'm more inclined to keep the shutdown order as-is and explore shutting down just the consensus state machine early, in order to abort all outstanding transactions. Alternatively, we could explore moving more object cleanup out of TabletPeer::Shutdown() and into ~TabletPeer() (right now Shutdown() also destroys the Consensus and Tablet objects).

http://gerrit.cloudera.org:8080/#/c/6134/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

Line 722: ConsensusStatePB cstate = consensus->ConsensusState(CONSENSUS_CONFIG_COMMITTED);
Need to null check here.

Line 996: return sys_catalog_->tablet_peer_->shared_consensus()->role();
Do we need to null check here? When is Role() invoked?

Line 1027: // Shut down the underlying storage for tables and tablets. This aborts
There should be a larger explanation here justifying why we're allowing the table visitors to run rampant on potentially half-uninitialized memory.

Line 3350: // Nevertheless, returning OK here.
How will CheckIfFatalLeaderError() work on this function if we return OK?

Line 3981: ConsensusStatePB cstate = consensus->ConsensusState(CONSENSUS_CONFIG_COMMITTED);
Do we need to null check here?

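To make the null-check requests above concrete, here is a minimal sketch of the pattern, not the actual Kudu code. FakePeer, FakeConsensus, CommittedConfig() and ReadCommittedConfig() are hypothetical stand-ins invented for illustration; the only behavior taken from the review is that Shutdown() destroys the consensus object, so a callback running concurrently with (or after) shutdown may get a null pointer back.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for the consensus object; not the real Kudu class.
struct FakeConsensus {
  std::string CommittedConfig() const { return "committed-config"; }
};

// Hypothetical stand-in for a tablet peer whose Shutdown() drops its
// consensus object, as the review says TabletPeer::Shutdown() does today.
class FakePeer {
 public:
  FakePeer() : consensus_(std::make_shared<FakeConsensus>()) {}

  // Releases the peer's reference to the consensus object.
  void Shutdown() {
    std::lock_guard<std::mutex> l(lock_);
    consensus_.reset();
  }

  // Returns a copy of the pointer taken under the lock. The result is
  // null once Shutdown() has run.
  std::shared_ptr<FakeConsensus> shared_consensus() const {
    std::lock_guard<std::mutex> l(lock_);
    return consensus_;
  }

 private:
  mutable std::mutex lock_;
  std::shared_ptr<FakeConsensus> consensus_;
};

// A callback that may race with Shutdown(). It takes its own reference
// first and null checks it before dereferencing. Returns false if the
// peer has already been shut down; *out is left untouched in that case.
bool ReadCommittedConfig(const FakePeer& peer, std::string* out) {
  std::shared_ptr<FakeConsensus> consensus = peer.shared_consensus();
  if (!consensus) {
    return false;
  }
  *out = consensus->CommittedConfig();
  return true;
}
```

Holding a local shared_ptr copy keeps the object alive for the duration of the call even if Shutdown() runs in between, but it only protects this one member. It does not address the broader brittleness concern: any other state that Shutdown() tears down would need the same treatment.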
--
To view, visit http://gerrit.cloudera.org:8080/6134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I10ad66fe33d4696adf2a02a09e2790afa8869583
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: Yes
