Adar Dembo has posted comments on this change.

Change subject: WIP: [catalog manager] fixed deadlock on catalog shutdown
......................................................................

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6134/2//COMMIT_MSG
Commit Message:

PS2, Line 9: Fixed deadlock on system catalog manager shutdown in case of
: multi-master Kudu cluster. Prior to the fix, the leader master often
: hung in its 'elected-as-a-leader' callback while trying to write into
: the system table. It was awaiting for completion of the system table
: operations, but those were retried indefinitely since system catalog
: table's Raft quorum was not available (other masters were shutdown).

> I think it would be worth figuring out what changed to make this happen now

Before Alexey's catalog manager work, the leader election callback was only ever reading from the catalog table, and reads don't depend on Raft responses from potentially shut down peers. The issue is that when we've already shut down two of three follower masters and we're trying to shut down the leader master, the leader master may be waiting on a transaction that will never finish, and we don't time out those transactions.

--
To view, visit http://gerrit.cloudera.org:8080/6134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I10ad66fe33d4696adf2a02a09e2790afa8869583
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
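
For illustration only, the hang described in the comment above can be modeled in a few lines of C++. This is a rough sketch and not Kudu code: RetryUntilDeadline, HasMajority, and WriteResult are hypothetical names, and the example assumes a three-master cluster in which only the leader is still running. A write that needs a majority keeps getting retried; without a deadline the loop would never exit, while a bounded retry returns a timeout that a shutdown path could act on.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical outcome type for this sketch; not a Kudu type.
enum class WriteResult { kCommitted, kTimedOut };

// Models whether a write can be replicated: it needs acknowledgements from
// a strict majority of the configured masters (the leader counts as one).
bool HasMajority(int live_masters, int num_masters) {
  assert(num_masters > 0 && live_masters >= 0);
  return live_masters > num_masters / 2;
}

// Hypothetical helper: retries 'attempt' until it succeeds or 'timeout'
// elapses. With no deadline check, an attempt that can never succeed
// (no quorum) would be retried forever, which is the reported hang.
// A real implementation would back off between attempts instead of spinning.
WriteResult RetryUntilDeadline(const std::function<bool()>& attempt,
                               std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  do {
    if (attempt()) return WriteResult::kCommitted;
  } while (std::chrono::steady_clock::now() < deadline);
  return WriteResult::kTimedOut;
}
```

Under these assumptions, calling RetryUntilDeadline with an attempt that evaluates HasMajority(1, 3) gives up once the deadline passes, whereas HasMajority(3, 3) commits on the first try. Whether a timeout, a cancellation on shutdown, or some other mechanism is the right fix for the actual change is a separate design question that this sketch does not settle.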
