Adar Dembo has posted comments on this change.

Change subject: WIP: [catalog manager] fixed deadlock on catalog shutdown
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6134/2//COMMIT_MSG
Commit Message:

PS2, Line 9: Fixed deadlock on system catalog manager shutdown in case of
           : multi-master Kudu cluster.  Prior to the fix, the leader master often
           : hung in its 'elected-as-a-leader' callback while trying to write into
           : the system table.  It was awaiting for completion of the system table
           : operations, but those were retried indefinitely since system catalog
           : table's Raft quorum was not available (other masters were shutdown).
> I think it would be worth figuring out what changed to make this happen now
Before Alexey's catalog manager work, the leader election callback only ever
read from the catalog table, and reads don't depend on Raft responses from
peers that may already be shut down.

The issue is that when we've already shut down both follower masters in a
three-master cluster and we're trying to shut down the leader master, the
leader master may be waiting on a transaction that will never finish, and we
don't time out those transactions.
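
To illustrate the failure mode, here is a minimal sketch of a bounded wait,
assuming a hypothetical PendingWrite class that stands in for a pending
system-table write. None of these names come from Kudu's code, and this is
not necessarily how the patch addresses the problem; it only shows how a
deadline lets the caller give up when the write can never be replicated.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a pending system-table write.
// This is an illustration only, not Kudu's actual API.
class PendingWrite {
 public:
  // Marks the write as complete and wakes any waiter.
  void MarkDone() {
    {
      std::lock_guard<std::mutex> l(lock_);
      done_ = true;
    }
    cv_.notify_all();
  }

  // Waits until the write completes or 'timeout' elapses.
  // Returns true on completion, false if the deadline passed first.
  bool WaitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    return cv_.wait_for(l, timeout, [this] { return done_; });
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool done_ = false;
};

// Hypothetical helper for a leader callback step: instead of blocking
// forever on a write that has no quorum, it reports failure after the
// deadline so that shutdown can proceed.
bool WriteWithDeadline(PendingWrite* write,
                       std::chrono::milliseconds timeout) {
  return write->WaitFor(timeout);
}
```

With an unbounded wait in place of WaitFor(), a write that is never marked
done would block the caller indefinitely, which is the hang described above.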


-- 
To view, visit http://gerrit.cloudera.org:8080/6134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I10ad66fe33d4696adf2a02a09e2790afa8869583
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
