Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8619 )
Change subject: KUDU-1097: 3-4-3 behavior for catalog manager ...................................................................... Patch Set 6: (14 comments) http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/consensus/quorum_util.h File src/kudu/consensus/quorum_util.h: http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/consensus/quorum_util.h@91 PS6, Line 91: // Check if the given Raft configuration needs an extra non-voter replica, : // targeted for future promotion to a voter replica in anticipation to replace : // failed or missing voter replicas in accordance with the specified replication : // factor. Return 'true' iff an extra non-voter is needed. > I think this documentation can be reduced a lot because it's very specific Sure. I hope it's clear what 'cluster is under-replicated' means. http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/consensus/quorum_util.h@95 PS6, Line 95: NeedNonVoterReplica > nit: how about renaming this IsUnderReplicated() ? The idea is to detect th Done http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/consensus/quorum_util.h@101 PS6, Line 101: ReadyToEvictReplica > nit: how about renaming this to CanEvictReplica() ? Done http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/consensus/quorum_util.cc File src/kudu/consensus/quorum_util.cc: http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/consensus/quorum_util.cc@411 PS6, Line 411: if ((peer.has_health_report() && : peer.health_report().has_overall_health() && : peer.health_report().overall_health() > We can safely check peer.health_report().overall_health() without doing any I don't think so. What if the health_report() is not present? SIGSEGV, eh? http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/consensus/quorum_util.cc@424 PS6, Line 424: num_non_voters_for_replacement > mind calling this num_non_voters_to_promote ? Done. Sorry for the confusion. http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/consensus/quorum_util.cc@432 PS6, Line 432: return replication_factor > (num_voters_total - num_voters_need_replacement) + > are you counting non-voters with promote = true as potential voters for the That invariant needs an update: it does not include replicas marked for replacement with REPLACE attribute. Also, that invariant is not quite correct from the standpoint of not mentioning replicas which health state is not known. So, instead of '# healthy voters' I think it should be '# non-failed voters not marked with the REPLACE attribute'. This code does exactly that. The quorum_util-test.cc contains numerous cases which verifies how this works. So far it looks reasonable to me. Please take a look at those as well -- let me know if you see something strange there. http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/master/catalog_manager.cc@228 PS6, Line 228: evicts > nit: I'd suggest evict instead of evicts here, like giving the master a com Done http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/master/catalog_manager.cc@3483 PS6, Line 3483: FLAGS_raft_prepare_replacement_before_eviction > shouldn't this be negated? Good catch -- it's a typo, right -- there should be a negation. Overall, it sounds like a better idea to move that 7d out of the outer scope, since that big if/else does not fit there well. Done. http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/master/catalog_manager.cc@3495 PS6, Line 3495: 7d(v) > Numbering this as 7d(v) to me implies it is a step after 7d(iv) which is mi Done http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/master/catalog_manager.cc@3498 PS6, Line 3498: New mode > avoid the use of "new" because someday someone is going to read this code w Removed. http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/master/catalog_manager.cc@3501 PS6, Line 3501: evict already replaces failed replica > evict an already-replaced failed replica Done http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/master/catalog_manager.cc@3502 PS6, Line 3502: as replacement > as a replacement Done http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/master/catalog_manager.cc@3503 PS6, Line 3503: // The leader replica will promote the newly added non-voters once : // they are in good enough shape for that. > Doesn't it seem like we are adding unnecessary detail about the overall des Removed. http://gerrit.cloudera.org:8080/#/c/8619/6/src/kudu/master/catalog_manager.cc@3512 PS6, Line 3512: ReadyToEvictReplica(config, replication_factor, &to_evict)) { > I think we want to evict first, if we can, since that means we are over-rep What is the crucial difference here? If it's over-replicated, then NeedNonVoter replica returns false, and then ReadyToEvictReplica() starts playing. I don't understand what's your point here. -- To view, visit http://gerrit.cloudera.org:8080/8619 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6f0469ac641bf7a03dbef01eaa3f1b58a5bf5d27 Gerrit-Change-Number: 8619 Gerrit-PatchSet: 6 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Thu, 23 Nov 2017 07:57:02 +0000 Gerrit-HasComments: Yes
