Alexey Serbin has posted comments on this change. Change subject: [catalog_manager] proper handling of catalog shutdown ......................................................................
Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/6170/4/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 459: bool error_is_not_leader; > What is this boolean doing? It's passed as an out parameter to TryGenerate I didn't want to introduce optional parameters. Yes, in this context it's not used, but it's used in CatalogManager::VisitTablesAndTabletsTask() code -- that was the primary reason for introducing this additional boolean parameter. Line 763: if (s.IsNotFound()) { > This method might be simpler if it's structured as a single if/else chain: The idea was to return earlier, hoping for better readability: http://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code Line 764: LOG(INFO) << "Did not find CA certificate and key for Kudu IPKI, " > This is going to be pretty chatty, perhaps VLOG? This is going to be written only once -- on the very first cluster start-up. Hence, I don't think it's worth replacing it with VLOG. http://gerrit.cloudera.org:8080/#/c/6170/4/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: PS4, Line 585: bool* error_is_not_leader > I agree with dan. In fact this was part of the motivation for my comments y The original idea was to detect whether a consensus-related error is tolerable or not. The common use case was: if write into the system table fails, then master cannot use the newly generated entity (a CA cert of a Token Signing Key). If it turns out that the master is no longer a leader -- that's fine, because non-leader masters do not use those (i.e. that was a scenario when leadership changed just after then-a-leader-master generated a new CA or Token Signing Key but before it successfully persisted the data into the system table). However, if it turns that the master is a leader but it could not persist the newly generated data, then it means it cannot use it, and in case of CA cert and the only available Token Signing Key that means such leader master is useless in a secure cluster. Basically, that means cluster is left with no leader master in that case. One approach was to crash the master in such a condition -- we discussed that some time ago with Todd. After some time, another train of sought appeared -- let's just not crash the master, but let it stay in such a useless state for a while. The latter option does not goes well with the current code structure -- the only place when a master re-generates a CA cert is elected-as-a-leader callback. So, I'm not sure what would be the right behavior here. -- To view, visit http://gerrit.cloudera.org:8080/6170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes