Alexey Serbin has posted comments on this change.

Change subject: [catalog_manager] proper handling of catalog shutdown
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6170/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

Line 459:           bool error_is_not_leader;
> What is this boolean doing?  It's passed as an out parameter to TryGenerate
I didn't want to introduce optional parameters.  Yes, in this context it's not 
used, but it's used in  CatalogManager::VisitTablesAndTabletsTask() code -- 
that was the primary reason for introducing this additional boolean parameter.


Line 763:   if (s.IsNotFound()) {
> This method might be simpler if it's structured as a single if/else chain:
The idea was to return earlier, hoping for better readability:  
http://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code


Line 764:     LOG(INFO) << "Did not find CA certificate and key for Kudu IPKI, "
> This is going to be pretty chatty, perhaps VLOG?
This is going to be written only once -- on the very first cluster start-up.  
Hence, I don't think it's worth replacing it with VLOG.


http://gerrit.cloudera.org:8080/#/c/6170/4/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

PS4, Line 585: bool* error_is_not_leader
> I agree with dan. In fact this was part of the motivation for my comments y
The original idea was to detect whether a consensus-related error is tolerable 
or not.  The common use case was: if write into the system table fails, then 
master cannot use the newly generated entity (a CA cert of a Token Signing Key).

If it turns out that the master is no longer a leader -- that's fine, because 
non-leader masters do not use those (i.e. that was a scenario when leadership 
changed just after then-a-leader-master generated a new CA or Token Signing Key 
but before it successfully persisted the data into the system table).

However, if it turns that the master is a leader but it could not persist the 
newly generated data, then it means it cannot use it, and in case of CA cert 
and the only available Token Signing Key that means such leader master is 
useless in a secure cluster.  Basically, that means cluster is left with no 
leader master in that case.  One approach was to crash the master in such a 
condition -- we discussed that some time ago with Todd.  After some time, 
another train of sought appeared -- let's just not crash the master, but let it 
stay in such a useless state for a while.  The latter option does not goes well 
with the current code structure -- the only place when a master re-generates a 
CA cert is elected-as-a-leader callback.

So, I'm not sure what would be the right behavior here.


-- 
To view, visit http://gerrit.cloudera.org:8080/6170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to