Alexey Serbin has posted comments on this change. Change subject: KUDU-1927 [master] always send IPKI CA certificate ......................................................................
Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/6356/4/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: PS4, Line 748: MAYBE_INJECT_RANDOM_LATENCY( : FLAGS_catalog_manager_inject_latency_load_ca_info_ms); > nit: no need to wrap, right? It seems I have my editors set to wrap at 79 chars (set textwidth=79). Perhaps, I need to put 99 instead. http://gerrit.cloudera.org:8080/#/c/6356/4/src/kudu/master/master_service.cc File src/kudu/master/master_service.cc: PS4, Line 424: resp->set_role(RaftPeerPB::UNKNOWN_ROLE); : resp->mutable_error()->set_code(MasterErrorPB::NOT_THE_LEADER); : StatusToPB(leader_status, resp->mutable_error()->mutable_status()); : VLOG(3) << "Detected transient state; responding with: " : << leader_status.ToString(); > sorta curious why we don't just treat this the same as still being a follow For some reason I started thinking about this as an error which is some sort of transient state which would be nice to capture on the at the client side. Another factor is that I was not sure that the situation where all servers report their role as followers is correctly handled at the client side (as I see now, it is). Do you think it's better to not report this condition as an error? PS4, Line 456: resp->mutable_error()->set_code(MasterErrorPB::NOT_THE_LEADER); : const Status resp_status(Status::ServiceUnavailable( : "master server could not generate authn token: $0", s.ToString())); : > odd to be using NOT_THE_LEADER here... here we are the leader (the leader s As for the NOT_THE_LEADER -- that's the same logic as above: de-facto the master server is a leader of the consensus group, but we intentionally demoting itself to UNKNOWN_ROLE and reporting back an error if we see it cannot serve the request appropriately. Not sure why this seems odd. As for the fail fast instead of retry, if we are sure the error from GenerateAuthToken() cannot be transient, then it would make sense. Right now that's true, so it would make sense. A couple of questions to clarify on this, please: 1) What is the best way to report on error to client in this case and do we need to do that at all? Something like sending MasterErrorPB::UNKNOWN_ERROR ? 2) What would be our policy regarding the master server itself? Just crash it even in non-debug build? -- To view, visit http://gerrit.cloudera.org:8080/6356 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a20e6da6531091fcc597af58a7bbf01a8847f5 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
