Alexey Serbin has posted comments on this change. Change subject: [catalog_manager] categorization of rw operation failures ......................................................................
Patch Set 26: (23 comments) http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/integration-tests/catalog_manager_tsk-itest.cc File src/kudu/integration-tests/catalog_manager_tsk-itest.cc: PS26, Line 82: We want this to happen as often as possible, but : // due to truncation issues (double --> int64_t) two TSK keys might be : // generated with interval less than 1 second, which is not desirable. > not sure what's the intent of this sentence? do you do something to mitigat Agreed, I think it's better to remove this comment. PS26, Line 88: master_tsk_propagated_by_non_leaders > this flag is a bit cryptic, at first I thought this was referring to tablet agreed PS26, Line 114: builder : .default_admin_operation_timeout(timeout) > no need to wrap, in fact could you do all of this in the decl line? I don't know how to do that in the decl line if not creating the second instance of the class, using copy constructor. Line 126: // signed by the key which hasn't yet reached the tserver. This can > with a key which hasn't yet reached the tserver or non-leader master (or so Both. This is workaround is no longer needed, though. PS26, Line 132: IPKI > can you TODO yourself? no longer relavant -- the TODO is resolved. PS26, Line 134: Ideally, (hb_interval_ms_ * 2) would be enough, but due : // to network latency and multi-process OS, the factor should be greater. > how about: We allow for extra slop beyond the minimum (hb_interval_ms_ * 2) no longer relavant. PS26, Line 136: SleepFor(MonoDelta::FromMilliseconds(hb_interval_ms_ * 10)); > these timeouts always come back to bite us in the form of flakyness. could This is no longer needed since the auto-retry for ERROR_UNAVAILABLE is already implemented. PS26, Line 137: // std::min<int64_t>(run_time_seconds_ * 1000 / 2, hb_interval_ms_ * 50))); > did you mean to leave this? Done PS26, Line 185: //SleepFor(MonoDelta::FromMilliseconds(hb_interval_ms_)); > left over garbage? Done http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: PS26, Line 471: const > we never declare status const. I know you can here, but I haven't seen that Why not? And why Status is so special? It's just a simple notation to explicitly say that it's not to change in the whole scope. PS26, Line 484: const > same Probably, const Status& s = ...; would be more preferable? PS26, Line 492: LOG(WARNING) << err_msg << "will try next time"; > this is not very informative Well, it would be 'failed to refresh TSK: <status_message>: will try next time' It reports that adding a newly generated TSK did not come through, and the catalog manager will retry it on next cycle. However, the TokenSigner still has a valid key to use. I'll replace 'next time' with 'next cycle'. PS26, Line 494: has left > where did it go? :) :) Good catch! PS26, Line 762: const > same Yep, I just want to clarify on why it's not acceptable. PS26, Line 777: leadership change in the middle: : // if the master server loses its leadership role, then there will be an : // error if trying to write into the system table. If that happens, we do : // not activate the newly generated CA information since it is no longer : // relevant. If it was leadership change indeed, the system table should : // contain the CA certificate information when this master is re-elected : // next. > numbered bullets or some better way to lay this out. it's hard to parse as Done PS26, Line 794: > a Done PS26, Line 857: LOG(INFO) << "Generated new certificate authority record"; > can you provide more info? it'd be good to print a seq no or some identifie Good idea: we could print so called fingerprint of the private key. However, currently we don't have appropriate methods in PrivateKey and CaCert classes. Let me address that in a separate changelist. PS26, Line 914: using std::bind; > using declarations go on the top of the file I could not find that in the code style guide. Putting that into the top of the file does not make much sense because in this file there boost::bind is used as well. So, I decided to use std::bind() when necessary. PS26, Line 919: auto wrapper = [this](std::function<Status()> func, : const Consensus& consensus, : int64_t start_term, : const char* op_description) { : leader_lock_.AssertAcquiredForWriting(); : const Status s = func(); > not really sure what's happening here. can you simplify? I think it would be clearer if I rename the lambda and re-format this a bit. It's just a lambda declaration plus its first two lines of implementation. I'll add some more comments. PS26, Line 961: static const char* const kLoadMetaOpDescription = : "Loading table and tablet metadata into memory"; > can you declare this const on the top of the file? I would prefer to keep it here, like other OpDescription messages, where they used. Why to move to the top of the file? They would look totally irrelevant there. http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: PS26, Line 358: changed from the initial. > what's the initial The initial is initial_term_, the initial term is sampled in the constructor. PS26, Line 568: of the object > which object? 'this' object: the instance of this class. PS26, Line 607: load TSK records from the system table and import them into the : // TokenSigner. After doing that check whether it's time to generate new : // token signing key; if so, then generate and store the new one in the system : // catalog table. Besides, purge the expired TSKs from the system table. > can you split this into numbered steps (1, 1.1, 1.2, 2) etc Done -- To view, visit http://gerrit.cloudera.org:8080/6170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8 Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
