Alexey Serbin has posted comments on this change.

Change subject: [catalog_manager] categorization of rw operation failures
......................................................................


Patch Set 26:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/integration-tests/catalog_manager_tsk-itest.cc
File src/kudu/integration-tests/catalog_manager_tsk-itest.cc:

PS26, Line 82:  We want this to happen as often as possible, but
             :       // due to truncation issues (double --> int64_t) two TSK 
keys might be
             :       // generated with interval less than 1 second, which is 
not desirable.
> not sure what's the intent of this sentence? do you do something to mitigat
Agreed, I think it's better to remove this comment.


PS26, Line 88: master_tsk_propagated_by_non_leaders
> this flag is a bit cryptic, at first I thought this was referring to tablet
agreed


PS26, Line 114: builder
              :         .default_admin_operation_timeout(timeout)
> no need to wrap, in fact could you do all of this in the decl line?
I don't know how to do that in the decl line if not creating the second 
instance of the class, using copy constructor.


Line 126:     // signed by the key which hasn't yet reached the tserver. This 
can
> with a key which hasn't yet reached the tserver or non-leader master (or so
Both.

This is workaround is no longer needed, though.


PS26, Line 132: IPKI
> can you TODO yourself?
no longer relavant -- the TODO is resolved.


PS26, Line 134: Ideally, (hb_interval_ms_ * 2) would be enough, but due
              :     //   to network latency and multi-process OS, the factor 
should be greater.
> how about: We allow for extra slop beyond the minimum (hb_interval_ms_ * 2)
no longer relavant.


PS26, Line 136: SleepFor(MonoDelta::FromMilliseconds(hb_interval_ms_ * 10));
> these timeouts always come back to bite us in the form of flakyness. could 
This is no longer needed since the auto-retry for ERROR_UNAVAILABLE is already 
implemented.


PS26, Line 137: //    std::min<int64_t>(run_time_seconds_ * 1000 / 2, 
hb_interval_ms_ * 50)));
> did you mean to leave this?
Done


PS26, Line 185: //SleepFor(MonoDelta::FromMilliseconds(hb_interval_ms_));
> left over garbage?
Done


http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

PS26, Line 471: const
> we never declare status const. I know you can here, but I haven't seen that
Why not?  And why Status is so special?

It's just a simple notation to explicitly say that it's not to change in the 
whole scope.


PS26, Line 484: const
> same
Probably, const Status& s = ...; would be more preferable?


PS26, Line 492: LOG(WARNING) << err_msg << "will try next time";
> this is not very informative
Well, it would be

'failed to refresh TSK: <status_message>: will try next time'

It reports that adding a newly generated TSK did not come through, and the 
catalog manager will retry it on next cycle.  However, the TokenSigner still 
has a valid key to use.

I'll replace 'next time' with 'next cycle'.


PS26, Line 494: has left
> where did it go? :)
:)

Good catch!


PS26, Line 762: const
> same
Yep, I just want to clarify on why it's not acceptable.


PS26, Line 777:  leadership change in the middle:
              :     // if the master server loses its leadership role, then 
there will be an
              :     // error if trying to write into the system table. If that 
happens, we do
              :     // not activate the newly generated CA information since it 
is no longer
              :     // relevant. If it was leadership change indeed, the system 
table should
              :     // contain the CA certificate information when this master 
is re-elected
              :     // next.
> numbered bullets or some better way to lay this out. it's hard to parse as 
Done


PS26, Line 794:  
> a
Done


PS26, Line 857: LOG(INFO) << "Generated new certificate authority record";
> can you provide more info? it'd be good to print a seq no or some identifie
Good idea: we could print so called fingerprint of the private  key.  However, 
currently we don't have appropriate methods in PrivateKey and CaCert classes.

Let me address that in a separate changelist.


PS26, Line 914: using std::bind;
> using declarations go on the top of the file
I could not find that in the code style guide.  Putting that into the top of 
the file does not make much sense because in this file there boost::bind is 
used as well.  So, I decided to use std::bind() when necessary.


PS26, Line 919:     auto wrapper = [this](std::function<Status()> func,
              :                           const Consensus& consensus,
              :                           int64_t start_term,
              :                           const char* op_description) {
              :       leader_lock_.AssertAcquiredForWriting();
              :       const Status s = func();
> not really sure what's happening here. can you simplify?
I think it would be clearer if I rename the lambda and re-format this a bit.

It's just a lambda declaration plus its first two lines of implementation.  
I'll add some more comments.


PS26, Line 961: static const char* const kLoadMetaOpDescription =
              :         "Loading table and tablet metadata into memory";
> can you declare this const on the top of the file?
I would prefer to keep it here, like other OpDescription messages, where they 
used.

Why to move to the top of the file?  They would look totally irrelevant there.


http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

PS26, Line 358: changed from the initial.
> what's the initial
The initial is initial_term_, the initial term is sampled in the constructor.


PS26, Line 568: of the object
> which object?
'this' object: the instance of this class.


PS26, Line 607: load TSK records from the system table and import them into the
              :   // TokenSigner. After doing that check whether it's time to 
generate new
              :   // token signing key; if so, then generate and store the new 
one in the system
              :   // catalog table. Besides, purge the expired TSKs from the 
system table.
> can you split this into numbered steps (1, 1.1, 1.2, 2) etc
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Gerrit-PatchSet: 26
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to