Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14222 )

Change subject: KUDU-2069 p4: stop replication from failed servers in 
maintenance mode
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc
File src/kudu/integration-tests/maintenance_mode-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@99
PS2, Line 99:   Status GetNumBytesCopied(int64_t* num_bytes_copied) {
This is a proxy for the question "did we rereplicate anything", right? Would it 
be more intuitive if we looked at the count of the server entity's 
METRIC_handler_latency_kudu_tserver_TabletCopyService_BeginTabletCopySession 
instead?

Also, you only ever compare the value to 0, so maybe convert this into a yes/no 
function?


http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@151
PS2, Line 151:  protected:
Don't need this?


http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@208
PS2, Line 208:   int64_t num_bytes_copied = 0;
Nit: don't need to initialize this.


http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/catalog_manager.cc@4190
PS2, Line 4190:         unordered_set<string> whitelisted_uuids;
              :         
master_->ts_manager()->GetWhitelistedUuids(&whitelisted_uuids);
It's wasteful to generate this for each tablet in the report.


http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/ts_manager.h
File src/kudu/master/ts_manager.h:

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/ts_manager.h@94
PS2, Line 94:   void GetWhitelistedUuids(std::unordered_set<std::string>* 
uuids) const;
"Whitelisted" doesn't really capture the expected semantics of these UUIDs.

Another thing I noticed is that there's potentially a big difference in 
cardinality between the returned set (number of tservers in maintenance mode) 
and the number of iterations in ShouldAddReplica (replication factor).

Taken together, perhaps we shouldn't generate this set up front, and instead 
pass the TSManager in some fashion into ShouldAddReplica? The consensus -> 
master dependency means we shouldn't pass in TSManager as-is, but maybe we 
could pass it via a lambda that takes a UUID and returns true if the tserver 
should be ignored for underreplication purposes?



--
To view, visit http://gerrit.cloudera.org:8080/14222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db
Gerrit-Change-Number: 14222
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 13 Sep 2019 04:24:17 +0000
Gerrit-HasComments: Yes

Reply via email to