Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19931 )

Change subject: KUDU-3476: Make replica placement range and table aware
......................................................................


Patch Set 10:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/19931/10//COMMIT_MSG
Commit Message:

PS10:
Please also add the link to the design document.


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@231
PS10, Line 231:              "between runs.");
Why is this change?


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@360
PS10, Line 360:               "of 0 means table locations are not be cached.");
Why is this change?


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@364
PS10, Line 364:             "Whether to support range-specific hash schemas for 
tables.");
Why is this change?


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@376
PS10, Line 376: support
use


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@393
PS10, Line 393:               "Set the ratio of how much write limit can be 
reached.");
Why is this change?


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@418
PS10, Line 418:             "per range.");
Why is this change?


http://gerrit.cloudera.org:8080/#/c/19931/6/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/19931/6/src/kudu/master/master.proto@377
PS6, Line 377:
             :   // Replica management parameters that the tablet server is 
running with.
             :   // This field is set only if the registration field is present.
             :   optional consensus.ReplicaManagementInfoPB 
replica_management_info = 7;
             :
             :   // The number of tablets that are BOOTSTRAPPING or RUNNING in 
each dimension.
             :   // Used by the master to determine load when placing new 
tablet replicas
             :   // based on dimension.
             :   map<string, int32> num_live_tablets_by_dimension =
?


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master.proto@338
PS10, Line 338: message TabletsByRangePB {
Would be great to add comments for these two new structures as well.


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master.proto@345
PS10, Line 345: num_live_tablets_by_table
Can this be deduced out of information from the field above?


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master.proto@387
PS10, Line 387: BOOTSTRAPPING or RUNNING
What's about other states (except, say, FAILED)?  I could think of a situation 
when some tablet servers are still early in bootstrapping, so the state of 
those tablets would not be even BOOTSTRAPPING yet, but there might have many 
replicas.  Wouldn't counting only BOOTSTRAPPING and RUNNING in that case 
introduce a significant skew in the result replica placement when all the 
tablets are bootstrapped eventually?


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master_runner.cc
File src/kudu/master/master_runner.cc:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master_runner.cc@350
PS10, Line 350:  // Master always reads the latest data snapshot from the 
system catalog and
              :   // never uses any specific past timestamp for a read 
snapshot. With that,
              :   // it doesn't make sense to keep a long chain of UNDO deltas 
in addition to the
              :   // latest version in the MVCC. Keeping a short history of 
deltas frees CPU
I've mentioned that multiple times already, but I see you are continuing doing 
this again and again.

Just one more time: in the future changelists, please avoid changing 
non-related code (especially just comments) in a patch which main focus is 
something else.  The motivation for such policy is:
  * it helps avoiding random conflicts over unrelated code when cherry-picking 
changes in other branches
  * it's simpler for reviewers
  * it's simpler for patch authors to revv a patch
  * it's simpler tracking of changes in the future using git repository history
  * etc.

If you want to update things like comments or do any unrelated updates along 
the way, please do so in a separate patch.

Thanks!


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/placement_policy.cc
File src/kudu/master/placement_policy.cc:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/placement_policy.cc@152
PS10, Line 152: CHECK_GE(replicas_by_range.size(), 1);
nit: I'd change the criterion to use vector::empty() -- it's usually 
implemented to have fewer CPU instructions than vector::size()


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/placement_policy.cc@159
PS10, Line 159: CHECK_GE(replicas_by_table.size(), 1);
ditto


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/placement_policy.cc@166
PS10, Line 166: CHECK_GE(replicas_total.size(), 1);
ditto


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/ts_descriptor.cc
File src/kudu/master/ts_descriptor.cc:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/ts_descriptor.cc@208
PS10, Line 208:   
recent_replicas_by_range_[table_id].first.find(range_start_key) ==
              :   recent_replicas_by_range_[table_id].first.end() ||
              :   recent_replicas_by_range_[table_id].first[range_start_key] == 
0) return;
nit: wrong indent; also, use braces here for the scope since even the criterion 
spans over multiple lines


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/ts_descriptor.cc@214
PS10, Line 214:   // If map for the table or range hasn't been initialized yet, 
use init_time_ as last decay.
              :   MonoTime last_decay =
              :       last_replica_decay_by_range_.find(table_id) == 
last_replica_decay_by_range_.end() ||
              :       
last_replica_decay_by_range_[table_id].first.find(range_start_key) ==
              :       last_replica_decay_by_range_[table_id].first.end() ?
              :       init_time_ : 
last_replica_decay_by_range_[table_id].first[range_start_key];
              :   double secs_since_last_decay = (now - last_decay).ToSeconds();
              :   recent_replicas_by_range_[table_id].first[range_start_key] *=
              :       pow(0.5, secs_since_last_decay / kHalflifeSecs);
Why do we need any recency information in case of range-aware replica 
placement?  I don't recall anything like that in the design document that you 
shared offline.  Also, the commit description doesn't mention that at all.


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/tserver/ts_tablet_manager.h@222
PS10, Line 222: RUNNING or BOOTSTRAPPING
What about tablet replicas in, say, INITIALIZED state, or any other state than 
FAILED?  I guess those should be counted as well when planning for distributing 
tablet replicas evenly, no?



--
To view, visit http://gerrit.cloudera.org:8080/19931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9caeb8d5547e946bfeb152a99e1ec034c3fa0a0f
Gerrit-Change-Number: 19931
Gerrit-PatchSet: 10
Gerrit-Owner: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Reviewer: Ádám Bakai <[email protected]>
Gerrit-Comment-Date: Wed, 28 Jun 2023 16:46:05 +0000
Gerrit-HasComments: Yes

Reply via email to