Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19931 )
Change subject: KUDU-3476: Make replica placement range and table aware ...................................................................... Patch Set 10: (18 comments) http://gerrit.cloudera.org:8080/#/c/19931/10//COMMIT_MSG Commit Message: PS10: Please also add the link to the design document. http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@231 PS10, Line 231: "between runs."); Why is this change? http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@360 PS10, Line 360: "of 0 means table locations are not be cached."); Why is this change? http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@364 PS10, Line 364: "Whether to support range-specific hash schemas for tables."); Why is this change? http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@376 PS10, Line 376: support use http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@393 PS10, Line 393: "Set the ratio of how much write limit can be reached."); Why is this change? http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/catalog_manager.cc@418 PS10, Line 418: "per range."); Why is this change? http://gerrit.cloudera.org:8080/#/c/19931/6/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/19931/6/src/kudu/master/master.proto@377 PS6, Line 377: : // Replica management parameters that the tablet server is running with. : // This field is set only if the registration field is present. : optional consensus.ReplicaManagementInfoPB replica_management_info = 7; : : // The number of tablets that are BOOTSTRAPPING or RUNNING in each dimension. : // Used by the master to determine load when placing new tablet replicas : // based on dimension. : map<string, int32> num_live_tablets_by_dimension = ? http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master.proto@338 PS10, Line 338: message TabletsByRangePB { Would be great to add comments for these two new structures as well. http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master.proto@345 PS10, Line 345: num_live_tablets_by_table Can this be deduced out of information from the field above? http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master.proto@387 PS10, Line 387: BOOTSTRAPPING or RUNNING What's about other states (except, say, FAILED)? I could think of a situation when some tablet servers are still early in bootstrapping, so the state of those tablets would not be even BOOTSTRAPPING yet, but there might have many replicas. Wouldn't counting only BOOTSTRAPPING and RUNNING in that case introduce a significant skew in the result replica placement when all the tablets are bootstrapped eventually? http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master_runner.cc File src/kudu/master/master_runner.cc: http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master_runner.cc@350 PS10, Line 350: // Master always reads the latest data snapshot from the system catalog and : // never uses any specific past timestamp for a read snapshot. With that, : // it doesn't make sense to keep a long chain of UNDO deltas in addition to the : // latest version in the MVCC. Keeping a short history of deltas frees CPU I've mentioned that multiple times already, but I see you are continuing doing this again and again. Just one more time: in the future changelists, please avoid changing non-related code (especially just comments) in a patch which main focus is something else. The motivation for such policy is: * it helps avoiding random conflicts over unrelated code when cherry-picking changes in other branches * it's simpler for reviewers * it's simpler for patch authors to revv a patch * it's simpler tracking of changes in the future using git repository history * etc. If you want to update things like comments or do any unrelated updates along the way, please do so in a separate patch. Thanks! http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/placement_policy.cc File src/kudu/master/placement_policy.cc: http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/placement_policy.cc@152 PS10, Line 152: CHECK_GE(replicas_by_range.size(), 1); nit: I'd change the criterion to use vector::empty() -- it's usually implemented to have fewer CPU instructions than vector::size() http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/placement_policy.cc@159 PS10, Line 159: CHECK_GE(replicas_by_table.size(), 1); ditto http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/placement_policy.cc@166 PS10, Line 166: CHECK_GE(replicas_total.size(), 1); ditto http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/ts_descriptor.cc File src/kudu/master/ts_descriptor.cc: http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/ts_descriptor.cc@208 PS10, Line 208: recent_replicas_by_range_[table_id].first.find(range_start_key) == : recent_replicas_by_range_[table_id].first.end() || : recent_replicas_by_range_[table_id].first[range_start_key] == 0) return; nit: wrong indent; also, use braces here for the scope since even the criterion spans over multiple lines http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/ts_descriptor.cc@214 PS10, Line 214: // If map for the table or range hasn't been initialized yet, use init_time_ as last decay. : MonoTime last_decay = : last_replica_decay_by_range_.find(table_id) == last_replica_decay_by_range_.end() || : last_replica_decay_by_range_[table_id].first.find(range_start_key) == : last_replica_decay_by_range_[table_id].first.end() ? : init_time_ : last_replica_decay_by_range_[table_id].first[range_start_key]; : double secs_since_last_decay = (now - last_decay).ToSeconds(); : recent_replicas_by_range_[table_id].first[range_start_key] *= : pow(0.5, secs_since_last_decay / kHalflifeSecs); Why do we need any recency information in case of range-aware replica placement? I don't recall anything like that in the design document that you shared offline. Also, the commit description doesn't mention that at all. http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/tserver/ts_tablet_manager.h@222 PS10, Line 222: RUNNING or BOOTSTRAPPING What about tablet replicas in, say, INITIALIZED state, or any other state than FAILED? I guess those should be counted as well when planning for distributing tablet replicas evenly, no? -- To view, visit http://gerrit.cloudera.org:8080/19931 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9caeb8d5547e946bfeb152a99e1ec034c3fa0a0f Gerrit-Change-Number: 19931 Gerrit-PatchSet: 10 Gerrit-Owner: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Zoltan Chovan <[email protected]> Gerrit-Reviewer: Ádám Bakai <[email protected]> Gerrit-Comment-Date: Wed, 28 Jun 2023 16:46:05 +0000 Gerrit-HasComments: Yes
