Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13632 )
Change subject: KUDU-2823 Place tablet replicas based on deminsion ...................................................................... Patch Set 10: (34 comments) I skimmed trough, and overall it looks reasonable. There is a few question on the determining the load of a tablet server when using dimension labeling for new replicas and it would be nice to have a bit more of the unit test coverage. http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@7 PS10, Line 7: deminsion dimension http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@9 PS10, Line 9: the newly : created tablet to be evenly distributed on each tablet server maybe, rephrase for clarity: ... replicas of the newly created tablet to be evenly distributed across all available tablet servers. http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@25 PS10, Line 25: deminsion dimension http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@27 PS10, Line 27: deminsion dimension http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@30 PS10, Line 30: deminsion dimension http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/client/client.h File src/kudu/client/client.h: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/client/client.h@882 PS10, Line 882: the tablet Which tablet? Or you meant 'table'? http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/client/client.h@1210 PS10, Line 1210: that was created to be created ? http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc File src/kudu/integration-tests/create-table-itest.cc: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@228 PS10, Line 228: vector<string> ts_flags; maybe, drop this since it's empty anyways. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@229 PS10, Line 229: vector<string> master_flags; : master_flags.emplace_back("--master_place_tablet_replicas_based_on_dimension=true"); : master_flags.emplace_back("--tserver_last_replica_creations_halflife_ms=10"); nit: the shorter form might be const vector<string> master_flags = { "--master_place_tablet_replicas_based_on_dimension", "--tserver_last_replica_creations_halflife_ms=10", }; http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@233 PS10, Line 233: ts_flags nit: {} http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@321 PS10, Line 321: 10 kNumServers ? http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@708 PS10, Line 708: boost::optional<string>* why boost::optional<string>* ? It seems string* as a type for the output parameter is good enough here, no? http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@3733 PS10, Line 3733: range dimension ? http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@3748 PS10, Line 3748: if (PREDICT_TRUE(s.ok())) { : PlacementPolicy policy(std::move(ts_descs), rng_); : s = policy.PlaceExtraTabletReplica(std::move(existing), range, &extra_replica); : } I think that in case of failure to get partition's debug string it's safe enough to just log a warning and proceed with empty/none dimension label. The idea is that this method is used for re-replication as well, and it's better to place a replica maybe a bit off from the dimension-dictated destination, but still preserve availability of the tablet's data. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@4688 PS10, Line 4688: range dimension http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@4688 PS10, Line 4688: enable is enabled http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/master.proto@321 PS10, Line 321: when drop http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc File src/kudu/master/placement_policy-test.cc: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@904 PS10, Line 904: { "ts1", 1000, { { "A", 1000 }, } }, : { "ts2", 1000, { { "A", 1000 }, } }, It would be nice to add a few scenarios where: * tablets for multiple dimensions label are being added (like 5 different dimensions) for the same table * the initial distribution of tablet replicas has a few 'deeps' and 'peaks' around some average level: e.g., 10, 20, 30 * the replicas are placed for tablets with replication factor of 3 (i.e. the first argument for PlaceTabletReplicas() is 3, not only 1). * verify how PlaceExtraTabletReplica() works as well * there scenarios should cover the cases where the replication factor is a) equal to the number of tablet servers b) replication factor + 1 (e.g., for RF=3 that's 4 tablet servers) c) two times of the replication factor (e.g. 6 tablet servers or more for RF=3). In the end, it's necessary to validate the overall distribution of the replicas cluster-wise and label-wise. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@910 PS10, Line 910: fuc nit: func http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@930 PS10, Line 930: boost::none It would be nice to have a coverage for the scenario when label-specific replicas are added in the case of huge imbalance of total number of replicas. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@933 PS10, Line 933: ASSERT_TRUE(ts_uuid == "ts0" || ts_uuid == "ts1" || ts_uuid == "ts2") << ts_uuid; drop this: it does not provide much of the semantically meaningful coverage http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@941 PS10, Line 941: ASSERT_GE(stddev, 20.0); Did you run it many times (like thousands) to be sure that's a good enough threshold to avoid flakiness in the future runs? http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@944 PS10, Line 944: B For clarity, it's better to use some other label which is not easily conflated with table names ('A' is used as a table name in the initial distribution of replicas for the test). Maybe, something like 'label0' ? http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@944 PS10, Line 944: Ask for Place http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@944 PS10, Line 944: with with dimension label ? http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@954 PS10, Line 954: 1 It would be nice to add a test scenario that corresponds to a replication factors of 3 and 5 as well. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@957 PS10, Line 957: ASSERT_TRUE(ts_uuid == "ts0" || ts_uuid == "ts1" || ts_uuid == "ts2") << ts_uuid; I'm not sure this assert has some semantically meaningful coverage: sure, the selected tablet server is one of the registered ones (there are no others). Also, the ASSERT() at line 961 covers this as well. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@964 PS10, Line 964: LOG(INFO) << "stddev = " << stddev; Is it crucial to output from the test? If that was just for debugging purposes, consider removing this once the test is ready. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc File src/kudu/master/placement_policy.cc: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc@56 PS10, Line 56: // TODO (oclarms): get the number of times this tablet server has recently been : // selected to create a tablet replica by dimension. Is this about somehow amplifying the numbers already accounted for by RecentReplicaCreations()? I'm curious why that is needed. Does it mean the current way of accounting is not good enough? http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc@58 PS10, Line 58: return desc->RecentReplicaCreations() + desc->num_live_replicas(std::move(dimension)); What if measuring the load of two tablet servers, where nothing has been created so far (so both RecentReplicaCreations() and num_live_replicas(label) are 0), but first has 10, but second 20 tablet replicas total? I think the former one should be considered less loaded than the second. Or you want those dimension to be completely independent from the overall load? Maybe, add num_live_replicas() / 10000 or something like that into the sum? Ideally, that should be something that expresses the preference of using dimension-only count against weighted 'all other counts'. I'm not sure what's the best way to express it here, though. Probably, using 1/10000 is good enough initially given it's not expected to host that many replicas per tablet server :) http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc@73 PS10, Line 73: If nit: If --> if http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/ts_descriptor.h File src/kudu/master/ts_descriptor.h: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/ts_descriptor.h@136 PS10, Line 136: number total number http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.h@204 PS10, Line 204: Return nit: it's not the result value of this method per se, so maybe replace 'Return' with simply 'Get' here. http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.cc@1248 PS10, Line 1248: CHECK nit: it's safe to use DCHECK here since in release build it would crash with SIGSEGV anyways at line 1261 if num_live_tablets_by_dimension was nullptr. -- To view, visit http://gerrit.cloudera.org:8080/13632 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a225e221eb42ef2f5489687e80a151d8dc1a42 Gerrit-Change-Number: 13632 Gerrit-PatchSet: 10 Gerrit-Owner: Yao Xu <ocla...@gmail.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Yao Xu <ocla...@gmail.com> Gerrit-Comment-Date: Thu, 04 Jul 2019 18:49:11 +0000 Gerrit-HasComments: Yes