Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )
Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers ...................................................................... Patch Set 13: (20 comments) http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc File src/kudu/rebalance/rebalance-test.cc: http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@156 PS12, Line 156: } > const ? Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@157 PS12, Line 157: in() > nit: fix the alignment Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@197 PS12, Line 197: rhs.servers_by_total_replica_count); : } > nit: is it possible to templatize bool HasSameContents(const ServersByCount Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@271 PS12, Line 271: Empty > I think this method by itself doesn't imply working on ignored or other sub Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@293 PS12, Line 293: { "tablet_a0", "table_a", { { "ts_0", true }, }, }, > I think it's worth adding a comment explaining the essence of the constrain Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@296 PS12, Line 296: }, > Add SCOPED_TRACE() to get more information on the context if ASSERT_FALSE() Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@527 PS12, Line 527: > if moving replicas from Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance_algo.h File src/kudu/rebalance/rebalance_algo.h: http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance_algo.h@92 PS12, Line 92: on > nit: on Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalancer.cc File src/kudu/rebalance/rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalancer.cc@450 PS12, Line 450: const int* replica_count = FindOrNull(tserver_replicas_count, ignored_tserver); : if (!replica_count) { > Would FindOrNull() work here? Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc File src/kudu/tools/rebalancer_tool-test.cc: http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@273 PS12, Line 273: "--ignored_tservers=" + JoinStrings(ignored_tservers, ","), : "--move_replicas_from_ignored_tserv > Join these two strings. Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@701 PS12, Line 701: // Wait for the catalog manager to unde > If an ignored tablet server went down Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@702 PS12, Line 702: > server Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@706 PS12, Line 706: > nit: move this to the next line Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@757 PS12, Line 757: ts()) { > I think it might be important to know which tablet server is in progress of It's quiet difficult to know which tablet server is in progress of copying data, through ksck tool we could only know some tablets are copying data (tablets contains 4 replicas and one of replicas is not running), but which replica would be removed is not sure. The second case is similar to the case where the ignored tablet server goes down before run the rebalancing. http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@795 PS12, Line 795: } > Is there some significance behind starting the tablet server back? If yes, Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc File src/kudu/tools/rebalancer_tool.cc: http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc@639 PS12, Line 639: > healthy non-ignored Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc@1243 PS12, Line 1243: auto s = CheckComplet > I'm not sure this is correct by the semantics of the 'has_updates' variable I think the 'has_updates' variable indicates whether the statuses of in-progress moves has been updated. In RebalancerTool::RunWith(Runner* runner, RunStatus* result_status) method, if no updates, the rebalancer tool will re-synchronize the state of the cluster. I think it's not necessary to re-synchronize the cluster state when the move has not been completed. http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc File src/kudu/tools/tool_action_cluster.cc: http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc@131 PS12, Line 131: the source tablet > the source tablet server Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc@132 PS12, Line 132: "This setting is effective only if the '--ignored_tservers' flag " > nit: add a space between 'flag' and '"' Done http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc@134 PS12, Line 134: then all ignored tablet servers must be placed into " : "the 'maintenance mode'."); > then all ignored tablet servers must be placed into the 'maintenance mode' Done -- To view, visit http://gerrit.cloudera.org:8080/14154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89 Gerrit-Change-Number: 14154 Gerrit-PatchSet: 13 Gerrit-Owner: Yifan Zhang <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hannah Nguyen <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Comment-Date: Fri, 15 Nov 2019 08:01:36 +0000 Gerrit-HasComments: Yes
