Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10540 )
Change subject: [tools] extra integration tests for the rebalancer
......................................................................

Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10540/10/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10540/10/src/kudu/tools/kudu-admin-test.cc@1711
PS10, Line 1711:   timer.join();
               :   timer_cleanup.cancel();
> I agree that running the rebalancer in the main thread is nicer, but how is
It's a bit different if the first run of the rebalancer takes longer than 30
seconds. Basically, I don't want to depend on the rebalancer's runtime in the
case where it sees the need to continue rebalancing over and over again,
having detected tables appearing or disappearing during ongoing DDL activity.

In real life, I think it's OK for the rebalancer to keep running during
ongoing DDL activity if it spots the need for an additional rebalancing cycle.
However, for the test I don't want big timing variations that depend on
ongoing DDL activity. I added a comment to clarify that point.

http://gerrit.cloudera.org:8080/#/c/10540/11/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10540/11/src/kudu/tools/kudu-admin-test.cc@1831
PS11, Line 1831:   string err;
               :   const auto s = RunKuduTool(tool_args, &out, &err);
               :   ASSERT_TRUE(s.ok()) << s.ToString() << ": " << err;
               :   ASSERT_STR_CONTAINS(out,
               :       "rebalancing is complete: cluster is balanced (moved 0 replicas)")
               :       << "stderr: " << err;
               : }
               :
               : NO_FATALS(cluster_->AssertNoCrashes());
               : NO_FATALS(ClusterVerifier(cluster_.get()).CheckCluster());
               : }
               :
               : // The rebalancer should stop and exit upon detecting a tablet server that
               : // went down. That's a simple and effective way of preventing concurrent replica
               : // movement by the rebalancer and the
> This is a strange pattern; it duplicates the cleanup code outright. Why is
Yes, indeed -- that's a good observation.
I wrote this code following the generic pattern with ScopedCleanup, which is
common to all the other places where an error might occur between the point
where the ScopedCleanup is declared and the finalization code (waiting for the
threads to join). Yep, in fact, given the current logic no ScopedCleanup is
necessary here -- I'll remove it.

I think in some cases it might be useful if ScopedCleanup could run its code
not only during the destruction phase. But I looked around, and it seems there
are only a few places where it's used like that, so maybe it's not worth
adding a new method to ScopedCleanup.

http://gerrit.cloudera.org:8080/#/c/10540/11/src/kudu/tools/kudu-admin-test.cc@1930
PS11, Line 1930: "test is sk
> Nit: RuntimeError
Done

http://gerrit.cloudera.org:8080/#/c/10540/11/src/kudu/tools/kudu-admin-test.cc@2029
PS11, Line 2029:
> Nit: RuntimeError.
Done

--
To view, visit http://gerrit.cloudera.org:8080/10540
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I78b3dcea71ed303f6ecd199604b2385796d05da8
Gerrit-Change-Number: 10540
Gerrit-PatchSet: 10
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Tue, 03 Jul 2018 23:48:08 +0000
Gerrit-HasComments: Yes
