Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10540 )

Change subject: [tools] extra integration tests for the rebalancer
......................................................................


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10540/10/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10540/10/src/kudu/tools/kudu-admin-test.cc@1711
PS10, Line 1711:       timer.join();
               :       timer_cleanup.cancel();
> I agree that running the rebalancer in the main thread is nicer, but how is
It's a bit different if the first run of the rebalancer takes longer than 30 
seconds.  Basically, I don't want the test to depend on the rebalancer's 
runtime if it sees the need to continue rebalancing over and over again 
because it detects tables appearing or disappearing during the ongoing DDL 
activity.

In real life, I think it's OK for the rebalancer to keep running in case of 
ongoing DDL activity if it spots the need for an additional rebalancing 
cycle.  However, for the test I don't want big timing variations depending 
on ongoing DDL activity.

I added a comment to clarify that point.


http://gerrit.cloudera.org:8080/#/c/10540/11/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10540/11/src/kudu/tools/kudu-admin-test.cc@1831
PS11, Line 1831:     string err;
               :     const auto s = RunKuduTool(tool_args, &out, &err);
               :     ASSERT_TRUE(s.ok()) << s.ToString() << ": " << err;
               :     ASSERT_STR_CONTAINS(out,
               :         "rebalancing is complete: cluster is balanced (moved 0 replicas)")
               :         << "stderr: " << err;
               :   }
               :
               :   NO_FATALS(cluster_->AssertNoCrashes());
               :   NO_FATALS(ClusterVerifier(cluster_.get()).CheckCluster());
               : }
               :
               : // The rebalancer should stop and exit upon detecting a tablet server that
               : // went down. That's a simple and effective way of preventing concurrent replica
               : // movement by the rebalancer and the
> This is a strange pattern; it duplicates the cleanup code outright. Why is
Yes, indeed -- that's a good observation.

I wrote this code following the generic pattern with ScopedCleanup, which is 
common in other places where some error might happen between the point where 
the ScopedCleanup is declared and the finalization code (waiting for the 
threads to join).

Yep, in fact, given the current logic, no ScopedCleanup is necessary -- I'll 
remove that.

I think in some cases it might be useful if the ScopedCleanup could run its 
code not only during the destruction phase.  But I looked around, and it 
seems there are only a few places where it's used like this, so maybe it's 
not worth adding a new method to ScopedCleanup.


http://gerrit.cloudera.org:8080/#/c/10540/11/src/kudu/tools/kudu-admin-test.cc@1930
PS11, Line 1930:  "test is sk
> Nit: RuntimeError
Done


http://gerrit.cloudera.org:8080/#/c/10540/11/src/kudu/tools/kudu-admin-test.cc@2029
PS11, Line 2029:
> Nit: RuntimeError.
Done



--
To view, visit http://gerrit.cloudera.org:8080/10540
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I78b3dcea71ed303f6ecd199604b2385796d05da8
Gerrit-Change-Number: 10540
Gerrit-PatchSet: 10
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Tue, 03 Jul 2018 23:48:08 +0000
Gerrit-HasComments: Yes
