Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14177 )

Change subject: KUDU-2780: create thread for auto-rebalancing
......................................................................


Patch Set 11:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer-test.cc
File src/kudu/master/auto_rebalancer-test.cc:

http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer-test.cc@165
PS11, Line 165: // Each tserver is considered its own location.
I'm not sure I understand what this means.  As far as I know, in case of no 
locations are assigned, all tservers are considered to be in the same 
pseudo-location and that's not a location-aware cluster.  In that sense, there 
is no need to satisfy the location placement policy or perform any 
inter-location balancing.  Has that changed recently?

As I can see, this test doesn't assign locations for tablet servers.  If you 
want to start assigning locations to tablet servers, you can see an example how 
it's done for test scenarios like this one: 
https://github.com/apache/kudu/blob/03e2ada694290cafce0bea6ebbf092709aa64a2a/src/kudu/integration-tests/location_assignment-itest.cc#L77-L86


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer-test.cc@180
PS11, Line 180:  Each tserver is its own location
ditto


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.h
File src/kudu/master/auto_rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.h@165
PS11, Line 165:   // Structs to hold cluster information.
              :   // rebalance::ClusterRawInfo raw_info_;
              :   // rebalance::ClusterInfo cluster_info_;
              :   // rebalance::TabletsPlacementInfo placement_info_;
Remove if no longer needed?


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@224
PS11, Line 224: 
SleepFor(MonoDelta::FromSeconds(FLAGS_auto_rebalancing_interval_seconds));
Would this be a double-pause given there is another sleep at line 195?


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@228
PS11, Line 228: Rebalancer::MovesInProgress()
Why the moves in progress are empty?  I think this cannot be empty: the 
information should be kept between iterations.


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@231
PS11, Line 231: 
SleepFor(MonoDelta::FromSeconds(FLAGS_auto_rebalancing_interval_seconds));
ditto


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@236
PS11, Line 236: Rebalancer::MovesInProgress()
Ditto: why is it empty?  What if previous run has scheduled some moves?  Maybe 
I'm missing something, but I don't see the state of the scheduled moves tracked 
somewhere.   Please point there if I'm missing that.


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@239
PS11, Line 239: 
SleepFor(MonoDelta::FromSeconds(FLAGS_auto_rebalancing_interval_seconds)
ditto


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@246
PS11, Line 246: IsClusterBalanced
Why this method/function is necessary at all?  Wouldn't GetMoves() return an 
empty set of mo ves if the cluster is balanced?


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@501
PS11, Line 501:   // Since only returning live descriptors, all tservers 
healthy.
I think this logic has a flaw: the descriptors available for placement are the 
tablet servers available to move replicas _to_, but there might be other alive 
tablet servers which contain too many replicas which might be moved _from_ 
those overloaded tablet servers.

The idea behind original BuildClusterRawInfo() was about getting the whole 
picture.  If not counting in tablet replicas at servers not available now for 
placement, the picture is distorted.  That might result in first moving some 
replicas elsewhere, but then re-doing all that stuff once the tablet server 
which was not available for placement is back.

I think there should be at least an option to avoid making any moves if any 
tablets are not up.   At least, the safest default behavior is avoid moving 
replicas if it's not possible to get a full picture of replicas distribution in 
the cluster.  Besides being safe, that default behavior would be compatible 
with the default behavior of the rebalancer tool.


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@509
PS11, Line 509: ts->last_address().ToString();
I mentioned it before and I'm re-iterating this again: tablet server address is 
NOT a location and should not be treated like that.  Otherwise, I'm not sure 
the rebalancing would behave as needed in case of non-location aware clusters.  
Please fix this.



--
To view, visit http://gerrit.cloudera.org:8080/14177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4
Gerrit-Change-Number: 14177
Gerrit-PatchSet: 11
Gerrit-Owner: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 05 Oct 2019 06:21:52 +0000
Gerrit-HasComments: Yes

Reply via email to