Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14177 )

Change subject: WIP KUDU-2780: create thread for auto-rebalancing
......................................................................


Patch Set 2:

(12 comments)

It seems many tests are broken now, even those which are not related to the 
rebalancer itself.  It would be nice to understand what's wrong and fix those.

http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.h
File src/kudu/master/auto_rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.h@66
PS2, Line 66: explicit
nit: I don't think explicit is needed here since it's hard to write something 
like AutoRebalancerTask s = { a, b } unintentionally.  In other words, explicit 
usually makes sense for single-parameter constructors


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@156
PS2, Line 156: AutoRebalancerTask::RunLoop()
I think if you want to adapt the current implementation to this worker model, 
it's worth taking a look at RebalancerTool::Run() and 
RebalancerTool::RunWith().  I think one iteration of the while() cycle in 
RebalancerTool::RunWith() might be one of those 'atomic' steps which we want to 
schedule by this thread, checking whether this master is still leader before 
scheduling next moves.


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@165
PS2, Line 165: if (!catalog_manager_) continue;
How could it happen that catalog_manager_ is nullptr?  And if so, wouldn't it 
be a programming mistake?  And can it ever change during the lifetime of 
AutoRebalancerTask?


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@168
PS2, Line 168:     CatalogManager::ScopedLeaderSharedLock l(catalog_manager_);
As it's written now, the scope of this lock is too wide -- it will take a whole 
iteration to pass before releasing this lock.  Meanwhile, holding this lock 
doesn't mean the actual master leader cannot change.  So, I would simply limit 
the scope of this lock for a quick leader check before proceeding with next 
rebalancing operation.


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@171
PS2, Line 171:     // TODO(hannah.nguyen): holding onto this lock too long?
             :     // Concerning especially if replica moves aren't async.
Since this lock is used only to synchronize access to closing_ by this method 
and Shutdown(), you could limit it just to the scope of checking for closing_.  
The ThreadJoiner will do its job awaiting for the thread to complete the 
iteration and exit.

Another way around is to use CountDownLatch, and then call 
latch.WaitFor(<MonoDelta interval>) instead of SleepFor().  Shutdown() then 
will need to call latch.Countdown() to signal that it's time to shut down the 
activity.


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@183
PS2, Line 183: (!BuildClusterRawInfo(boost::none, &raw_info).ok())
Does it deserve some logging if it was not possible to get a snapshot of the 
cluster's status?


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@190
PS2, Line 190: \
drop


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@210
PS2, Line 210:
nit: this extra line isn't good for readability


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@214
PS2, Line 214: / TODO(hannah.nguyen): schedule moves for placement policy 
violations
How to check for the result status of that?  It's necessary to make sure all 
violations are fixed before moving the the next phase of the rebalancing if 
PolicyFixer is enabled.


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@235
PS2, Line 235:         // Step 4: Execute moves.
             :         // TODO(hannah.nguyen): do this
I think many moves are to be done in the background, so this thread should 
simply be some sort of a dispatcher for scheduling next step and processing the 
result when they are ready.


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@239
PS2, Line 239: FLAGS_auto_rebalancing_restart_flag = false;
I'm not sure I understand the logic behind changes in 
FLAG_auto_rebalancing_restart_flag.  Who is about to set it to 'true' again?


http://gerrit.cloudera.org:8080/#/c/14177/2/src/kudu/master/auto_rebalancer.cc@246
PS2, Line 246:
nit here and 3 closing parenthesis above: an extra empty line



--
To view, visit http://gerrit.cloudera.org:8080/14177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4
Gerrit-Change-Number: 14177
Gerrit-PatchSet: 2
Gerrit-Owner: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 05 Sep 2019 18:50:27 +0000
Gerrit-HasComments: Yes

Reply via email to