Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14111 )

Change subject: KUDU-2069 pt 1: add a maintenance mode
......................................................................


Patch Set 6:

(4 comments)

I started to review this, but it's a pretty big patch with several different 
concerns. How about breaking it up like this:
1. New persistent state. I can think of a few different ways to model this, so 
a focused review on just that would be nice.
2. In-memory state tracking changes.
3. Wiring up the state to do something useful (i.e. on heartbeating or other 
triggers).
4. RPC endpoint to set/unset maintenance mode.

It's tempting to rely on the RPC endpoint to drive testing for all of these, 
but that's not necessary. You should be able to reach into an in-proc catalog 
manager's memory state to enable/disable maintenance mode, and to update 
corresponding in-memory state.

http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/master.proto@230
PS6, Line 230:   optional MaintenanceStatePB state = 1;
I see the allure of using MaintenenanceStatePB for both the persistent state 
and the RPC, but I'm not sure the usage is exactly the same. What would be the 
purpose of persisting a NONE entry? And wouldn't discrete actions (i.e. 
ENTER_MAINTENANCE_MODE and EXIT_MAINTENANCE_MODE) be more appropriate for an 
RPC, especially when considering something like decommissioning in the future?

Separately, consider adding more state to aid in troubleshooting. At the very 
least a timestamp that denotes when maintenance mode was entered.


http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/master.proto@820
PS6, Line 820:   // TODO(awong): consider setting maintenance state in batches.
Yeah, how about making this a 'repeated' now? Otherwise we'd have to break 
backwards compat to do it, right?


http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/master.proto@824
PS6, Line 824: message SetMaintenanceStateResponsePB {
In anticipation of decommissioning, perhaps we should call this 
SetTabletServerState? Or ChangeTabletServerState?


http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/sys_catalog.cc@469
PS6, Line 469:         Status::Corruption("One or more rows failed to write");
Is this ever more useful than returning the first row error?



--
To view, visit http://gerrit.cloudera.org:8080/14111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia857668b87560cdd451c2e7f90d72f8217ba5a4b
Gerrit-Change-Number: 14111
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 30 Aug 2019 03:45:27 +0000
Gerrit-HasComments: Yes

Reply via email to