Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14111 )
Change subject: KUDU-2069 pt 1: add a maintenance mode ...................................................................... Patch Set 6: (4 comments) I started to review this, but it's a pretty big patch with several different concerns. How about breaking it up like this: 1. New persistent state. I can think of a few different ways to model this, so a focused review on just that would be nice. 2. In-memory state tracking changes. 3. Wiring up the state to do something useful (i.e. on heartbeating or other triggers). 4. RPC endpoint to set/unset maintenance mode. It's tempting to rely on the RPC endpoint to drive testing for all of these, but that's not necessary. You should be able to reach into an in-proc catalog manager's memory state to enable/disable maintenance mode, and to update corresponding in-memory state. http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/master.proto@230 PS6, Line 230: optional MaintenanceStatePB state = 1; I see the allure of using MaintenenanceStatePB for both the persistent state and the RPC, but I'm not sure the usage is exactly the same. What would be the purpose of persisting a NONE entry? And wouldn't discrete actions (i.e. ENTER_MAINTENANCE_MODE and EXIT_MAINTENANCE_MODE) be more appropriate for an RPC, especially when considering something like decommissioning in the future? Separately, consider adding more state to aid in troubleshooting. At the very least a timestamp that denotes when maintenance mode was entered. http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/master.proto@820 PS6, Line 820: // TODO(awong): consider setting maintenance state in batches. Yeah, how about making this a 'repeated' now? Otherwise we'd have to break backwards compat to do it, right? http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/master.proto@824 PS6, Line 824: message SetMaintenanceStateResponsePB { In anticipation of decommissioning, perhaps we should call this SetTabletServerState? Or ChangeTabletServerState? http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/sys_catalog.cc File src/kudu/master/sys_catalog.cc: http://gerrit.cloudera.org:8080/#/c/14111/6/src/kudu/master/sys_catalog.cc@469 PS6, Line 469: Status::Corruption("One or more rows failed to write"); Is this ever more useful than returning the first row error? -- To view, visit http://gerrit.cloudera.org:8080/14111 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia857668b87560cdd451c2e7f90d72f8217ba5a4b Gerrit-Change-Number: 14111 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 30 Aug 2019 03:45:27 +0000 Gerrit-HasComments: Yes
