Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14111
to look at the new patch set (#4).
Change subject: KUDU-2069 pt 1: add a maintenance mode
......................................................................
KUDU-2069 pt 1: add a maintenance mode
When tablet server T is put in maintenance mode, replicas will not be
placed onto T, and failures of T will not be considered when determining
whether a given tablet is under-replicated.
This patch adds this mode with the following changes:
- A new master-side endpoint is added to enter maintenance mode:
  - It plumbs in-memory maintenance states through the TSManager and the
    TSDescriptors.
  - It also writes a new kind of entry in the master system catalog for
    maintenance states (for now, there's only maintenance mode, but this
    could be useful for decommissioning).
- When a master becomes leader, it scans the on-disk state and
rebuilds the in-memory maintenance state.
- When determining whether a replica needs to be added, we may now
consider a "whitelist" of UUIDs that can be in a bad state while not
counting towards being under-replicated.
- When determining where to place new replicas, tablet servers in
maintenance mode are not considered.
- The same master-side endpoint is used to exit maintenance mode.
  - To ensure that replicas that genuinely need re-replication are
    re-replicated once maintenance ends, removing a tablet server from
    maintenance mode causes the master to mark all tablet servers as
    needing a full tablet report, which triggers re-processing of their
    tablet reports.
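A minimal sketch of the whitelist-based under-replication check and the exit path described in the list above, under a simplified model. `Replica`, `IsUnderReplicated`, and `MaintenanceTracker` are hypothetical names. The patch's actual logic lives in quorum_util and the catalog manager and handles more cases than shown.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical replica health, as derived from a tablet report.
struct Replica {
  std::string ts_uuid;
  bool healthy;
};

// Returns true if the tablet needs a new replica. A bad replica hosted on a
// whitelisted (e.g. in-maintenance) server still counts toward the
// replication factor, so its failure alone does not trigger re-replication.
bool IsUnderReplicated(const std::vector<Replica>& replicas,
                       int replication_factor,
                       const std::unordered_set<std::string>& whitelist) {
  int counted = 0;
  for (const auto& r : replicas) {
    if (r.healthy || whitelist.count(r.ts_uuid) > 0) {
      ++counted;
    }
  }
  return counted < replication_factor;
}

// Hypothetical tracker for maintenance state and full-report flags.
class MaintenanceTracker {
 public:
  // Registers a tablet server; it starts without a pending full report.
  void AddServer(const std::string& uuid) { needs_full_report_[uuid] = false; }

  void EnterMaintenance(const std::string& uuid) { in_maintenance_.insert(uuid); }

  // Leaving maintenance flags every server as needing a full tablet report,
  // so tablets whose bad replicas were previously ignored get re-evaluated.
  void ExitMaintenance(const std::string& uuid) {
    in_maintenance_.erase(uuid);
    for (auto& entry : needs_full_report_) {
      entry.second = true;
    }
  }

  // The whitelist consulted by IsUnderReplicated().
  const std::unordered_set<std::string>& whitelist() const {
    return in_maintenance_;
  }

  bool NeedsFullReport(const std::string& uuid) const {
    auto it = needs_full_report_.find(uuid);
    return it != needs_full_report_.end() && it->second;
  }

 private:
  std::unordered_set<std::string> in_maintenance_;
  std::unordered_map<std::string, bool> needs_full_report_;
};
```

Treating whitelisted failures as healthy keeps the tablet at its replication factor while its server is down for maintenance. Once the server leaves maintenance, the forced full reports give the master a chance to notice any replicas that are still bad.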
This patch only introduces the master endpoints and the underlying
behavior. A later patch will introduce a way to set maintenance mode via
the CLI.
I considered implementing maintenance mode by blocking master->tserver
RPCs, but opted for this approach because it seems more intuitive to
stop replica movement in the placement logic (i.e. which servers are
available to host new replicas and which replicas need to be replaced)
rather than in the placement mechanism (i.e. the handful of RPCs that
would need to be considered).
Change-Id: Ia857668b87560cdd451c2e7f90d72f8217ba5a4b
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/quorum_util-test.cc
M src/kudu/consensus/quorum_util.cc
M src/kudu/consensus/quorum_util.h
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/maintenance_mode-itest.cc
M src/kudu/master/CMakeLists.txt
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
A src/kudu/master/maintenance_state-test.cc
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/master/ts_descriptor-test.cc
M src/kudu/master/ts_descriptor.cc
M src/kudu/master/ts_descriptor.h
M src/kudu/master/ts_manager.cc
M src/kudu/master/ts_manager.h
20 files changed, 1,089 insertions(+), 35 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/14111/4
--
To view, visit http://gerrit.cloudera.org:8080/14111
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia857668b87560cdd451c2e7f90d72f8217ba5a4b
Gerrit-Change-Number: 14111
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)