Hello Alexey Serbin, Kudu Jenkins, Adar Dembo, Hao Hao,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14223
to look at the new patch set (#6).
Change subject: KUDU-2069 p5: recheck tablet health when exiting maintenance mode
......................................................................
KUDU-2069 p5: recheck tablet health when exiting maintenance mode
Previously, when a tserver exited maintenance mode while its replicas
were unhealthy, there was no mechanism to guarantee that those replicas
would be re-replicated. Specifically, the following sequence of events
was possible:
1. tablet T has replicas on tservers A, B*, C (* marks the leader)
2. A enters maintenance mode
3. A is shut down
4. enough time passes for B* to consider A as failed
5. B* notices the failure of A and reports to the master that the
replica on A has failed
6. the master does nothing to schedule re-replication because A is in
maintenance mode
7. A exits maintenance mode, but is not brought back online
8. B* never hears back from A, so it observes no further health state
change to report to the master, and the master never "rechecks" the
health of T
9. T is left under-replicated with only B* and C
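The sequence above can be modeled with a small sketch. It assumes a simplified master that reconsiders a tablet only when a leader sends a health report, and that suppresses re-replication while the failed replica's tserver is in maintenance mode. ToyMaster, OnFailedReplicaReport, and ReplayScenario are hypothetical names, not Kudu's actual classes.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical toy model of the pre-patch behavior (not Kudu's real master).
struct ToyMaster {
  std::set<std::string> in_maintenance;  // tservers in maintenance mode
  int re_replications_scheduled = 0;

  // Called when a leader reports that the replica on `failed_ts` has failed.
  void OnFailedReplicaReport(const std::string& failed_ts) {
    if (in_maintenance.count(failed_ts) > 0) {
      return;  // step 6: re-replication is suppressed during maintenance
    }
    ++re_replications_scheduled;
  }

  // Pre-patch: exiting maintenance mode only updates local state; nothing
  // prompts any tserver to send another report.
  void ExitMaintenanceMode(const std::string& ts) { in_maintenance.erase(ts); }
};

// Replays steps 2-9 and returns how many re-replications were scheduled.
int ReplayScenario() {
  ToyMaster master;
  master.in_maintenance.insert("A");  // step 2
  master.OnFailedReplicaReport("A");  // steps 3-6: report arrives, ignored
  master.ExitMaintenanceMode("A");    // step 7
  assert(master.in_maintenance.empty());
  // Step 8: B* sees no new health state change, so no further report arrives.
  return master.re_replications_scheduled;  // step 9: nothing was scheduled
}
```

In this model, ReplayScenario() returns 0, while the same report for a tserver outside maintenance mode would schedule one re-replication.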
Note: the tservers that need to be rechecked are those hosting the
leader of any tablet that has a replica on A.
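As a rough illustration of the note, the sketch below computes that set from a hypothetical placement map (tablet ID to replica tservers and leader). ToyTablet and TServersToRecheck are illustrative names, not part of Kudu.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy view of tablet placement (not Kudu's metadata types).
struct ToyTablet {
  std::vector<std::string> replicas;  // tservers hosting a replica
  std::string leader;                 // tserver hosting the leader replica
};

// Returns the tservers hosting the leader of any tablet with a replica on
// `ts`, i.e. the minimal set whose health would need to be rechecked.
std::set<std::string> TServersToRecheck(
    const std::map<std::string, ToyTablet>& tablets, const std::string& ts) {
  std::set<std::string> result;
  for (const auto& entry : tablets) {
    const ToyTablet& tablet = entry.second;
    for (const auto& replica_ts : tablet.replicas) {
      if (replica_ts == ts) {
        // This toy model assumes every tablet has a known leader.
        assert(!tablet.leader.empty());
        result.insert(tablet.leader);
        break;
      }
    }
  }
  return result;
}
```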
This patch addresses the problem by requesting a full tablet report from
every tserver in the cluster whenever any tserver exits maintenance
mode, which necessarily includes the tservers described above.
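A minimal sketch of the approach, assuming each tserver has a "needs full tablet report" flag that the master checks when answering that tserver's heartbeat. ToyTSManager and its methods are hypothetical, and the mutex merely stands in for whatever locking the real TSManager and TSDescriptor use.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch of the fix (not Kudu's actual TSManager API).
class ToyTSManager {
 public:
  explicit ToyTSManager(std::size_t num_tservers)
      : needs_full_report_(num_tservers, false) {}

  // When any tserver exits maintenance mode, flag every tserver so that it
  // sends a full tablet report on its next heartbeat.
  void OnTServerExitedMaintenanceMode() {
    std::lock_guard<std::mutex> l(lock_);
    for (std::size_t i = 0; i < needs_full_report_.size(); ++i) {
      needs_full_report_[i] = true;
    }
  }

  // Called while handling a heartbeat from tserver `ts_index`. Returns
  // whether the response should request a full tablet report, and clears
  // the flag so the request is made only once.
  bool ConsumeFullReportRequest(std::size_t ts_index) {
    std::lock_guard<std::mutex> l(lock_);
    assert(ts_index < needs_full_report_.size());
    bool needed = needs_full_report_[ts_index];
    needs_full_report_[ts_index] = false;
    return needed;
  }

 private:
  std::mutex lock_;                      // guards needs_full_report_
  std::vector<bool> needs_full_report_;  // one flag per tserver
};
```

In this sketch, flagging every tserver trades some extra reporting for simplicity, since the master does not need to compute the minimal set from the note.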
Testing:
- extends the existing maintenance mode integration test to exercise
the new behavior
- amends an existing concurrency test to verify that the correct
locking behavior is used
Change-Id: Ic0ab3d78cbc5b1228c01592a00118f11f76e43dd
---
M src/kudu/integration-tests/maintenance_mode-itest.cc
M src/kudu/master/master_service.cc
M src/kudu/master/ts_descriptor.cc
M src/kudu/master/ts_descriptor.h
M src/kudu/master/ts_manager.cc
M src/kudu/master/ts_manager.h
M src/kudu/master/ts_state-test.cc
7 files changed, 74 insertions(+), 2 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/14223/6
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic0ab3d78cbc5b1228c01592a00118f11f76e43dd
Gerrit-Change-Number: 14223
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Hao Hao
Gerrit-Reviewer: Kudu Jenkins (120)