Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14223 )

Change subject: KUDU-2069 p5: recheck tablet health when exiting maintenance mode
......................................................................

KUDU-2069 p5: recheck tablet health when exiting maintenance mode

Previously, when exiting maintenance mode for a given tserver, if the
replicas of that tserver were unhealthy, there was no mechanism with
which to guarantee that the proper re-replication would happen.
Specifically, the following sequence of events was possible:

1. tablet T has replicas on tservers A, B*, C
2. A enters maintenance mode
3. A is shut down
4. enough time passes for B* to consider A as failed
5. B* notices the failure of A and reports to the master that replica A
   has failed
6. the master does nothing to schedule re-replication because A is in
   maintenance mode
7. A exits maintenance mode, but is not brought back online
8. B* never hears back from A, and never hits a health state change to
   report to the master, and so the master never "rechecks" the health
   of T
9. T is left under-replicated with only B* and C

Note: The set of tservers we need to recheck is the set that hosted a
leader of any replica on A.

This patch addresses this by requesting a full tablet report from every
tserver in the cluster upon exiting maintenance mode on any tserver.
While somewhat hamfisted, on a reasonably dense cluster, it doesn't seem
unlikely that every tserver might be hosting a leader.
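
The sequence above and the fix can be sketched as a toy model. This is
only an illustration under stated assumptions, not the code in this
patch: ToyMaster, TServerState, and every method name below are
hypothetical and do not correspond to Kudu's actual classes or RPCs.
The sketch assumes the master keeps a per-tserver flag and that the
flag is consumed on that tserver's next heartbeat.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified per-tserver bookkeeping kept by the master.
struct TServerState {
  bool in_maintenance_mode = false;
  bool needs_full_tablet_report = false;
};

// Hypothetical toy master; not Kudu's real catalog manager or ts manager.
class ToyMaster {
 public:
  void RegisterTServer(const std::string& uuid) { tservers_[uuid]; }

  void EnterMaintenanceMode(const std::string& uuid) {
    tservers_.at(uuid).in_maintenance_mode = true;
  }

  // Exiting maintenance mode on any tserver flags every registered
  // tserver, so each one is asked for a full tablet report on its next
  // heartbeat and the master gets a chance to recheck tablet health.
  void ExitMaintenanceMode(const std::string& uuid) {
    tservers_.at(uuid).in_maintenance_mode = false;
    for (auto& entry : tservers_) {
      entry.second.needs_full_tablet_report = true;
    }
  }

  // Returns true if the heartbeat response should request a full tablet
  // report. The flag is cleared once it has been handed out.
  bool HandleHeartbeat(const std::string& uuid) {
    TServerState& state = tservers_.at(uuid);
    const bool request_full_report = state.needs_full_tablet_report;
    state.needs_full_tablet_report = false;
    return request_full_report;
  }

  // Called when a leader reports that the replica on failed_uuid has
  // failed. Returns true if re-replication would be scheduled.
  bool ReportFailedReplica(const std::string& failed_uuid) const {
    return !tservers_.at(failed_uuid).in_maintenance_mode;
  }

 private:
  std::map<std::string, TServerState> tservers_;
};

// Walks through the scenario from the commit message with the fix applied.
inline void DemoScenario() {
  ToyMaster master;
  for (const char* uuid : {"A", "B", "C"}) master.RegisterTServer(uuid);

  master.EnterMaintenanceMode("A");          // step 2
  assert(!master.ReportFailedReplica("A"));  // steps 5-6: report is ignored
  assert(!master.HandleHeartbeat("B"));      // nothing prompts a recheck

  master.ExitMaintenanceMode("A");           // step 7
  assert(master.HandleHeartbeat("B"));       // B is asked for a full report
  assert(master.ReportFailedReplica("A"));   // the re-sent failure is acted on
}
```

Flagging every tserver rather than only those hosting a leader of a
replica on A keeps the model trivial, which mirrors the trade-off the
commit message describes.
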

Testing:
- this adds to the existing integration test for maintenance mode to
  exercise the new behavior

Change-Id: Ic0ab3d78cbc5b1228c01592a00118f11f76e43dd
---
M src/kudu/integration-tests/maintenance_mode-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/master_service.cc
M src/kudu/master/ts_descriptor.cc
M src/kudu/master/ts_descriptor.h
M src/kudu/master/ts_manager.cc
M src/kudu/master/ts_manager.h
7 files changed, 65 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/14223/1
--
To view, visit http://gerrit.cloudera.org:8080/14223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic0ab3d78cbc5b1228c01592a00118f11f76e43dd
Gerrit-Change-Number: 14223
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <[email protected]>
