Mike Percy has submitted this change and it was merged.

Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but still in config
......................................................................

KUDU-1860: ksck doesn't identify tablets that are evicted but still in config

This patch enhances ksck to gather consensus info from every tablet.
It compares this info with master and outputs the master's config and
every conflicting config, if there are any conflicts.

To do this efficiently it reimplements the GetAllConsensusState RPC so
that it gathers info about every replica's consensus state.

This will catch at least the two problems identified in KUDU-1860:
1. The leader has a pending config to remove a tablet, but it is not
   committed so the master does not see this config. This can hide an
   unhealthy tablet if, e.g., one pending config member is down and the
   pending-to-be-kicked-out member is up, so 1/2 replicas are alive in
   the leader's active config but the master thinks 2/3 are alive.
2. No replica is leader but the master believes there is a leader
   because its cache is old and hasn't been updated.

Sample output showing #1:
https://gist.github.com/wdberkeley/d2606698e4f2e8ca3ef70d4dcef7ba9a

Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Reviewed-on: http://gerrit.cloudera.org:8080/6772
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <[email protected]>
---
M src/kudu/consensus/consensus.proto
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tserver/tablet_replica_lookup.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
M src/kudu/tserver/ts_tablet_manager.h
14 files changed, 521 insertions(+), 71 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/6772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
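
For readers unfamiliar with the check described in the commit message, the comparison of the master's config against each replica's reported config can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this patch: ConfigView, FindConflicts, and HasLiveMajority are hypothetical names, and the real consensus state in consensus.proto carries more fields than the two compared here.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one consensus config as seen by one
// source (the master or a single replica). Not Kudu's actual protobuf type.
struct ConfigView {
  std::string leader_uuid;       // empty if this source knows of no leader
  std::set<std::string> voters;  // UUIDs of the voting members
  bool pending = false;          // true if the config is not yet committed

  // Two views agree if they name the same leader and the same voters.
  // The pending flag is informational only in this sketch.
  bool operator==(const ConfigView& other) const {
    return leader_uuid == other.leader_uuid && voters == other.voters;
  }
};

// Returns the UUIDs of the replicas whose reported config differs from
// the master's view, in ascending UUID order.
std::vector<std::string> FindConflicts(
    const ConfigView& master,
    const std::map<std::string, ConfigView>& by_replica) {
  std::vector<std::string> conflicts;
  for (const auto& entry : by_replica) {
    if (!(entry.second == master)) {
      conflicts.push_back(entry.first);
    }
  }
  return conflicts;
}

// True if a strict majority of the config's voters are alive.
bool HasLiveMajority(const ConfigView& config,
                     const std::set<std::string>& alive) {
  std::size_t live = 0;
  for (const auto& uuid : config.voters) {
    if (alive.count(uuid) > 0) {
      ++live;
    }
  }
  return 2 * live > config.voters.size();
}
```

In problem 1 above, the master's view has three voters with two alive, so HasLiveMajority returns true for it, while the leader's pending two-voter config has only one alive and returns false. FindConflicts would flag the leader because its voter set differs from the master's. In problem 2, every replica would report an empty leader_uuid while the master's view names one, so every reporting replica would be flagged.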
