Will Berkeley has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6772

Change subject: [WIP] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config
......................................................................

[WIP] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config

This patch enhances ksck to gather consensus info from every tablet.
It compares this info with master and outputs the master's config
and every conflicting config, if there are any conflicts.

This checking is expensive because it requires gathering consensus
info from every replica, so it is off by default.

This will catch at least the two problems identified in KUDU-1860:

1. The leader has a pending config to remove a tablet, but it
   is not committed so the master does not see this config. This
   can hide an unhealthy tablet if, e.g., one pending config member
   is down and the pending-to-be-kicked-out member is up, so 1/2
   replicas are alive in the leader's active config but the master
   thinks 2/3 are alive.
2. No replica is leader but the master believes there is a leader
   because its cache is old and hasn't been updated.

Sample output:
1. https://gist.github.com/wdberkeley/aa97e9a108b57acacc4da6db0625445b
2. (problem #1) https://gist.github.com/wdberkeley/b8f9987caf8bd15ec4fd8fd345a217f9

WIP because it needs tests.

Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
5 files changed, 178 insertions(+), 8 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/6772/1

--
To view, visit http://gerrit.cloudera.org:8080/6772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <[email protected]>
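
For readers following along, the comparison described in the commit message (the master's config versus the config each replica reports) can be sketched roughly as below. This is not the patch's code: the actual change is C++ in src/kudu/tools/ksck.cc, and every name here (`ConsensusConfig`, `find_conflicts`, `live_count`, `has_majority`, the replica IDs "A"/"B"/"C") is hypothetical and chosen only to illustrate problem #1. It also assumes a config can be reduced to a set of voters plus an optional leader, which is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ConsensusConfig:
    """One view of a tablet's Raft config (hypothetical, simplified type)."""
    voters: FrozenSet[str]
    leader: Optional[str]  # None if this view sees no leader


def find_conflicts(master: ConsensusConfig,
                   replica_views: Dict[str, ConsensusConfig]
                   ) -> Dict[str, ConsensusConfig]:
    """Return the replica-reported configs that differ from the master's."""
    return {uuid: cfg for uuid, cfg in replica_views.items() if cfg != master}


def live_count(config: ConsensusConfig, live: FrozenSet[str]) -> Tuple[int, int]:
    """Return (live voters, total voters) for one config."""
    return len(config.voters & live), len(config.voters)


def has_majority(config: ConsensusConfig, live: FrozenSet[str]) -> bool:
    """True if a strict majority of the config's voters are live."""
    alive, total = live_count(config, live)
    return alive > total // 2


# Problem #1 from the commit message: the leader "A" has a pending config
# that removes "C", the master still sees the old 3-voter config, and "B"
# is down while "C" is still up.
master_view = ConsensusConfig(frozenset({"A", "B", "C"}), leader="A")
replica_views = {
    "A": ConsensusConfig(frozenset({"A", "B"}), leader="A"),       # pending config
    "C": ConsensusConfig(frozenset({"A", "B", "C"}), leader="A"),  # stale view
}
live = frozenset({"A", "C"})

conflicts = find_conflicts(master_view, replica_views)
print("conflicting replicas:", sorted(conflicts))
print("master view live/total:", live_count(master_view, live))
print("leader view live/total:", live_count(replica_views["A"], live))
```

In this sketch the master's view counts 2 of 3 voters alive, while the leader's active config counts 1 of 2, so the tablet looks healthy from the master's side only. Gathering a view from every replica is what makes the check expensive, which matches the commit message's reason for leaving it off by default.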
