Jean-Daniel Cryans has submitted this change and it was merged.
Change subject: KUDU-1516 ksck should check for more raft-related status issues
KUDU-1516 ksck should check for more raft-related status issues (partial)
This patch improves ksck, mainly by adding "tablet server POV"
information: ksck now gathers information about tablet replicas from the
tablet servers and cross-references it with the master metadata. This adds
the following checks:
* each tablet has a majority of replicas on live tablet servers
* if a tablet has a majority of replicas on live tablet servers, then a
majority of its replicas are in RUNNING state
* the assignments of tablets to tablet servers in the master agree with
the assignment of tablet replicas reported by the tablet servers
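The three checks listed above can be sketched for a single tablet as follows. This is a minimal illustration, not ksck's actual code: the `Replica` type, the `check_tablet` function, and the input shapes (a set of tablet server UUIDs from the master, a set of live tablet servers, and the replicas those live servers reported) are all hypothetical simplifications.

```python
from dataclasses import dataclass


# Hypothetical, simplified model of a replica as reported by a tablet server.
@dataclass(frozen=True)
class Replica:
    tablet_id: str
    tserver_uuid: str
    state: str  # e.g. "RUNNING", "BOOTSTRAPPING", "FAILED"


def majority(n: int) -> int:
    """Smallest count that is a strict majority of n."""
    return n // 2 + 1


def check_tablet(tablet_id, master_assignment, live_tservers, reported):
    """Return a list of problems found for one tablet (hypothetical helper).

    master_assignment: set of tserver UUIDs the master says host the tablet.
    live_tservers: set of tserver UUIDs that responded to ksck.
    reported: Replica objects reported by the live tablet servers.
    """
    problems = []
    num_replicas = len(master_assignment)
    live = master_assignment & live_tservers
    mine = [r for r in reported if r.tablet_id == tablet_id]

    # Check 1: a majority of replicas must be on live tablet servers.
    if len(live) < majority(num_replicas):
        problems.append("majority of replicas not on live tablet servers")
    else:
        # Check 2: only meaningful once check 1 passes; count RUNNING
        # replicas hosted where the master expects them.
        running = sum(
            1 for r in mine
            if r.state == "RUNNING" and r.tserver_uuid in master_assignment
        )
        if running < majority(num_replicas):
            problems.append("majority of replicas not RUNNING")

    # Check 3: the live servers that report this tablet must match the live
    # servers the master assigned it to (dead servers cannot report anything).
    reported_hosts = {r.tserver_uuid for r in mine}
    if reported_hosts != live:
        problems.append("master and tablet server assignments disagree")
    return problems
```

With a replication factor of 3, losing one tablet server still leaves a majority (2 of 3), so no problem is reported; losing two trips the first check.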
This patch does not include other desiderata from KUDU-1516, like a consensus
canary or a write op canary.
The code is also restructured substantially, so that all of the "fetch
information from tablet servers" work happens up front in a single call. This
paves the way for a future enhancement in which these RPCs are issued on a
thread pool, since fetching from each tablet server serially can be slow for
large clusters.
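The "fetch everything up front" structure, and how a thread pool could later slot into it, can be sketched like this. It is an illustration under stated assumptions, not the patch's implementation: `fetch_all_tserver_info` and the `fetch_one` callback (a stand-in for the per-server RPC) are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all_tserver_info(
    tserver_uuids: List[str],
    fetch_one: Callable[[str], list],
    max_workers: int = 1,
) -> Dict[str, object]:
    """Gather replica info from every tablet server in one up-front pass.

    fetch_one is a hypothetical stand-in for the per-server RPC. A failure is
    stored as the exception object instead of being raised, so one dead
    server is reported as unavailable rather than aborting the whole check.
    max_workers=1 behaves like a serial loop; larger values fetch in parallel.
    """
    def safe_fetch(uuid: str):
        try:
            return fetch_one(uuid)
        except Exception as e:  # record the failure for later reporting
            return e

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the UUIDs.
        results = list(pool.map(safe_fetch, tserver_uuids))
    return dict(zip(tserver_uuids, results))
```

Because all results are collected before any check runs, the checking logic never issues RPCs itself, which is what makes swapping the serial loop for a pool a local change.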
To try to improve performance for clusters with a lot of data, I also added a
flag to the ListTablets RPC so that the response does not include schema
information, which is both large and irrelevant for this use case.
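The idea behind the new ListTablets flag, letting the caller opt out of a large field it does not need, can be sketched as below. The real flag is a field in Kudu's RPC definitions whose name is not given here; `include_schema`, `ListTabletsRequest`, `TabletStatus`, and `list_tablets` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TabletStatus:
    tablet_id: str
    state: str
    schema: Optional[str] = None  # potentially large; filled only on request


@dataclass
class ListTabletsRequest:
    # Hypothetical flag. Defaulting to True keeps existing callers' behavior
    # unchanged; a tool like ksck would explicitly set it to False.
    include_schema: bool = True


def list_tablets(req: ListTabletsRequest,
                 tablets: List[Tuple[str, str, str]]) -> List[TabletStatus]:
    """Build a response from (tablet_id, state, schema) tuples held by a server."""
    return [
        TabletStatus(tablet_id, state, schema if req.include_schema else None)
        for tablet_id, state, schema in tablets
    ]
```

Defaulting the flag to "include" is an assumed design choice: it makes the optimization opt-in, so older clients that never set the flag still receive schemas.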
An example of the new output against a cluster with some dead tablet servers
and broken tablets is available at:
This patch is based on some earlier work by Will Berkeley.
Tested-by: Kudu Jenkins
Reviewed-by: Jean-Daniel Cryans <jdcry...@apache.org>
12 files changed, 402 insertions(+), 130 deletions(-)
Jean-Daniel Cryans: Looks good to me, approved
Kudu Jenkins: Verified
To view, visit http://gerrit.cloudera.org:8080/3632
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>