Todd Lipcon has submitted this change and it was merged.
Change subject: KUDU-763 consensus queue metrics on followers are messed up
KUDU-763 consensus queue metrics on followers are messed up
On follower tablet replicas, the majority_done_ops and
in_progress_ops metrics are wrong. They are computed as:
  majority_done_ops = committed_index - all_replicated_opid
  in_progress_ops = last_appended - committed_index
There are two reasons why these values are wrong on followers:
1) Followers do not update their consensus queue's committed index.
2) Followers do not maintain a correct value for all_replicated_opid,
since their queues generally only track the local peer and the leader
does not notify followers when ops are all-replicated.
This patch fixes 1 by having consensus notify the follower queues of
the updated committed index when the consensus committed index is
updated. This makes in_progress_ops meaningful for followers. Note
that a follower queue's committed index is not used for anything
besides the metrics.
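As a rough illustration of the notification described above, here is a
minimal sketch. The class and method names (FollowerQueueSketch,
ConsensusSketch, AdvanceCommittedIndex, UpdateCommittedIndex) are
hypothetical and are not the actual Kudu classes touched by this patch;
the sketch only shows consensus pushing its committed index into the
queue so that in_progress_ops can be derived from it.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a follower's consensus queue.
// It tracks only what the in_progress_ops metric needs.
class FollowerQueueSketch {
 public:
  // Record that an op with the given index was appended locally.
  void AppendOp(int64_t index) {
    last_appended_ = std::max(last_appended_, index);
  }

  // Called by consensus when its committed index advances.
  // The committed index never moves backwards.
  void UpdateCommittedIndex(int64_t index) {
    committed_index_ = std::max(committed_index_, index);
  }

  // in_progress_ops = last_appended - committed_index
  int64_t in_progress_ops() const {
    return last_appended_ - committed_index_;
  }

 private:
  int64_t last_appended_ = 0;
  int64_t committed_index_ = 0;
};

// Hypothetical stand-in for the consensus implementation.
class ConsensusSketch {
 public:
  explicit ConsensusSketch(FollowerQueueSketch* queue) : queue_(queue) {}

  // Advance the consensus committed index and notify the queue.
  void AdvanceCommittedIndex(int64_t index) {
    if (index <= committed_index_) return;  // Ignore stale updates.
    committed_index_ = index;
    queue_->UpdateCommittedIndex(index);
  }

 private:
  FollowerQueueSketch* queue_;  // Not owned.
  int64_t committed_index_ = 0;
};
```

In this sketch the queue uses the committed index only for the metric,
which mirrors the note above that a follower queue's committed index has
no other purpose.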
Fixing 2 would require having the leader notify followers when
operations are all-replicated. This isn't needed for consensus, and
would be used by the followers just for the majority_done_ops metric,
so I think it's best just to zero the metric for followers and
document that it is not meaningful in that case.
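The resulting metric behavior can be sketched as follows. This is an
illustration under stated assumptions, not the code in this change: the
QueueState and QueueMetrics structs, the is_leader flag, and the
ComputeQueueMetrics function are hypothetical names. Only the two
formulas and the decision to report zero for majority_done_ops on
followers come from the description above.

```cpp
#include <cstdint>

// Hypothetical, simplified view of the queue state used by the metrics.
struct QueueState {
  int64_t last_appended_index;   // Index of the last op appended locally.
  int64_t committed_index;       // Committed index known to this replica.
  int64_t all_replicated_index;  // Only tracked correctly on the leader.
  bool is_leader;
};

struct QueueMetrics {
  int64_t majority_done_ops;
  int64_t in_progress_ops;
};

QueueMetrics ComputeQueueMetrics(const QueueState& s) {
  QueueMetrics m;
  // majority_done_ops = committed_index - all_replicated_opid.
  // Followers have no correct all-replicated value, so report zero.
  m.majority_done_ops =
      s.is_leader ? s.committed_index - s.all_replicated_index : 0;
  // in_progress_ops = last_appended - committed_index.
  // Meaningful on followers once the committed index is kept up to date.
  m.in_progress_ops = s.last_appended_index - s.committed_index;
  return m;
}
```

For example, a follower with last_appended_index 10 and committed_index
8 reports in_progress_ops of 2 and majority_done_ops of 0, whatever its
all_replicated_index happens to hold.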
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <t...@apache.org>
4 files changed, 54 insertions(+), 9 deletions(-)
Todd Lipcon: Looks good to me, approved
Kudu Jenkins: Verified
To view, visit http://gerrit.cloudera.org:8080/3501
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>