David Ribeiro Alves has posted comments on this change.

Change subject: WIP: KUDU-1506 Add consensus lag metrics
......................................................................


Patch Set 6:

We could probably report lag in terms of time by tracking the timestamps 
assigned to ops in addition to their indexes.

The problem is that it's largely arbitrary. A replica might be lagging by a 
short time and have thousands of ops in the queue, or be lagging by a large 
chunk of time but have a relatively small number of ops in the queue.
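
As a rough illustration of how the two measures can disagree (this is not code 
from the patch; Op, lag and the field names are hypothetical, and timestamps 
are assumed to be plain seconds):

```python
from dataclasses import dataclass


@dataclass
class Op:
    index: int          # log index of the op
    timestamp_s: float  # timestamp assigned to the op (assumed to be seconds)


def lag(leader_last_appended: Op, replica_last_received: Op):
    """Return (ops_behind, seconds_behind) for a single replica."""
    ops_behind = leader_last_appended.index - replica_last_received.index
    seconds_behind = (leader_last_appended.timestamp_s
                      - replica_last_received.timestamp_s)
    return ops_behind, seconds_behind


# Busy tablet: thousands of ops in the queue, but only half a second behind.
busy = lag(Op(105000, 1000.5), Op(100000, 1000.0))

# Quiet tablet: only three ops in the queue, but five minutes behind.
quiet = lag(Op(45, 1300.0), Op(42, 1000.0))
```

Neither pair of numbers says on its own whether the replica is healthy.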

I think the problem we set out to solve here is to give users some insight into 
whether a replica is lagging, and this cannot be conveyed by a single number; 
it's something that needs to be tracked over time to have meaning. That is, a 
user won't care about, or even understand, that a replica is lagging by 1000 
ops, or that it's lagging by 5 minutes (where 5 minutes is the timestamp diff 
between the last op appended on the leader and the last op received by the 
replica). They care whether this number goes down over time (the replica is 
catching up) or whether it goes up over time (the replica won't ever catch up).
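
A minimal sketch of what tracking the number over time could look like, 
assuming we periodically sample (time, lag) pairs for a replica. The function 
name, the labels and the least-squares fit are my own choices for illustration, 
not something the patch does:

```python
def lag_trend(samples, tolerance=0.0):
    """Classify a replica from a list of (time_s, lag) samples.

    The lag may be in ops or in seconds; only its direction matters here.
    """
    n = len(samples)
    if n < 2:
        return "unknown"
    mean_t = sum(t for t, _ in samples) / n
    mean_lag = sum(lag for _, lag in samples) / n
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        return "unknown"  # all samples were taken at the same instant
    # Least-squares slope of lag against time.
    slope = sum((t - mean_t) * (lag - mean_lag) for t, lag in samples) / spread
    if slope < -tolerance:
        return "catching up"
    if slope > tolerance:
        return "falling behind"
    return "steady"


catching_up = lag_trend([(0, 1000), (10, 800), (20, 500)])
falling_behind = lag_trend([(0, 100), (10, 400), (20, 900)])
```

The classification depends only on the sign of the slope, so the same samples 
expressed in a different unit (ops instead of seconds, say) give the same 
answer as long as the unit is used consistently.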

To this point, how accurately we define this number is largely irrelevant, as 
long as we define it consistently.

-- 
To view, visit http://gerrit.cloudera.org:8080/6451
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ida8e992cc2397ca8d5873e62961a65f618d52c36
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: No
