[kudu-CR] [consensus] Add consensus op-level lag metrics

Mike Percy (Code Review) Thu, 23 Mar 2017 18:38:39 -0700

Mike Percy has posted comments on this change.

Change subject: [consensus] Add consensus op-level lag metrics
......................................................................



Patch Set 10:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6451/10/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 343:   // The highest index that is known to be replicated by all members 
of
Ha, nice catch


http://gerrit.cloudera.org:8080/#/c/6451/10/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

Line 253:   page_size_estimator.set_last_idx_appended_to_leader(0);
I'm not sure of the purpose of the changes to this test. Does this mean this 
field is now required by the server? If so, this patch will break rolling 
upgrades (which are a nice-to-have when possible). Maybe we should provide a 
default value of 0 for this field so it doesn't have to be specified.


http://gerrit.cloudera.org:8080/#/c/6451/10/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS10, Line 88: is
believes it is


Line 156:   queue_state_.last_idx_appended_to_leader = 
queue_state_.last_appended.index();
Maybe we should not track this in leader mode and instead do this 
re-initialization in NonLeaderMode(). It should be safe to assume (for our 
purposes with this metric) that the leader must be ahead of us in order to be 
the leader to we can move this line to line 183 or so to imply that our lag 
between the time we are demoted to follower and the time we get our next 
message from the leader would be 0.


PS10, Line 301:     // If LEADER, update the index of the last appended 
operation.
              :     queue_state_.last_idx_appended_to_leader = last_id.index();
Seems strange to me to track this at all as the leader; see below for a 
suggestion.


PS10, Line 380: last_idx_appended_to_leader
last_appended.index()


Line 603:   queue_state_.last_idx_appended_to_leader = 
std::max({queue_state_.last_idx_appended_to_leader,
Based on the changes we have made, I don't think we want the max() anymore 
because we can assume that the leader is always right, and the latest message 
received from the leader should contain the correct value. The reason max() is 
a problem is because if there is a leader change and the new leader truncates 
the log then our metric will be inaccurate until the leader catches up to the 
index of the previous leader.

I also find it a little bizarre that in this file we call 
UpdateLagMetrics(queue_state_.last_idx_appended_to_leader) only to again 
compare that value with queue_state_.last_idx_appended_to_leader

I think it would make more sense to have 2 methods here:

    void PeerMessageQueue::UpdateLagMetrics() {
      // Don't update queue_state_, only set the metric.
    }

    void PeerMessageQueue::UpdateLastIndexAppendedToLeader(int64_t index) {
      queue_state_.last_idx_appended_to_leader = index;
      UpdateLagMetrics();
    }

That way, from within the PeerMessageQueue class we would only call 
UpdateLagMetrics() and from RaftConsensus (and tests) we would call 
UpdateLastIndexAppendedToLeader(...).


PS10, Line 605:   metrics_.num_ops_behind_leader->set_value(
              :       queue_state_.last_idx_appended_to_leader - 
queue_state_.last_appended.index());
How about:

    int64_t num_ops_behind_leader = queue_state_.mode == LEADER ? 0 : 
queue_state_.last_idx_appended_to_leader - queue_state_.last_appended.index());
    metrics_.num_ops_behind_leader->set_value(num_ops_behind_leader);


-- 
To view, visit http://gerrit.cloudera.org:8080/6451
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ida8e992cc2397ca8d5873e62961a65f618d52c36
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [consensus] Add consensus op-level lag metrics

Reply via email to