[ https://issues.apache.org/jira/browse/KUDU-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840367#comment-15840367 ]
Todd Lipcon commented on KUDU-1731:
-----------------------------------

[~mpercy] can you explain this one further? How is this different than what we do today by evicting a replica who has fallen too far behind the WAL retention? Maybe this is more about doing the "3->4->3" type of config change, along with PRE-VOTER? ie if a node is slow (seems to be falling farther behind) but still alive, we could try to recruit the pre-voter _before_ evicting the slow node?

Let's fill out this JIRA with some more specifics, and/or link to a design doc where we cover all of the various backlog for improving re-replication.

> Evict replicas that are alive but lagging
> -----------------------------------------
>
>                 Key: KUDU-1731
>                 URL: https://issues.apache.org/jira/browse/KUDU-1731
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>            Reporter: Mike Percy
>
> In the case that a replica is consistently behind the other replicas, we may
> be able to detect that the node is slow and evict it. (Currently under high
> write load we very often degrade to all tablets having just two live replicas
> and one COPYING)
> * In fact, this would also be useful for leaders that are significantly
> slower than their followers.
> * We should instead recruit a new replica and only evict the lagging one once
> the new one is online.
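
For discussion purposes, here is a rough sketch of the "3->4->3" ordering mentioned above: recruit a pre-voter first, promote it once it has caught up, and only then evict the lagging replica. This is not Kudu code. The `Replica` type, the `next_config_change` function, the `ops_behind` field, and the threshold value are all hypothetical names and numbers made up for illustration; only the PRE-VOTER idea and the add-before-evict ordering come from the comment and the issue description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Role(Enum):
    VOTER = "VOTER"
    PRE_VOTER = "PRE_VOTER"  # hypothetical: a replica that is catching up and cannot vote yet


@dataclass
class Replica:
    uuid: str
    role: Role
    alive: bool
    ops_behind: int  # hypothetical metric: ops behind the leader's log


def next_config_change(
    replicas: List[Replica],
    target_voters: int = 3,
    lag_threshold: int = 1000,  # arbitrary illustrative value
) -> Optional[Tuple[str, Optional[str]]]:
    """Pick at most one config change, always adding before evicting."""
    voters = [r for r in replicas if r.role is Role.VOTER]
    pre_voters = [r for r in replicas if r.role is Role.PRE_VOTER]

    # 4 -> 3: over-replicated, so drop the voter that is farthest behind.
    if len(voters) > target_voters:
        worst = max(voters, key=lambda r: r.ops_behind)
        return ("EVICT", worst.uuid)

    # 3 -> 4: a pre-voter exists; promote it once caught up, otherwise wait.
    if pre_voters:
        ready = [p for p in pre_voters if p.alive and p.ops_behind <= lag_threshold]
        return ("PROMOTE", ready[0].uuid) if ready else None

    # A voter is alive but lagging: recruit a replacement before evicting it.
    lagging = [v for v in voters if v.alive and v.ops_behind > lag_threshold]
    if lagging and len(voters) == target_voters:
        return ("ADD_PRE_VOTER", None)

    return None


def demo() -> list:
    """Walk one tablet through the full 3 -> 4 -> 3 sequence."""
    a = Replica("a", Role.VOTER, True, 0)
    b = Replica("b", Role.VOTER, True, 10)
    c = Replica("c", Role.VOTER, True, 5000)  # alive but lagging
    steps = [next_config_change([a, b, c])]

    d = Replica("d", Role.PRE_VOTER, True, 9000)  # new copy, still catching up
    steps.append(next_config_change([a, b, c, d]))

    d.ops_behind = 0  # the copy has caught up
    steps.append(next_config_change([a, b, c, d]))

    d.role = Role.VOTER  # promoted, so the config now has 4 voters
    steps.append(next_config_change([a, b, c, d]))

    steps.append(next_config_change([a, b, d]))  # lagging replica is gone
    return steps
```

The point of the ordering is that the tablet never drops below three voters while the replacement is being copied, which is the degradation ("two live replicas and one COPYING") the issue description complains about. How lag would actually be measured, and how this interacts with the existing WAL-retention eviction, is left open here.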