[ https://issues.apache.org/jira/browse/KUDU-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840367#comment-15840367 ]

Todd Lipcon commented on KUDU-1731:
-----------------------------------

[~mpercy] can you explain this one further? How is this different from what we 
do today, where we evict a replica that has fallen too far behind the WAL retention?

Maybe this is more about doing the "3->4->3" type of config change, along with 
PRE-VOTER? i.e., if a node is slow (it seems to be falling farther behind) but still 
alive, we could try to recruit the pre-voter _before_ evicting the slow node? 
Let's fill out this JIRA with some more specifics, and/or link to a design doc 
where we cover all of the backlog items for improving re-replication.
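To make the "3->4->3" idea concrete, here is a minimal sketch of the ordering, assuming a hypothetical PRE_VOTER role that receives log entries but does not count toward the majority. The names Config, replace_slow_replica and new_caught_up are illustrative only and are not Kudu APIs. The point is that the slow voter is removed only after its replacement has been promoted, so the voter set never shrinks below its original size during the change.

```python
from dataclasses import dataclass, field

VOTER = "VOTER"
PRE_VOTER = "PRE_VOTER"  # hypothetical role: replicates the log but does not vote


@dataclass
class Config:
    members: dict = field(default_factory=dict)  # replica uuid -> role

    def voters(self):
        return sorted(u for u, r in self.members.items() if r == VOTER)

    def majority(self):
        return len(self.voters()) // 2 + 1


def replace_slow_replica(config, slow, new, new_caught_up):
    """Yield a snapshot of the membership after each step of a 3->4->3 swap.

    new_caught_up is a hypothetical callback reporting whether the
    pre-voter has caught up with the leader's log.
    """
    # Step 1: add the replacement as a non-voting pre-voter (3 voters).
    config.members[new] = PRE_VOTER
    yield dict(config.members)
    # Step 2: promote only after it has caught up; otherwise keep the slow voter.
    if not new_caught_up():
        return
    config.members[new] = VOTER  # 4 voters, majority of 3
    yield dict(config.members)
    # Step 3: only now evict the slow replica (back to 3 voters).
    del config.members[slow]
    yield dict(config.members)
```

If the pre-voter never catches up, the sequence stops after step 1 and the slow replica keeps its vote, which is the "don't evict before the replacement is online" property.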

> Evict replicas that are alive but lagging
> -----------------------------------------
>
>                 Key: KUDU-1731
>                 URL: https://issues.apache.org/jira/browse/KUDU-1731
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>            Reporter: Mike Percy
>
> In the case that a replica is consistently behind the other replicas, we may 
> be able to detect that the node is slow and evict it. (Currently, under high 
> write load, we very often degrade to all tablets having just two live replicas 
> and one COPYING.)
> * In fact, this would also be useful for leaders that are significantly 
> slower than their followers.
> * We should instead recruit a new replica and only evict the lagging one once 
> the new one is online.
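One possible way to decide that a replica is "consistently behind" is sketched below. It is a hypothetical heuristic, not anything Kudu implements: the leader samples each peer's lag (its own last log index minus the peer's last acknowledged index) and flags the peer only if the lag exceeded a threshold in every one of the last window samples. LagDetector, window and max_lag_ops are illustrative names and defaults. A single fast sample clears the flag, so a briefly slow node is not evicted.

```python
from collections import deque


class LagDetector:
    """Hypothetical heuristic: flag a peer whose replication lag stays above
    max_lag_ops for `window` consecutive samples."""

    def __init__(self, window=5, max_lag_ops=1000):
        self.window = window
        self.max_lag_ops = max_lag_ops
        self.samples = {}  # peer uuid -> deque of the most recent lag values

    def record(self, peer, leader_last_index, peer_last_index):
        # Lag in operations between the leader's log and the peer's ack.
        lag = leader_last_index - peer_last_index
        hist = self.samples.setdefault(peer, deque(maxlen=self.window))
        hist.append(lag)
        return lag

    def is_lagging(self, peer):
        hist = self.samples.get(peer)
        if hist is None or len(hist) < self.window:
            return False  # not enough evidence yet
        return all(lag > self.max_lag_ops for lag in hist)
```

A peer flagged this way would be a candidate for the recruit-then-evict sequence rather than immediate eviction.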



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
