Hello Stig,

I think there's an ordering of the events here, e.g.:
T0: a network partition happens.
T1: record-1 is received at the leader; at this time the ISR still contains all 3 replicas, so the leader will accept this record and wait for it to be replicated.
T100: after some elapsed time, the leader decides to kick the other followers out of the ISR.
T101: record-2 is received at the leader; at this time the leader will reject the produce request with a NOT_ENOUGH_REPLICAS error code.

So you see that before T100, records that reach the leader are still accepted. Note that the min.isr / acks configurations do not impact how the ISR itself is managed.

Guozhang

On Fri, Dec 11, 2020 at 5:34 AM Stig Rohde Døssing <stigdoess...@gmail.com>
wrote:

> Hi,
>
> We have a topic with min.insync.replicas = 2 where each partition is
> replicated to 3 nodes. We write to it using acks=all.
>
> We experienced a network malfunction, where leader node 1 could not reach
> replicas 2 and 3, and vice versa. Nodes 2 and 3 could reach each other. The
> controller broker could reach all nodes, and external services could reach
> all nodes.
>
> What we saw was the ISR degrade to only node 1. Looking at the code, I see
> the ISR shrink when a replica has not caught up to the leader's LEO and it
> hasn't fetched for a while. My guess is the leader had messages that
> weren't yet replicated by the other nodes.
>
> Shouldn't min.insync.replicas = 2 and acks=all prevent the ISR shrinking to
> this size, since new writes should not be accepted unless they are
> replicated by at least 2 nodes?

--
-- Guozhang
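
As a concrete reference for the setup Stig describes, here is a minimal sketch in Java that creates such a topic with the AdminClient. The broker address and topic name are placeholders, not from the thread; the parts that match the thread are the replication factor of 3 and min.insync.replicas = 2.

import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder address; point this at a real broker.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // 1 partition, replicated to 3 brokers, as in the thread.
            NewTopic topic = new NewTopic("my-topic", 1, (short) 3)
                    // Writes with acks=all need at least 2 in-sync replicas.
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}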
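And a minimal sketch of what a producer with acks=all would see at T101 in the timeline above, against the topic created in the previous sketch. One assumption to note: retries are disabled here only so that the retriable NOT_ENOUGH_REPLICAS error surfaces immediately as a NotEnoughReplicasException; with default settings the producer would keep retrying until delivery.timeout.ms expires.

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksAllSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the leader waits for the full current ISR before acking.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Disable retries so the retriable error surfaces immediately (see note above).
        props.put(ProducerConfig.RETRIES_CONFIG, 0);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "record-2")).get();
            System.out.println("acked by the full ISR");
        } catch (ExecutionException e) {
            if (e.getCause() instanceof NotEnoughReplicasException) {
                // T101 in the timeline: the ISR has shrunk below
                // min.insync.replicas, so the leader rejects the produce
                // request instead of accepting it.
                System.err.println("rejected: " + e.getCause().getMessage());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Note that record-1 at T1 would complete normally on this same code path, since at that point the ISR still satisfies min.insync.replicas; rejection only starts once the leader has shrunk the ISR at T100.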