Hello Stig,

I think there's an ordering of the events here, e.g.:
T0: a network partition happens.
T1: record-1 is received at the leader; at this time the ISR still contains all 3 replicas, so the leader will accept this record and wait for it to be replicated.
T100: after some elapsed time, the leader decides to kick the other followers out of the ISR.
T101: record-2 is received at the leader; at this time the leader will reject the produce request with a NOT_ENOUGH_REPLICAS error code.

So you see that before T100, records that reach the leader are still accepted. Note that the min.isr / acks configurations do not impact how the ISR itself is managed.

Guozhang

On Fri, Dec 11, 2020 at 5:34 AM Stig Rohde Døssing <stigdoess...@gmail.com>
wrote:

> Hi,
>
> We have a topic with min.insync.replicas = 2 where each partition is
> replicated to 3 nodes. We write to it using acks=all.
>
> We experienced a network malfunction, where leader node 1 could not reach
> replicas 2 and 3, and vice versa. Nodes 2 and 3 could reach each other. The
> controller broker could reach all nodes, and external services could reach
> all nodes.
>
> What we saw was the ISR degrade to only node 1. Looking at the code, I see
> the ISR shrink when a replica has not caught up to the leader's LEO and it
> hasn't fetched for a while. My guess is the leader had messages that
> weren't yet replicated by the other nodes.
>
> Shouldn't min.insync.replicas = 2 and acks=all prevent the ISR shrinking to
> this size, since new writes should not be accepted unless they are
> replicated by at least 2 nodes?

--
-- Guozhang
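
As a concrete reference for the setup Stig describes, here is a minimal sketch in Java that creates such a topic with the AdminClient. The broker address and topic name are placeholders, not from the thread; the parts that match the thread are the replication factor of 3 and min.insync.replicas = 2.

import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder address; point this at a real broker.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // 1 partition, replicated to 3 brokers, as in the thread.
            NewTopic topic = new NewTopic("my-topic", 1, (short) 3)
                    // Writes with acks=all need at least 2 in-sync replicas.
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}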
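And a minimal sketch of what a producer with acks=all would see at T101 in the timeline above, against the topic created in the previous sketch. One assumption to note: retries are disabled here only so that the retriable NOT_ENOUGH_REPLICAS error surfaces immediately as a NotEnoughReplicasException; with default settings the producer would keep retrying until delivery.timeout.ms expires.

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksAllSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the leader waits for the full current ISR before acking.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Disable retries so the retriable error surfaces immediately (see note above).
        props.put(ProducerConfig.RETRIES_CONFIG, 0);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "record-2")).get();
            System.out.println("acked by the full ISR");
        } catch (ExecutionException e) {
            if (e.getCause() instanceof NotEnoughReplicasException) {
                // T101 in the timeline: the ISR has shrunk below
                // min.insync.replicas, so the leader rejects the produce
                // request instead of accepting it.
                System.err.println("rejected: " + e.getCause().getMessage());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Note that record-1 at T1 would complete normally on this same code path, since at that point the ISR still satisfies min.insync.replicas; rejection only starts once the leader has shrunk the ISR at T100.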