Re: KRaft controller number of replicas

2024-02-07 Thread Daniel Saiz
Thanks a lot folks, this is really helpful.

> I believe the limitation that this documentation is hinting at is the
> motivation for KIP-996

I'll make sure to check out KIP-996 and the references linked there.
Thanks for the summary as well, I really appreciate it.


Cheers,

Dani

On Tue, Feb 6, 2024 at 6:52 PM Michael K. Edwards  wrote:
>
> A 5-node quorum doesn't make a lot of sense in a setting where those nodes
> are also Kafka brokers.  When they're ZooKeeper voters, a quorum* of 5
> makes a lot of sense, because you can take an unscheduled voter failure
> during a rolling-reboot scheduled maintenance without significant service
> impact.  You can also spread the ZK quorum across multiple AZs (or your
> cloud's equivalent), which I would rarely recommend doing with Kafka.
>
> The trend in Kafka development and deployment is towards KRaft, and there
> is probably no percentage in bucking that trend.  Just don't expect it to
> cover every "worst realistic case" scenario that a ZK-based deployment can.
>
> Scheduled maintenance on an (N+2 for read integrity, N+1 to stay writable)
> system adds vulnerability, and that's just something you have to build into
> your risk model.  N+1 is good enough for finely partitioned data in any use
> case that Kafka fits, because resilvering after a maintenance or a full
> broker loss is highly parallel.  N+1 is also acceptable for consumer group
> coordinator metadata, as long as you tune for aggressive compaction; I
> haven't looked at whether the coordinator code does a good job of
> parallelizing metadata replay, but if it doesn't, there's no real
> difficulty in fixing that.  For global metadata that needs globally
> serialized replay, which is what the controller metadata is, I was a lot
> happier with N+2 to stay writable.  But that's water under the bridge, and
> I'm just a spectator.
>
> Regards,
> - Michael
>
>
> * I hate this misuse of the word "quorum", but what can one do?
>
>
> On Tue, Feb 6, 2024, 8:51 AM Greg Harris 
> wrote:
>
> > Hi Dani,
> >
> > I believe the limitation that this documentation is hinting at is the
> > motivation for KIP-996 [1], and the notice in the documentation would
> > be removed once KIP-996 lands.
> > You can read the KIP for a brief explanation and link to a more
> > in-depth explanation of the failure scenario.
> >
> > While a 3-node quorum would typically be less reliable or available
> > than a 5-node quorum, it happens to be resistant to this failure mode,
> > which makes the additional controllers liabilities instead of assets.
> > In the judgement of the maintainers at least, the risk of a network
> > partition which could trigger unavailability in a 5-node quorum is
> > higher than the risk of a 2-controller failure in a 3-node quorum, so
> > 3-node quorums are recommended.
> > You could do your own analysis and practical testing to make this
> > tradeoff yourself in your network context.
> >
> > I hope this helps!
> > Greg
> >
> > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote
> >
> > On Tue, Feb 6, 2024 at 4:25 AM Daniel Saiz
> >  wrote:
> > >
> > > Hello,
> > >
> > > I would like to clarify a statement I found in the KRaft documentation, in
> > > the deployment section [1]:
> > >
> > > More than 3 controllers is not recommended in critical environments. In
> > > the rare case of a partial network failure it is possible for the cluster
> > > metadata quorum to become unavailable. This limitation will be addressed in
> > > a future release of Kafka.
> > >
> > > I would like to clarify what is meant by that sentence, as intuitively I
> > > don't see why 3 replicas would be better than 5 (or more) for fault
> > > tolerance.
> > > What is the current limitation this is referring to?
> > >
> > > Thanks a lot.
> > >
> > >
> > > Cheers,
> > >
> > > Dani
> > >
> > > [1] https://kafka.apache.org/36/documentation.html#kraft_deployment
> >


Re: KRaft controller number of replicas

2024-02-06 Thread Michael K. Edwards
A 5-node quorum doesn't make a lot of sense in a setting where those nodes
are also Kafka brokers.  When they're ZooKeeper voters, a quorum* of 5
makes a lot of sense, because you can take an unscheduled voter failure
during a rolling-reboot scheduled maintenance without significant service
impact.  You can also spread the ZK quorum across multiple AZs (or your
cloud's equivalent), which I would rarely recommend doing with Kafka.
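
For reference, the sizing math here is just majority-quorum arithmetic. The
following is a back-of-the-envelope sketch of that arithmetic (illustrative
Python, not code from Kafka or ZooKeeper):

    # Generic majority-quorum fault-tolerance math, for illustration only.
    def tolerated_failures(voters: int) -> int:
        """Unplanned voter losses a majority quorum can absorb and stay writable."""
        return (voters - 1) // 2

    for n in (3, 5):
        # During a rolling restart one voter is already down on purpose,
        # so subtract it from the budget for unscheduled failures.
        spare = max(tolerated_failures(n) - 1, 0)
        print(f"{n} voters: tolerates {tolerated_failures(n)} failure(s); "
              f"{spare} to spare during a rolling restart")

With 3 voters the budget for an unscheduled failure during a rolling restart is
zero; with 5 voters there is one to spare, which is the property described above.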

The trend in Kafka development and deployment is towards KRaft, and there
is probably no percentage in bucking that trend.  Just don't expect it to
cover every "worst realistic case" scenario that a ZK-based deployment can.

Scheduled maintenance on an (N+2 for read integrity, N+1 to stay writable)
system adds vulnerability, and that's just something you have to build into
your risk model.  N+1 is good enough for finely partitioned data in any use
case that Kafka fits, because resilvering after a maintenance or a full
broker loss is highly parallel.  N+1 is also acceptable for consumer group
coordinator metadata, as long as you tune for aggressive compaction; I
haven't looked at whether the coordinator code does a good job of
parallelizing metadata replay, but if it doesn't, there's no real
difficulty in fixing that.  For global metadata that needs globally
serialized replay, which is what the controller metadata is, I was a lot
happier with N+2 to stay writable.  But that's water under the bridge, and
I'm just a spectator.

Regards,
- Michael


* I hate this misuse of the word "quorum", but what can one do?


On Tue, Feb 6, 2024, 8:51 AM Greg Harris 
wrote:

> Hi Dani,
>
> I believe the limitation that this documentation is hinting at is the
> motivation for KIP-996 [1], and the notice in the documentation would
> be removed once KIP-996 lands.
> You can read the KIP for a brief explanation and link to a more
> in-depth explanation of the failure scenario.
>
> While a 3-node quorum would typically be less reliable or available
> than a 5-node quorum, it happens to be resistant to this failure mode,
> which makes the additional controllers liabilities instead of assets.
> In the judgement of the maintainers at least, the risk of a network
> partition which could trigger unavailability in a 5-node quorum is
> higher than the risk of a 2-controller failure in a 3-node quorum, so
> 3-node quorums are recommended.
> You could do your own analysis and practical testing to make this
> tradeoff yourself in your network context.
>
> I hope this helps!
> Greg
>
> [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote
>
> On Tue, Feb 6, 2024 at 4:25 AM Daniel Saiz
>  wrote:
> >
> > Hello,
> >
> > I would like to clarify a statement I found in the KRaft documentation, in
> > the deployment section [1]:
> >
> > > More than 3 controllers is not recommended in critical environments. In
> > > the rare case of a partial network failure it is possible for the cluster
> > > metadata quorum to become unavailable. This limitation will be addressed in
> > > a future release of Kafka.
> >
> > I would like to clarify what is meant by that sentence, as intuitively I
> > don't see why 3 replicas would be better than 5 (or more) for fault
> > tolerance.
> > What is the current limitation this is referring to?
> >
> > Thanks a lot.
> >
> >
> > Cheers,
> >
> > Dani
> >
> > [1] https://kafka.apache.org/36/documentation.html#kraft_deployment
>


Re: KRaft controller number of replicas

2024-02-06 Thread Greg Harris
Hi Dani,

I believe the limitation that this documentation is hinting at is the
motivation for KIP-996 [1], and the notice in the documentation would
be removed once KIP-996 lands.
You can read the KIP for a brief explanation and link to a more
in-depth explanation of the failure scenario.
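
In case a concrete sketch helps: the general idea of a pre-vote is that a
would-be candidate polls its peers before bumping its epoch, so a controller
that keeps losing contact with the leader cannot repeatedly force elections.
The snippet below is a simplified illustration of that Raft-style idea only,
not the KIP-996 design or actual Kafka code, and every name in it is invented:

    from dataclasses import dataclass

    @dataclass
    class Peer:
        heard_from_leader_recently: bool  # within its election timeout
        last_log_offset: int

    def should_start_real_election(candidate_offset: int, peers: list[Peer]) -> bool:
        """Collect non-binding pre-votes before incrementing the epoch.

        A peer grants a pre-vote only if it has not heard from a live leader
        recently and the candidate's log is at least as current as its own,
        so a partially partitioned node cannot keep disrupting a healthy leader.
        """
        grants = sum(
            1
            for p in peers
            if not p.heard_from_leader_recently
            and candidate_offset >= p.last_log_offset
        )
        # +1 counts the candidate itself; require a majority of the full voter set.
        return grants + 1 > (len(peers) + 1) // 2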

While a 3-node quorum would typically be less reliable or available
than a 5-node quorum, it happens to be resistant to this failure mode,
which makes the additional controllers liabilities instead of assets.
In the judgement of the maintainers at least, the risk of a network
partition which could trigger unavailability in a 5-node quorum is
higher than the risk of a 2-controller failure in a 3-node quorum, so
3-node quorums are recommended.
You could do your own analysis and practical testing to make this
tradeoff yourself in your network context.
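
For what it's worth, the recommended 3-node shape is just a static three-voter
controller quorum. A minimal sketch of the relevant settings, assuming a
controller listener named CONTROLLER on port 9093 and placeholder hostnames
(adjust for your environment and Kafka version):

    # server.properties for the controller with node.id=1 (illustrative values)
    process.roles=controller
    node.id=1
    listeners=CONTROLLER://controller1.example.internal:9093
    controller.listener.names=CONTROLLER
    # One <node.id>@<host>:<port> entry per voter
    controller.quorum.voters=1@controller1.example.internal:9093,\
      2@controller2.example.internal:9093,\
      3@controller3.example.internal:9093

The other two controllers differ only in node.id and listeners.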

I hope this helps!
Greg

[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote

On Tue, Feb 6, 2024 at 4:25 AM Daniel Saiz
 wrote:
>
> Hello,
>
> I would like to clarify a statement I found in the KRaft documentation, in
> the deployment section [1]:
>
> > More than 3 controllers is not recommended in critical environments. In
> > the rare case of a partial network failure it is possible for the cluster
> > metadata quorum to become unavailable. This limitation will be addressed in
> > a future release of Kafka.
>
> I would like to clarify what is meant by that sentence, as intuitively I
> don't see why 3 replicas would be better than 5 (or more) for fault
> tolerance.
> What is the current limitation this is referring to?
>
> Thanks a lot.
>
>
> Cheers,
>
> Dani
>
> [1] https://kafka.apache.org/36/documentation.html#kraft_deployment


KRaft controller number of replicas

2024-02-06 Thread Daniel Saiz
Hello,

I would like to clarify a statement I found in the KRaft documentation, in
the deployment section [1]:

> More than 3 controllers is not recommended in critical environments. In
> the rare case of a partial network failure it is possible for the cluster
> metadata quorum to become unavailable. This limitation will be addressed in
> a future release of Kafka.

I would like to clarify what is meant by that sentence, as intuitively I
don't see why 3 replicas would be better than 5 (or more) for fault
tolerance.
What is the current limitation this is referring to?

Thanks a lot.


Cheers,

Dani

[1] https://kafka.apache.org/36/documentation.html#kraft_deployment