Re: KRaft controller number of replicas
Thanks a lot folks, this is really helpful.

> I believe the limitation that this documentation is hinting at is the
> motivation for KIP-996

I'll make sure to check out KIP-996 and the references linked there.
Thanks for the summary as well, I really appreciate it.

Cheers,
Dani

On Tue, Feb 6, 2024 at 6:52 PM Michael K. Edwards wrote:
>
> A 5-node quorum doesn't make a lot of sense in a setting where those nodes
> are also Kafka brokers. When they're ZooKeeper voters, a quorum* of 5
> makes a lot of sense, because you can take an unscheduled voter failure
> during a rolling-reboot scheduled maintenance without significant service
> impact. You can also spread the ZK quorum across multiple AZs (or your
> cloud's equivalent), which I would rarely recommend doing with Kafka.
>
> The trend in Kafka development and deployment is towards KRaft, and there
> is probably no percentage in bucking that trend. Just don't expect it to
> cover every "worst realistic case" scenario that a ZK-based deployment can.
>
> Scheduled maintenance on an (N+2 for read integrity, N+1 to stay writable)
> system adds vulnerability, and that's just something you have to build into
> your risk model. N+1 is good enough for finely partitioned data in any use
> case that Kafka fits, because resilvering after a maintenance or a full
> broker loss is highly parallel. N+1 is also acceptable for consumer group
> coordinator metadata, as long as you tune for aggressive compaction; I
> haven't looked at whether the coordinator code does a good job of
> parallelizing metadata replay, but if it doesn't, there's no real
> difficulty in fixing that. For global metadata that needs globally
> serialized replay, which is what the controller metadata is, I was a lot
> happier with N+2 to stay writable. But that's water under the bridge, and
> I'm just a spectator.
>
> Regards,
> - Michael
>
> * I hate this misuse of the word "quorum", but what can one do?
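For anyone skimming the thread later, the failure mode that KIP-996 targets can be caricatured in a few lines of Python. This is a toy model, not Kafka's implementation, and all names are illustrative: without a pre-vote phase, a voter cut off by a partial network partition keeps incrementing its term on every election timeout, and once it can reach the leader again, its higher term forces that leader to step down even though the stray voter cannot itself win an election.

```python
class Voter:
    """Toy Raft-style voter; illustrative only, not Kafka's implementation."""
    def __init__(self, voter_id: int):
        self.voter_id = voter_id
        self.term = 1
        self.is_leader = False

def reconnect(leader: Voter, stray: Voter, missed_election_timeouts: int) -> None:
    # While partitioned away, the stray voter bumps its term every time an
    # election timeout fires without it hearing from a leader.
    stray.term += missed_election_timeouts
    # Without pre-vote, merely observing a higher term forces the leader to
    # step down -- even though the stray voter cannot win an election.
    if stray.term > leader.term:
        leader.is_leader = False
        leader.term = stray.term

leader, stray = Voter(1), Voter(4)
leader.is_leader = True
reconnect(leader, stray, missed_election_timeouts=5)
print(leader.is_leader)  # False: the quorum takes an availability hit
```

Pre-vote closes this gap by making a candidate confirm it could actually win before it disturbs anyone's term.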
Re: KRaft controller number of replicas
A 5-node quorum doesn't make a lot of sense in a setting where those nodes
are also Kafka brokers. When they're ZooKeeper voters, a quorum* of 5 makes
a lot of sense, because you can take an unscheduled voter failure during a
rolling-reboot scheduled maintenance without significant service impact.
You can also spread the ZK quorum across multiple AZs (or your cloud's
equivalent), which I would rarely recommend doing with Kafka.

The trend in Kafka development and deployment is towards KRaft, and there
is probably no percentage in bucking that trend. Just don't expect it to
cover every "worst realistic case" scenario that a ZK-based deployment can.

Scheduled maintenance on an (N+2 for read integrity, N+1 to stay writable)
system adds vulnerability, and that's just something you have to build into
your risk model. N+1 is good enough for finely partitioned data in any use
case that Kafka fits, because resilvering after a maintenance or a full
broker loss is highly parallel. N+1 is also acceptable for consumer group
coordinator metadata, as long as you tune for aggressive compaction; I
haven't looked at whether the coordinator code does a good job of
parallelizing metadata replay, but if it doesn't, there's no real
difficulty in fixing that. For global metadata that needs globally
serialized replay, which is what the controller metadata is, I was a lot
happier with N+2 to stay writable. But that's water under the bridge, and
I'm just a spectator.

Regards,
- Michael

* I hate this misuse of the word "quorum", but what can one do?

On Tue, Feb 6, 2024, 8:51 AM Greg Harris wrote:

> Hi Dani,
>
> I believe the limitation that this documentation is hinting at is the
> motivation for KIP-996 [1], and the notice in the documentation would
> be removed once KIP-996 lands.
> You can read the KIP for a brief explanation and link to a more
> in-depth explanation of the failure scenario.
> While a 3-node quorum would typically be less reliable or available
> than a 5-node quorum, it happens to be resistant to this failure mode,
> which makes the additional controllers liabilities instead of assets.
> In the judgement of the maintainers at least, the risk of a network
> partition which could trigger unavailability in a 5-node quorum is
> higher than the risk of a 2-controller failure in a 3-node quorum, so
> 3-node quorums are recommended.
> You could do your own analysis and practical testing to make this
> tradeoff yourself in your network context.
>
> I hope this helps!
> Greg
>
> [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote
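Michael's rolling-maintenance point reduces to simple majority arithmetic. A quick sketch, with illustrative helper names rather than anything from Kafka's codebase:

```python
def majority(voters: int) -> int:
    """Votes needed for a majority quorum of `voters` members."""
    return voters // 2 + 1

def writable(voters: int, down: int) -> bool:
    """Can the quorum still commit with `down` members offline?"""
    return voters - down >= majority(voters)

# One voter down for a rolling reboot, plus one unscheduled failure:
print(writable(3, 2))  # False: a 3-voter quorum loses availability
print(writable(5, 2))  # True:  a 5-voter quorum rides it out
```

That double-outage-during-maintenance case is exactly why 5 voters are attractive for ZooKeeper, and why the current 3-controller recommendation for KRaft is a tradeoff rather than a free win.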
Re: KRaft controller number of replicas
Hi Dani,

I believe the limitation that this documentation is hinting at is the
motivation for KIP-996 [1], and the notice in the documentation would be
removed once KIP-996 lands. You can read the KIP for a brief explanation
and a link to a more in-depth explanation of the failure scenario.

While a 3-node quorum would typically be less reliable or available than a
5-node quorum, it happens to be resistant to this failure mode, which makes
the additional controllers liabilities instead of assets. In the judgement
of the maintainers at least, the risk of a network partition which could
trigger unavailability in a 5-node quorum is higher than the risk of a
2-controller failure in a 3-node quorum, so 3-node quorums are recommended.
You could do your own analysis and practical testing to make this tradeoff
yourself in your network context.

I hope this helps!
Greg

[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote

On Tue, Feb 6, 2024 at 4:25 AM Daniel Saiz wrote:
>
> Hello,
>
> I would like to clarify a statement I found in the KRaft documentation,
> in the deployment section [1]:
>
> > More than 3 controllers is not recommended in critical environments. In
> > the rare case of a partial network failure it is possible for the cluster
> > metadata quorum to become unavailable. This limitation will be addressed
> > in a future release of Kafka.
>
> I would like to clarify what is meant by that sentence, as intuitively I
> don't see why 3 replicas would be better than 5 (or more) for fault
> tolerance.
> What is the current limitation this is referring to?
>
> Thanks a lot.
>
> Cheers,
> Dani
>
> [1] https://kafka.apache.org/36/documentation.html#kraft_deployment
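For what it's worth, the raw crash-tolerance numbers behind Dani's intuition look like this (a sketch, not Kafka code):

```python
def tolerated_crashes(voters: int) -> int:
    """Simultaneous voter crashes a majority quorum survives."""
    return (voters - 1) // 2

# Crash tolerance alone does favour 5 controllers over 3:
for n in (3, 5):
    print(f"{n} voters: majority {n // 2 + 1}, "
          f"tolerates {tolerated_crashes(n)} crash(es)")
```

The current recommendation of 3 rests on the partial-partition disruption mode that KIP-996 addresses, not on crash tolerance, which is what makes it initially counterintuitive.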
KRaft controller number of replicas
Hello,

I would like to clarify a statement I found in the KRaft documentation, in
the deployment section [1]:

> More than 3 controllers is not recommended in critical environments. In
> the rare case of a partial network failure it is possible for the cluster
> metadata quorum to become unavailable. This limitation will be addressed
> in a future release of Kafka.

I would like to clarify what is meant by that sentence, as intuitively I
don't see why 3 replicas would be better than 5 (or more) for fault
tolerance. What is the current limitation this is referring to?

Thanks a lot.

Cheers,
Dani

[1] https://kafka.apache.org/36/documentation.html#kraft_deployment