As I understand it, regardless of whether the partitioning strategy is random or hashed, a producer will eventually try to write a message to one of the partitions that is unavailable (as defined by our acks=-1 and min.isr=2 settings).
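For reference, a sketch of the configuration being discussed. The mapping of the thread's shorthand ("min.isr", "acks") onto concrete property names is my assumption; these are the 0.8.2-era broker/topic and new-producer property names:

```properties
# Broker/topic config (assumed to be what "min.isr" and "disabled
# unclean leader election" refer to):
min.insync.replicas=2
unclean.leader.election.enable=false

# Producer config (new Java producer; -1 is equivalent to "all"):
acks=-1
```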
I suppose that the random partitioner, with enough retries, will eventually find an available partition. But that would certainly be a costly way of finding one.

On Tue, Oct 14, 2014 at 3:05 PM, cac...@gmail.com <cac...@gmail.com> wrote:

> Wouldn't this work only for producers using random partitioning?
>
> On Tue, Oct 14, 2014 at 1:51 PM, Kyle Banker <kyleban...@gmail.com> wrote:
>
> > Consider a 12-node Kafka cluster with a 200-partition topic having a
> > replication factor of 3. Let's assume, in addition, that we're running
> > Kafka v0.8.2, we've disabled unclean leader election, acks is -1, and
> > min.isr is 2.
> >
> > Now suppose we lose 2 nodes. In this case, there's a good chance that
> > 2/3 replicas of one or more partitions will be unavailable. This means
> > that messages assigned to those partitions will not be writable. If
> > we're writing a large number of messages, I would expect that all
> > producers would eventually halt. It is somewhat surprising that, if we
> > rely on a basic durability setting, the cluster would likely be
> > unavailable even after losing only 2 / 12 nodes.
> >
> > It might be useful in this scenario for the producer to be able to
> > detect which partitions are no longer available and reroute messages
> > that would have hashed to the unavailable partitions (as defined by our
> > acks and min.isr settings). This way, the cluster as a whole would
> > remain available for writes at the cost of a slightly higher load on
> > the remaining machines.
> >
> > Is this limitation accurately described? Is the proposed producer
> > functionality worth pursuing?
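A back-of-envelope calculation (mine, not from the thread) suggests the "good chance" above is close to a certainty. Assuming replicas are assigned uniformly at random, a partition loses 2 of its 3 replicas exactly when both failed brokers are in its replica set:

```python
from math import comb

# Scenario from the thread: 12 brokers, 200 partitions, replication
# factor 3, and we lose 2 brokers. Under acks=-1 / min.isr=2, a
# partition becomes unwritable if both failed brokers held replicas of
# it. Assumes uniformly random replica placement, a simplification of
# Kafka's actual assignment algorithm.
brokers, rf, partitions = 12, 3, 200

# Replica sets containing both failed brokers: pick the remaining
# replica from the other 10 brokers, out of C(12, 3) total placements.
p_partition_down = comb(brokers - 2, rf - 2) / comb(brokers, rf)  # 10/220

expected_down = partitions * p_partition_down
p_any_down = 1 - (1 - p_partition_down) ** partitions

print(f"P(a given partition unwritable): {p_partition_down:.4f}")
print(f"Expected unwritable partitions:  {expected_down:.1f}")
print(f"P(at least one unwritable):      {p_any_down:.4f}")
```

Under these assumptions roughly 9 of the 200 partitions would be expected to become unwritable, and the probability that at least one does is essentially 1, which supports the point that producers writing across all partitions would eventually halt.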