Yes, it is true that if all replicas fall out of the ISR, acks=-1 is
effectively the same as acks=1. Normally, though, we don't expect replicas
to fall out of the ISR.
You may want to read
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowtoreducechurnsinISR?WhendoesabrokerleavetheISR?
to see how to minimize that.
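
For reference, the two broker settings that control ISR membership in
0.8.1 are replica.lag.time.max.ms and replica.lag.max.messages; for
example, in server.properties (the values shown are just the 0.8.1
defaults, not recommendations):

  replica.lag.time.max.ms=10000
  replica.lag.max.messages=4000

Raising them reduces ISR churn, at the cost of detecting a genuinely
lagging replica more slowly.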

Thanks,

Jun




On Mon, Jul 14, 2014 at 6:36 AM, Jiang Wu (Pricehistory) (BLOOMBERG/ 731
LEX -) <jwu...@bloomberg.net> wrote:

> Hi Jay,
> Thanks for explaining the lag detection mechanism. I think my real
> concern comes from the description of request.required.acks=-1 in
> Kafka's documentation:
> "-1, which means that the producer gets an acknowledgement after all
> in-sync replicas have received the data. This option provides the best
> durability, we guarantee that no messages will be lost as long as at
> least one in sync replica remains."
> Since it states that acks=-1 provides the best durability, I had thought
> it was equivalent to acks=3 for a topic with 3 replicas. My
> understanding is that acks=3 provides the best durability for such a
> topic, better than acks=2 and acks=1. But because followers may fall out
> of sync, acks=-1 actually provides the same level of durability as
> acks=1. It seems to me there is an inconsistency between the behavior of
> acks=-1 and its description, so one of them may need to be modified.
>
> Regards,
> Jiang
>
> From: users@kafka.apache.org At: Jul 11 2014 18:27:46
> To: JIANG WU (PRICEHISTORY) (BLOOMBERG/ 731 LEX -), users@kafka.apache.org
> Cc: wangg...@gmail.com
> Subject: Re: request.required.acks=-1 under high data volume
>
> I think the root problem is that replicas are falling behind, and hence
> are effectively "failed", under normal load, and also that you have
> unclean leader election enabled, which "solves" this catastrophic
> failure by electing new leaders without complete data.
>
> Starting in 0.8.2 you will be able to selectively disable unclean
> leader election.
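>
> For reference, the broker setting for this in 0.8.2 is
> unclean.leader.election.enable; e.g., in server.properties:
>
>   unclean.leader.election.enable=false
>
> Setting it to false prevents an out-of-sync replica from being elected
> leader, trading availability for durability.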
>
> The root problem for the spuriously failing replicas is the
> configuration replica.lag.max.messages. This configuration defaults to
> 4000. But throughput can be really high, like a million messages per
> second. At a million messages per second, 4k messages of lag is only
> 4 ms behind, which can happen for all kinds of reasons (e.g. just
> normal Linux I/O latency jitter).
>
> Jiang, I suspect you can resolve your issue by just making this higher.
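>
> For example (back-of-the-envelope, numbers purely illustrative): at a
> peak of 1,000,000 messages/sec, tolerating 10 seconds of replica lag
> would mean something on the order of
>
>   replica.lag.max.messages=10000000
>
> in server.properties (1,000,000 msg/s * 10 s).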
>
> However, raising this setting is not a panacea. The higher you raise it,
> the longer it will take to detect a replica that is actually falling
> behind.
>
> We have been discussing this setting, and, if you think about it, the
> setting is actually somewhat impossible to set right in a cluster that
> has both low-volume and high-volume topics/partitions. For a low-volume
> topic it will take a very long time to detect a lagging replica, and for
> a high-volume topic it will produce false positives. One approach to
> making this easier would be to have the configuration be something like
> replica.lag.max.ms and translate it into a number of messages
> dynamically, based on the throughput of the partition.
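>
> A rough sketch of that idea (hypothetical code, not anything that exists
> in Kafka today):
>
>   // Translate the time-based lag bound into a per-partition
>   // message-count bound, using the partition's observed throughput.
>   def maxLagMessages(observedMsgsPerSec: Double, lagMaxMs: Long): Long =
>     math.max(1L, (observedMsgsPerSec * lagMaxMs / 1000.0).toLong)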
>
> -Jay
>
>
> On Fri, Jul 11, 2014 at 2:55 PM, Jiang Wu (Pricehistory) (BLOOMBERG/
> 731 LEX -) <jwu...@bloomberg.net> wrote:
> > Hi Guozhang,
> >
> > KAFKA-1537 is created:
> > https://issues.apache.org/jira/i#browse/KAFKA-1537
> >
> > I'll try to see if I'm able to submit a patch for this, but I cannot
> > commit to a date, so please feel free to assign it to others.
> >
> > Regards,
> > Jiang
> > ----- Original Message -----
> > From: wangg...@gmail.com
> > To: JIANG WU (PRICEHISTORY) (BLOOMBERG/ 731 LEX -),
> > users@kafka.apache.org
> > At: Jul 11 2014 16:42:55
> >
> > Hello Jiang,
> >
> > That is a valid point. The reason we designed acks=-1 to mean "receive
> > acks from replicas in the ISR" is basically trading consistency for
> > availability. I think that, instead of changing its meaning, we could
> > add another ack value, -2 for instance, to specify "receive acks from
> > all replicas", in favor of consistency.
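> >
> > A hypothetical sketch of what that check could look like, following
> > the shape of the existing code quoted later in this thread
> > (assignedReplicas is an assumption here, not taken from the source):
> >
> >   if ((requiredAcks == -1 && numAcks >= inSyncReplicas.size) ||
> >       (requiredAcks == -2 && numAcks >= assignedReplicas.size) ||
> >       (requiredAcks > 0 && numAcks >= requiredAcks)) {
> >     (true, ErrorMapping.NoError)
> >   }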
> >
> > Since you already did this much investigation, would you like to file
> > a JIRA and submit a patch for this?
> >
> > Guozhang
> >
> >
> > On Fri, Jul 11, 2014 at 11:49 AM, Jiang Wu (Pricehistory) (BLOOMBERG/ 731
> > LEX -) <jwu...@bloomberg.net> wrote:
> >
> >> Hi,
> >> I'm doing stress and failover tests on a 3-node 0.8.1.1 Kafka cluster
> >> and have the following observations.
> >>
> >> A topic is created with 1 partition and 3 replicas.
> >> request.required.acks is set to -1 for a sync producer. When the
> >> publishing rate is high (3M messages, each 2000 bytes, published in
> >> lists of 2000), the two followers fall out of sync and only the leader
> >> remains in the ISR. But the producer can keep sending. If the leader
> >> is then killed with Ctrl-C, one follower becomes the leader, but
> >> messages are lost because of the unclean leader election.
> >>
> >> In the same test, request.required.acks=3 gives the desired result:
> >> followers fall out of sync, but the producer is blocked until all
> >> followers are back in the ISR. No data loss is observed in this case.
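> >>
> >> For reference, a sync producer configured roughly like the one in this
> >> test might look like this with the 0.8 Scala producer API (broker list
> >> and topic name are placeholders):
> >>
> >>   import java.util.Properties
> >>   import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
> >>
> >>   val props = new Properties()
> >>   props.put("metadata.broker.list", "b1:9092,b2:9092,b3:9092")
> >>   props.put("producer.type", "sync")        // synchronous send
> >>   props.put("request.required.acks", "-1")  // wait for ISR acks
> >>   props.put("serializer.class", "kafka.serializer.DefaultEncoder")
> >>   val producer =
> >>     new Producer[Array[Byte], Array[Byte]](new ProducerConfig(props))
> >>   // each payload is a 2000-byte message, as in the test
> >>   producer.send(new KeyedMessage[Array[Byte], Array[Byte]](
> >>     "test-topic", new Array[Byte](2000)))
> >>   producer.close()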
> >>
> >> From the code, this turns out to be how it's designed:
> >>   if ((requiredAcks < 0 && numAcks >= inSyncReplicas.size) ||
> >>       (requiredAcks > 0 && numAcks >= requiredAcks)) {
> >>     /*
> >>      * requiredAcks < 0 means acknowledge after all replicas in ISR
> >>      * are fully caught up to the (local) leader's offset
> >>      * corresponding to this produce request.
> >>      */
> >>     (true, ErrorMapping.NoError)
> >>   }
> >>
> >> I'm wondering if it would be more reasonable to let
> >> request.required.acks=-1 mean "receive acks from all replicas" instead
> >> of "receive acks from replicas in the ISR". As in the test above,
> >> followers fall out of sync under high publishing volume, which makes
> >> request.required.acks=-1 equivalent to request.required.acks=1. Since
> >> the Kafka documentation states that request.required.acks=-1 provides
> >> the best durability, one would expect it to be equivalent to
> >> request.required.acks=number_of_replicas.
> >>
> >> Regards,
> >> Jiang
> >
> >
> >
> >
> > --
> > -- Guozhang
>
>
>
