With acks=-1, the producer is guaranteed to receive an error when publishing m2 in the above case.
Thanks,

Jun

On Thu, Jul 24, 2014 at 9:46 AM, Scott Clasen <[email protected]> wrote:

> Thanks for the Jira info.
>
> Just to clarify, in the case we are outlining above, would the producer have received an ack on m2 (with acks = -1) or not? If not, then I have no concerns; if so, then how would the producer know to re-publish?
>
> On Thu, Jul 24, 2014 at 9:38 AM, Jun Rao <[email protected]> wrote:
>
> > About re-publishing m2, it seems it's better to let the producer choose whether to do this or not.
> >
> > There is another known bug, KAFKA-1211, that's not fixed yet. The situation when this can happen is relatively rare and the fix is slightly involved. So, it may not be addressed in 0.8.2.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Jul 22, 2014 at 8:36 PM, scott@heroku <[email protected]> wrote:
> >
> > > Thanks so much for the detailed explanation, Jun; it pretty much lines up with my understanding.
> > >
> > > In the case below, if we didn't particularly care about ordering and re-produced m2, it would then become m5, and in many use cases this would be ok?
> > >
> > > Perhaps a more direct question would be: once 0.8.2 is out and I have a topic with unclean leader election disabled, and produce with acks = -1, are there any known series of events (other than disk failures on all brokers) that would cause the loss of messages that a producer has received an ack for?
> > >
> > > On Jul 22, 2014, at 8:17 PM, Jun Rao <[email protected]> wrote:
> > >
> > > > The key point is that we have to keep all replicas consistent with each other, such that no matter which replica a consumer reads from, it always reads the same data at a given offset. The following is an example.
> > > >
> > > > Suppose we have 3 brokers A, B and C. Let's assume A is the leader and at some point, we have the following offsets and messages in each replica.
> > > >
> > > > offset   A    B    C
> > > > 1        m1   m1   m1
> > > > 2        m2
> > > >
> > > > Let's assume that message m1 is committed and message m2 is not. At exactly this moment, replica A dies. After a new leader is elected, say B, new messages can be committed with just replicas B and C. Some point later, if we commit two more messages m3 and m4, we will have the following.
> > > >
> > > > offset   A    B    C
> > > > 1        m1   m1   m1
> > > > 2        m2   m3   m3
> > > > 3             m4   m4
> > > >
> > > > Now A comes back. For consistency, it's important for A's log to be identical to B's and C's. So, we have to remove m2 from A's log and add m3 and m4. As you can see, whether you want to republish m2 or not, m2 cannot stay at its current offset, since in the other replicas that offset is already taken by other messages. Therefore, a truncation of replica A's log is needed to keep the replicas consistent. Currently, we don't republish messages like m2 because (1) it's not necessary, since m2 was never considered committed, and (2) it would make our protocol more complicated.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
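To make the offset bookkeeping above concrete, the following is a small self-contained toy model of Jun's example (it is not Kafka's replica code, and the names are made up): broker A first truncates back to the last point it knows is committed, which drops the uncommitted m2, and then re-fetches the rest of the log from the new leader B.

import java.util.ArrayList;
import java.util.List;

// Toy model of the replica logs in the example above. This is NOT Kafka's
// replication code; it only illustrates why A must truncate its uncommitted
// message (m2) before catching up from the new leader (B).
public class TruncationSketch {

    public static void main(String[] args) {
        // Logs right before A dies: m1 is committed everywhere, m2 exists
        // only on A and is not committed.
        List<String> a = new ArrayList<>(List.of("m1", "m2"));
        List<String> b = new ArrayList<>(List.of("m1"));
        List<String> c = new ArrayList<>(List.of("m1"));
        int committed = 1;   // number of messages known to be committed

        // A dies; B becomes leader and commits m3 and m4 with just B and C.
        b.add("m3"); c.add("m3");
        b.add("m4"); c.add("m4");

        // A comes back. Step 1: truncate to the last committed point.
        while (a.size() > committed) {
            a.remove(a.size() - 1);
        }
        // Step 2: fetch everything past that point from the current leader.
        for (int offset = a.size(); offset < b.size(); offset++) {
            a.add(b.get(offset));
        }

        // All replicas now hold [m1, m3, m4]; m2 is gone, which is acceptable
        // because it was never acknowledged as committed.
        System.out.println("A=" + a + " B=" + b + " C=" + c);
    }
}

Real brokers track the committed point as the high watermark and catch up through the regular replica fetcher, but the recovery has the same two steps: truncate, then re-sync from the leader.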
> > > > On Tue, Jul 22, 2014 at 3:40 PM, scott@heroku <[email protected]> wrote:
> > > >
> > > > > Thanks Jun.
> > > > >
> > > > > Can you explain a little more about what an uncommitted message means? The messages are in the log, so presumably they have been acked at least by the local broker.
> > > > >
> > > > > I guess I am hoping for some intuition around why 'replaying' the messages in question would cause bad things.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Jul 22, 2014, at 3:06 PM, Jun Rao <[email protected]> wrote:
> > > > >
> > > > > > Scott,
> > > > > >
> > > > > > The reason for truncation is that the broker that comes back may have some uncommitted messages. Those messages shouldn't be exposed to the consumer and therefore need to be removed from the log. So, on broker startup, we first truncate the log to a safe point before which we know all messages are committed. This broker will then sync up with the current leader to get the remaining messages.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Jul 22, 2014 at 9:42 AM, Scott Clasen <[email protected]> wrote:
> > > > > >
> > > > > > > Ahh, yes, that message loss case. I've wondered about that myself.
> > > > > > >
> > > > > > > I guess I don't really understand why truncating messages is ever the right thing to do. As Kafka is an 'at least once' system (send a message, get no ack, it still might be on the topic), consumers that care will have to de-dupe anyhow.
> > > > > > >
> > > > > > > To the Kafka designers: is there anything preventing implementation of alternatives to truncation? When a broker comes back online and needs to truncate, can't it fire up a producer and take the extra messages and send them back to the original topic, or alternatively to an error topic?
> > > > > > >
> > > > > > > Would love to understand the rationale for the current design, as my perspective is doubtfully as clear as the designers'.
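The answer the thread converges on is at the top: with request.required.acks=-1, an uncommitted message like m2 surfaces as a send error on the producer, so re-publishing it is the producer's decision rather than the broker's. A minimal sketch of that pattern against the 0.8.x Java producer API follows; the broker list and topic name are placeholders.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Sync producer that waits for all in-sync replicas. If the send is not
// committed, it throws, and the application decides whether to re-send.
public class AckAllProducerSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "sync");         // block until the broker responds
        props.put("request.required.acks", "-1");   // wait for all replicas in the ISR

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        try {
            producer.send(new KeyedMessage<String, String>("my-topic", "key", "m2"));
            // Reached only if the message was committed by the leader and its ISR.
        } catch (Exception e) {
            // Not committed as far as the producer knows: re-send (possibly
            // creating a duplicate) or give up, i.e. "let the producer choose".
            System.err.println("m2 was not committed: " + e);
        } finally {
            producer.close();
        }
    }
}

Note that a failed send may still have landed in a log somewhere (the at-least-once point made above), so re-sending can produce duplicates and consumers that care still need to de-dupe.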
> > > > > > > On Tue, Jul 22, 2014 at 6:21 AM, Jiang Wu (Pricehistory) (BLOOMBERG/ 731 LEX -) <[email protected]> wrote:
> > > > > > >
> > > > > > > > kafka-1028 addressed another unclean leader election problem. It prevents a broker that is not in the ISR from becoming a leader. The problem we are facing is that a broker in the ISR but without complete messages may become a leader. It's also a kind of unclean leader election, but not the one that kafka-1028 addressed.
> > > > > > > >
> > > > > > > > Here I'm trying to give a proof that current Kafka doesn't achieve the requirement (no message loss, no blocking when 1 broker is down) due to two of its behaviors:
> > > > > > > > 1. when choosing a new leader from 2 followers in the ISR, the one with fewer messages may be chosen as the leader
> > > > > > > > 2. even when replica.lag.max.messages=0, a follower can stay in the ISR when it has fewer messages than the leader.
> > > > > > > >
> > > > > > > > We consider a cluster with 3 brokers and a topic with 3 replicas. We analyze different cases according to the value of request.required.acks (acks for short). For each case and its subcases, we find situations in which either message loss or service blocking happens. We assume that at the beginning, all 3 replicas, leader A and followers B and C, are in sync, i.e., they have the same messages and are all in the ISR.
> > > > > > > >
> > > > > > > > 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement: 0 and 1 allow message loss, and 3 blocks producers as soon as one broker is down.
> > > > > > > > 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this time, although C hasn't received m, it's still in the ISR. If A is killed, C can be elected as the new leader, and consumers will miss m.
> > > > > > > > 3. acks=-1. Suppose replica.lag.max.messages=M. There are two sub-cases:
> > > > > > > > 3.1 M>0. Suppose C is killed. C will be out of the ISR after replica.lag.time.max.ms. Then the producer publishes M messages to A and B. C restarts. C will rejoin the ISR, since it is only M messages behind A and B. Before C replicates all messages, A is killed and C becomes leader; message loss happens.
> > > > > > > > 3.2 M=0. In this case, when the producer publishes at a high speed, B and C will fall out of the ISR and only A keeps receiving messages. Then A is killed. Either message loss or service blocking will happen, depending on whether unclean leader election is enabled.
> > > > > > > >
> > > > > > > > From: [email protected] At: Jul 21 2014 22:28:18
> > > > > > > > To: JIANG WU (PRICEHISTORY) (BLOOMBERG/ 731 LEX -), [email protected]
> > > > > > > > Subject: Re: how to ensure strong consistency with reasonable availability
> > > > > > > >
> > > > > > > > You will probably need 0.8.2 which gives
> > > > > > > > https://issues.apache.org/jira/browse/KAFKA-1028
> > > > > > > >
> > > > > > > > On Mon, Jul 21, 2014 at 6:37 PM, Jiang Wu (Pricehistory) (BLOOMBERG/ 731 LEX -) <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > With a cluster of 3 brokers and a topic of 3 replicas, we want to achieve the following two properties:
> > > > > > > > > 1. when only one broker is down, there's no message loss, and producers/consumers are not blocked.
> > > > > > > > > 2. in other more serious problems, for example, one broker is restarted twice in a short period or two brokers are down at the same time, producers/consumers can be blocked, but no message loss is allowed.
> > > > > > > > >
> > > > > > > > > We haven't found any producer/broker parameter combinations that achieve this. If you know or think some configurations will work, please post details. We have a test bed to verify any given configurations.
> > > > > > > > >
> > > > > > > > > In addition, I'm wondering if it's necessary to open a jira to require the above feature?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jiang
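Case 3.1 above is the subtle one, so here is a standalone toy replay of that sequence (again not Kafka code; M and the message names are made up) showing which acknowledged messages disappear when C, still within the lag window, is elected leader.

import java.util.ArrayList;
import java.util.List;

// Toy replay of case 3.1: acks=-1 with replica.lag.max.messages = M > 0.
// It only walks through the sequence of events described in the thread.
public class LagWindowLossSketch {

    public static void main(String[] args) {
        final int M = 4;                      // replica.lag.max.messages
        List<String> a = new ArrayList<>();   // leader
        List<String> b = new ArrayList<>();   // follower that stays up
        List<String> c = new ArrayList<>();   // follower that was restarted

        // C is down and has dropped out of the ISR. The producer publishes M
        // messages with acks=-1; A and B are the whole ISR, so every send is
        // acknowledged back to the producer.
        List<String> acked = new ArrayList<>();
        for (int i = 1; i <= M; i++) {
            String msg = "m" + i;
            a.add(msg);
            b.add(msg);
            acked.add(msg);
        }

        // C restarts. It is only M messages behind the leader, so with
        // replica.lag.max.messages=M it is let back into the ISR before it
        // has actually caught up.
        boolean cInIsr = (a.size() - c.size()) <= M;

        // A is killed before C replicates anything. C is in the ISR, so it is
        // a legal choice for the new leader even with an empty log.
        List<String> newLeader = cInIsr ? c : b;

        // Every acknowledged message missing from the new leader is lost.
        List<String> lost = new ArrayList<>(acked);
        lost.removeAll(newLeader);
        System.out.println("acknowledged=" + acked + " lost=" + lost);
    }
}

With M = 4 the sketch reports all four acknowledged messages as lost, which is exactly the gap between requirement 1 above and what acks=-1 alone can guarantee.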
