Re: are offsets per consumer or per consumer group?

2018-02-08 Thread Xavier Noria
Thanks very much Luke!


Re: are offsets per consumer or per consumer group?

2018-02-08 Thread Luke Steensen
1) In more recent versions of Kafka, the consumer group coordinator runs on
the broker. Previously, there was a "high level consumer" that spoke
directly to ZooKeeper and did group management within the client libraries,
but this is no longer used.

2) That depends on when your consumer commits offsets. The normal case is
to commit the offset for a message after that message has been processed.
In that case, the next consumer to be assigned that partition would
reprocess message 77. The other option is to commit offsets as messages are
received but before they are processed. This causes messages to be processed
at most once, instead of at least once.

3) The best way to get at least once processing is to make sure your client
is not configured to automatically commit offsets, and to do so explicitly.
This way you can be sure commits only happen once the result of processing
a message has been durably stored (or whatever needs to happen for your use
case). That doesn't necessarily mean you need to commit immediately after
each individual message, only that when you commit it is only for messages
that have been completely processed.
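
To make points 2) and 3) concrete, here is a minimal sketch of the crash
scenario from the question (messages 75-80, crash while processing 77). It is
a pure in-memory simulation, not Kafka client code: `Partition`, `consume`
and `simulate` are hypothetical names invented for illustration, and the
committed offset is modelled as "the next offset the group will read".

```python
class Partition:
    """Hypothetical in-memory stand-in for one partition and its group offset."""

    def __init__(self, first, last):
        self.offsets = list(range(first, last + 1))
        self.committed = first  # next offset the group will read

    def poll(self):
        # Return everything from the committed position onward
        return [o for o in self.offsets if o >= self.committed]


def consume(partition, started, done, commit_first, crash_at=None):
    """Handle one polled batch; stop abruptly at `crash_at` to mimic a crash."""
    batch = partition.poll()
    if commit_first and batch:
        # Commit on receipt: the group position moves past the whole batch
        partition.committed = batch[-1] + 1
    for offset in batch:
        started.append(offset)
        if offset == crash_at:
            return  # crash mid-message: no completion, no further commits
        done.append(offset)
        if not commit_first:
            # Commit after processing: position is the next unprocessed offset
            partition.committed = offset + 1


def simulate(commit_first):
    partition = Partition(75, 80)
    started, done = [], []
    consume(partition, started, done, commit_first, crash_at=77)  # consumer A
    resume_at = partition.committed  # where the next owner starts
    consume(partition, started, done, commit_first)  # consumer B takes over
    return resume_at, started, done
```

With `simulate(commit_first=False)` the next owner resumes at 77 and message
77 is started twice (at least once). With `simulate(commit_first=True)` the
next owner resumes at 81 and messages 77-80 are never completed (at most
once). Committing per message is only done here to keep the sketch short;
committing once per fully processed batch preserves the same guarantee.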


On Thu, Feb 8, 2018 at 9:37 AM, Xavier Noria wrote:

> On Thu, Feb 8, 2018 at 4:27 PM, Luke Steensen <
> luke.steen...@braintreepayments.com> wrote:
>
> > Offsets are maintained per consumer group. When an individual consumer
> > crashes, the consumer group coordinator will detect that failure and
> > trigger a rebalance. This redistributes the partitions being consumed
> > across the available consumer processes, using the most recently
> > committed offset for each as the starting point.
> >
>
> Excellent, the getting started guide uses "consumer" sometimes meaning an
> individual consumer, and sometimes meaning a consumer group. That makes
> it a bit difficult to understand exactly how it works. Thanks for
> clarifying.
>
> Let me follow up with these questions then:
>
> 1) Does the group coordinator run in Kafka? Or is the client library
> responsible for that?
>
> 2) Say that a consumer group has consumers A, B and C, assigned to the 3
> partitions respectively. Consumer A polls and gets messages 75-80, but
> crashes while it is processing message 77. The coordinator rebalances and
> assigns that partition to one of the other two, but at which offset is that
> partition left?
>
> 3) If the answer is 81, is a critical consumer group that cannot miss
> messages expected to write custom coordination code to avoid missing 77-80?
> If yes, are there best practices out there for doing this?
>


Re: are offsets per consumer or per consumer group?

2018-02-08 Thread Xavier Noria
On Thu, Feb 8, 2018 at 4:27 PM, Luke Steensen <
luke.steen...@braintreepayments.com> wrote:

> Offsets are maintained per consumer group. When an individual consumer
> crashes, the consumer group coordinator will detect that failure and
> trigger a rebalance. This redistributes the partitions being consumed
> across the available consumer processes, using the most recently committed
> offset for each as the starting point.
>

Excellent, the getting started guide uses "consumer" sometimes meaning an
individual consumer, and sometimes meaning a consumer group. That makes
it a bit difficult to understand exactly how it works. Thanks for
clarifying.

Let me follow up with these questions then:

1) Does the group coordinator run in Kafka? Or is the client library
responsible for that?

2) Say that a consumer group has consumers A, B and C, assigned to the 3
partitions respectively. Consumer A polls and gets messages 75-80, but
crashes while it is processing message 77. The coordinator rebalances and
assigns that partition to one of the other two, but at which offset is that
partition left?

3) If the answer is 81, is a critical consumer group that cannot miss
messages expected to write custom coordination code to avoid missing 77-80?
If yes, are there best practices out there for doing this?


Re: are offsets per consumer or per consumer group?

2018-02-08 Thread Luke Steensen
Offsets are maintained per consumer group. When an individual consumer
crashes, the consumer group coordinator will detect that failure and
trigger a rebalance. This redistributes the partitions being consumed
across the available consumer processes, using the most recently committed
offset for each as the starting point.
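
As a rough illustration of the paragraph above, the following sketch keys
committed offsets by consumer group rather than by individual consumer and
then reassigns partitions after one consumer disappears. It is an in-memory
model only: `OffsetStore`, `assign`, and the group and topic names are
hypothetical, and the round-robin assignment is a simplification rather than
the strategy a real coordinator necessarily uses.

```python
from collections import defaultdict


class OffsetStore:
    """Hypothetical stand-in for the committed offsets kept on the Kafka side."""

    def __init__(self):
        # Keyed by group, not by consumer: (group, topic, partition) -> offset
        self._offsets = {}

    def commit(self, group, topic, partition, offset):
        self._offsets[(group, topic, partition)] = offset

    def committed(self, group, topic, partition):
        # Simplification: default to 0 when the group has no commit yet
        return self._offsets.get((group, topic, partition), 0)


def assign(partitions, consumers):
    """Spread partitions over the live consumers of one group (round-robin)."""
    assignment = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)


store = OffsetStore()
store.commit("billing", "orders", 0, 77)  # group "billing" is at 77
store.commit("audit", "orders", 0, 12)    # group "audit" tracks its own position

before = assign([0, 1, 2], ["A", "B", "C"])  # all three consumers alive
after = assign([0, 1, 2], ["B", "C"])        # consumer A crashed, rebalance

# Whoever now owns partition 0 in "billing" starts from that group's offset
new_owner = next(c for c, parts in after.items() if 0 in parts)
resume_at = store.committed("billing", "orders", 0)
```

Because the key contains the group but no consumer identity, the crash of A
changes only the assignment; the stored position for each group is untouched
and is simply picked up by the new owner.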


On Thu, Feb 8, 2018 at 6:58 AM, Xavier Noria wrote:

> Let's suppose a topic has three partitions and two consumer groups
> listening.
>
> Is the offset maintained by Kafka in each partition associated with the
> consumer group? Or with the individual consumer polling from that partition
> in each consumer group?
>
> I am trying to understand the system behavior when listeners crash, but in
> order to formulate more questions I need to double-check that first.
>


are offsets per consumer or per consumer group?

2018-02-08 Thread Xavier Noria
Let's suppose a topic has three partitions and two consumer groups
listening.

Is the offset maintained by Kafka in each partition associated with the
consumer group? Or with the individual consumer polling from that partition
in each consumer group?

I am trying to understand the system behavior when listeners crash, but in
order to formulate more questions I need to double-check that first.