Hi Pushkar.

Just for your information, https://github.com/line/decaton is a Kafka
consumer framework that supports parallel processing within a single
partition.

It internally tracks the committable offset (i.e. the offset up to which
all preceding offsets have been processed), so it preserves at-least-once
semantics even when processing in parallel.
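
To illustrate the idea, here is a minimal sketch of tracking a committable
offset for one partition. This is not Decaton's actual code; the class
CommittableOffsetTracker and its methods are hypothetical names, and it
assumes the consumer thread dispatches offsets in increasing order.

```java
import java.util.TreeSet;

/** Hypothetical helper: one instance per partition. Not Decaton's implementation. */
final class CommittableOffsetTracker {
    // Offsets handed to worker threads that have not finished processing yet.
    private final TreeSet<Long> inFlight = new TreeSet<>();
    // Highest offset handed out so far; -1 means nothing has been dispatched.
    private long highestDispatched = -1L;

    /** Called by the consumer thread before handing a record to a worker. */
    synchronized void onDispatch(long offset) {
        inFlight.add(offset);
        highestDispatched = Math.max(highestDispatched, offset);
    }

    /** Called by a worker thread once the record is fully processed. */
    synchronized void onComplete(long offset) {
        inFlight.remove(offset);
    }

    /**
     * Returns the offset that is safe to commit, following Kafka's convention
     * that the committed offset is the next offset to read. Every dispatched
     * offset below the returned value has completed. Returns -1 if nothing
     * has been dispatched yet.
     */
    synchronized long committableOffset() {
        if (highestDispatched < 0) {
            return -1L;
        }
        // The oldest unfinished record bounds the commit; if none is pending,
        // everything up to the highest dispatched offset is done.
        return inFlight.isEmpty() ? highestDispatched + 1 : inFlight.first();
    }
}
```

With enable.auto.commit set to false, the consumer thread could
periodically pass committableOffset() (when it is not -1) to
KafkaConsumer#commitSync(Map<TopicPartition, OffsetAndMetadata>). If a worker
fails or the process restarts, records at or after the committed offset are
redelivered, which is what gives at-least-once rather than exactly-once.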


On Tue, Nov 24, 2020 at 1:16 Pushkar Deole <pdeole2...@gmail.com> wrote:

> Thanks Liam!
> We don't have a requirement to maintain order of processing for events even
> within a partition. Essentially, these are events for various accounts
> (customers) that we want to support, and we do the necessary provisioning
> for them in our database. So they can be processed in parallel.
>
> I think the 2nd option would suit our requirement to have a single consumer
> and a bounded thread pool for processors. However, the issue we may face is
> committing the offsets only after processing an event, since we don't want
> the consumer to auto-commit offsets before the provisioning is done for the
> customer. How can that be achieved with model #2?
>
> On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <
> liam.cla...@adscale.co.nz> wrote:
>
> > Hi Pushkar,
> >
> > No. You'd need to combine a consumer with a thread pool or similar as you
> > prefer. As the docs say (from
> > https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> > )
> >
> > > We have intentionally avoided implementing a particular threading model
> > > for processing. This leaves several options for implementing
> > > multi-threaded processing of records.
> > > 1. One Consumer Per Thread
> > > A simple option is to give each thread its own consumer instance. Here
> > > are the pros and cons of this approach:
> > >
> > >    - *PRO*: It is the easiest to implement
> > >
> > >    - *PRO*: It is often the fastest as no inter-thread co-ordination is
> > >    needed
> > >
> > >    - *PRO*: It makes in-order processing on a per-partition basis very
> > >    easy to implement (each thread just processes messages in the order
> > >    it receives them).
> > >
> > >    - *CON*: More consumers means more TCP connections to the cluster
> > >    (one per thread). In general Kafka handles connections very
> > >    efficiently so this is generally a small cost.
> > >
> > >    - *CON*: Multiple consumers means more requests being sent to the
> > >    server and slightly less batching of data which can cause some drop
> > >    in I/O throughput.
> > >
> > >    - *CON*: The number of total threads across all processes will be
> > >    limited by the total number of partitions.
> > >
> > > 2. Decouple Consumption and Processing
> > > Another alternative is to have one or more consumer threads that do all
> > > data consumption and hands off ConsumerRecords
> > > <https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html>
> > > instances to a blocking queue consumed by a pool of processor threads
> > > that actually handle the record processing. This option likewise has
> > > pros and cons:
> > >
> > >    - *PRO*: This option allows independently scaling the number of
> > >    consumers and processors. This makes it possible to have a single
> > >    consumer that feeds many processor threads, avoiding any limitation
> > >    on partitions.
> > >
> > >    - *CON*: Guaranteeing order across the processors requires
> > >    particular care as the threads will execute independently an earlier
> > >    chunk of data may actually be processed after a later chunk of data
> > >    just due to the luck of thread execution timing. For processing that
> > >    has no ordering requirements this is not a problem.
> > >
> > >    - *CON*: Manually committing the position becomes harder as it
> > >    requires that all threads co-ordinate to ensure that processing is
> > >    complete for that partition.
> > >
> > > There are many possible variations on this approach. For example each
> > > processor thread can have its own queue, and the consumer threads can
> > > hash into these queues using the TopicPartition to ensure in-order
> > > consumption and simplify commit.
> >
> >
> > Cheers,
> >
> > Liam Clarke-Hutchinson
> >
> > On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pdeole2...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Is there any configuration in the Kafka consumer to specify multiple
> > > threads, the way there is in Kafka Streams?
> > > Essentially, can we have a consumer with multiple threads where the
> > > threads would divide the partitions of a topic among them?
> > >
> >
>


-- 
========================
Okada Haruki
ocadar...@gmail.com
========================
