I see.

Then I think the appropriate approach depends on your delivery-latency
requirements.
Just retrying until success is simpler, but it could block subsequent
messages from being processed (though that also depends on the thread
pool size).

Another concern when using a dead letter topic is retry backoff.
If you don't use backoff, production to the dead letter topic could burst
while the downstream DB experiences transient problems; on the other hand,
injecting a backoff delay requires some thought about how not to block
subsequent messages.
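For example, a rough sketch of the dead-letter / retry-topic approach could
look like the following (the topic names, header keys, and backoff numbers
are just illustrative assumptions, not anything standard):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class RetryTopicSketch {
    private static final String RETRY_TOPIC = "account-events-retry";     // assumed name
    private static final String DEAD_LETTER_TOPIC = "account-events-dlt"; // assumed name
    private static final int MAX_ATTEMPTS = 5;

    private final Producer<byte[], byte[]> producer;

    public RetryTopicSketch(Producer<byte[], byte[]> producer) {
        this.producer = producer;
    }

    // Called when processing a record fails with a (possibly transient) error.
    public void handleFailure(ConsumerRecord<byte[], byte[]> record) {
        int attempt = attemptOf(record) + 1;
        if (attempt > MAX_ATTEMPTS) {
            // Give up and park the record for manual inspection.
            producer.send(new ProducerRecord<>(DEAD_LETTER_TOPIC, record.key(), record.value()));
            return;
        }
        // Exponential backoff: 1s, 2s, 4s, ... capped at 60s.
        long backoffMs = Math.min(1000L << (attempt - 1), 60_000L);
        long notBefore = System.currentTimeMillis() + backoffMs;

        ProducerRecord<byte[], byte[]> retry =
                new ProducerRecord<>(RETRY_TOPIC, record.key(), record.value());
        retry.headers().add("retry-attempt",
                Integer.toString(attempt).getBytes(StandardCharsets.UTF_8));
        retry.headers().add("retry-not-before",
                Long.toString(notBefore).getBytes(StandardCharsets.UTF_8));
        producer.send(retry);
    }

    private static int attemptOf(ConsumerRecord<byte[], byte[]> record) {
        Header header = record.headers().lastHeader("retry-attempt");
        return header == null ? 0
                : Integer.parseInt(new String(header.value(), StandardCharsets.UTF_8));
    }
}

The consumer of the retry topic then has to honor the "retry-not-before"
header, e.g. by pausing that partition until the record is due, which is
exactly where the "don't block subsequent messages" trade-off shows up again.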

(FYI, Decaton provides retry queueing with backoff out of the box. :)
https://github.com/line/decaton/blob/master/docs/retry-queueing.adoc)

On Tue, Nov 24, 2020 at 2:38 Pushkar Deole <pdeole2...@gmail.com> wrote:

> Thanks Haruki... right now the maximum number of such events that we would
> have is 100, since we would be supporting that many customers (accounts)
> for now, for which we are considering a simple approach of a single
> consumer and a thread pool with around 10 threads. So the question was
> regarding how to manage failed events: should those be retried until
> successful, or sent to a dead letter queue/topic from where they will be
> processed again until successful?
>
>
> On Mon, Nov 23, 2020 at 10:16 PM Haruki Okada <ocadar...@gmail.com> wrote:
>
> > Hi Pushkar.
> >
> > Just for your information, https://github.com/line/decaton is a Kafka
> > consumer framework that supports parallel processing within a single
> > partition.
> >
> > It manages the committable offset (i.e. the offset up to which all
> > preceding offsets have been processed) internally, so it preserves
> > at-least-once semantics even when processing in parallel.
> >
> >
> > On Tue, Nov 24, 2020 at 1:16 Pushkar Deole <pdeole2...@gmail.com> wrote:
> >
> > > Thanks Liam!
> > > We don't have a requirement to maintain order of processing for events,
> > > even within a partition. Essentially, these are events for various
> > > accounts (customers) that we want to support and do the necessary
> > > database provisioning for in our database. So they can be processed in
> > > parallel.
> > >
> > > I think the 2nd option would suit our requirement to have a single
> > > consumer and a bounded thread pool for processors. However, the issue we
> > > may face is committing the offsets only after processing an event, since
> > > we don't want the consumer to auto-commit offsets before the
> > > provisioning is done for the customer. How can that be achieved with
> > > model #2?
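A minimal sketch of what model #2 with a manual commit could look like
(illustrative only: the topic/group names and pool size are assumptions,
and failure handling and rebalance handling are omitted). The consumer
commits only after every record from a poll has been processed:

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class DecoupledProcessingSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "provisioning-service");  // assumed
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");       // commit manually
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        ExecutorService pool = Executors.newFixedThreadPool(10);
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("account-events"));                  // assumed topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                List<Future<?>> inFlight = new ArrayList<>();
                // One task per partition; per-partition order comes for free.
                for (TopicPartition tp : records.partitions()) {
                    List<ConsumerRecord<byte[], byte[]>> batch = records.records(tp);
                    inFlight.add(pool.submit(() -> batch.forEach(DecoupledProcessingSketch::provision)));
                }
                for (Future<?> f : inFlight) {
                    f.get();  // wait until everything from this poll is processed
                }
                if (!records.isEmpty()) {
                    consumer.commitSync();  // safe: all polled offsets have been processed
                }
            }
        }
    }

    private static void provision(ConsumerRecord<byte[], byte[]> record) {
        // DB provisioning for the account would go here.
    }
}

Blocking on the futures before the next poll keeps the commit safe, but a
long-running record can still hit max.poll.interval.ms and trigger a
rebalance; tracking the lowest fully-processed offset per partition (which
is what Decaton automates) avoids that at the cost of more bookkeeping.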
> > >
> > > On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <
> > > liam.cla...@adscale.co.nz> wrote:
> > >
> > > > Hi Pushkar,
> > > >
> > > > No. You'd need to combine a consumer with a thread pool or similar,
> > > > as you prefer. As the docs say (from
> > > > https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> > > > )
> > > >
> > > > > We have intentionally avoided implementing a particular threading
> > > > > model for processing. This leaves several options for implementing
> > > > > multi-threaded processing of records.
> > > > > 1. One Consumer Per Thread
> > > > > A simple option is to give each thread its own consumer instance.
> > > > > Here are the pros and cons of this approach:
> > > > >
> > > > >    - *PRO*: It is the easiest to implement
> > > > >
> > > > >    - *PRO*: It is often the fastest as no inter-thread
> > > > >    co-ordination is needed
> > > > >
> > > > >    - *PRO*: It makes in-order processing on a per-partition basis
> > > > >    very easy to implement (each thread just processes messages in
> > > > >    the order it receives them).
> > > > >
> > > > >    - *CON*: More consumers means more TCP connections to the
> > > > >    cluster (one per thread). In general Kafka handles connections
> > > > >    very efficiently so this is generally a small cost.
> > > > >
> > > > >    - *CON*: Multiple consumers means more requests being sent to
> > > > >    the server and slightly less batching of data which can cause
> > > > >    some drop in I/O throughput.
> > > > >
> > > > >    - *CON*: The number of total threads across all processes will
> > > > >    be limited by the total number of partitions.
> > > > >
> > > > > 2. Decouple Consumption and Processing
> > > > > Another alternative is to have one or more consumer threads that do
> > > > > all data consumption and hands off ConsumerRecords
> > > > > <https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html>
> > > > > instances to a blocking queue consumed by a pool of processor
> > > > > threads that actually handle the record processing. This option
> > > > > likewise has pros and cons:
> > > > >
> > > > >    - *PRO*: This option allows independently scaling the number of
> > > > >    consumers and processors. This makes it possible to have a
> > > > >    single consumer that feeds many processor threads, avoiding any
> > > > >    limitation on partitions.
> > > > >
> > > > >    - *CON*: Guaranteeing order across the processors requires
> > > > >    particular care as the threads will execute independently an
> > > > >    earlier chunk of data may actually be processed after a later
> > > > >    chunk of data just due to the luck of thread execution timing.
> > > > >    For processing that has no ordering requirements this is not a
> > > > >    problem.
> > > > >
> > > > >    - *CON*: Manually committing the position becomes harder as it
> > > > >    requires that all threads co-ordinate to ensure that processing
> > > > >    is complete for that partition.
> > > > >
> > > > > There are many possible variations on this approach. For example
> > > > > each processor thread can have its own queue, and the consumer
> > > > > threads can hash into these queues using the TopicPartition to
> > > > > ensure in-order consumption and simplify commit.
> > > >
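To make that last variation concrete, here is a minimal sketch (the class
and method names are made up for illustration): the consumer thread hashes
each record's TopicPartition onto a fixed worker queue, so records from one
partition always go to the same worker and stay in order.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;

public class PartitionHashingDispatcher {
    private final List<BlockingQueue<ConsumerRecord<byte[], byte[]>>> queues;

    public PartitionHashingDispatcher(int workers) {
        queues = new ArrayList<>(workers);
        for (int i = 0; i < workers; i++) {
            // Bounded queues back-pressure the consumer thread when workers fall behind.
            queues.add(new ArrayBlockingQueue<>(1024));
        }
    }

    // Called on the consumer thread for every polled record.
    public void dispatch(ConsumerRecord<byte[], byte[]> record) throws InterruptedException {
        TopicPartition tp = new TopicPartition(record.topic(), record.partition());
        int worker = Math.floorMod(tp.hashCode(), queues.size());
        queues.get(worker).put(record);  // blocks if that worker's queue is full
    }

    // Each worker thread drains its own queue, preserving per-partition order.
    public BlockingQueue<ConsumerRecord<byte[], byte[]>> queueFor(int worker) {
        return queues.get(worker);
    }
}

Each worker then reports the offsets it has finished, and the consumer
thread commits, per partition, only up to the offset whose predecessors are
all processed, which is essentially the committable-offset tracking
mentioned earlier in the thread.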
> > > >
> > > > Cheers,
> > > >
> > > > Liam Clarke-Hutchinson
> > > >
> > > > On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pdeole2...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Is there any configuration in the Kafka consumer to specify
> > > > > multiple threads, the way there is in Kafka Streams?
> > > > > Essentially, can we have a consumer with multiple threads where the
> > > > > threads would divide the partitions of a topic among them?
> > > > >
> > > >
> > >
> >
> >
> > --
> > ========================
> > Okada Haruki
> > ocadar...@gmail.com
> > ========================
> >
>


-- 
========================
Okada Haruki
ocadar...@gmail.com
========================
