Parth, I am skeptical that you actually need 500+ consumers. A well tuned
consumer can process hundreds of thousands of records per second.

Some notes to consider:

- You'll need at least 500 partitions if you have 500 consumers.
- You almost never need more consumers than you have brokers in your
cluster. If a broker can persist N bytes/s to disk, a consumer can usually
process at least N bytes/s.
- If your consumers can't process fast enough, add parallelism within each
consumer, e.g. process records asynchronously. You don't necessarily need
more consumers.
- It might not make sense to auto-scale consumers, since scaling up
triggers a rebalance, which can cause even more consumer lag on bursty
streams.
- Unless you have a real-time use case, you can generally under-provision
your consumers and let them catch up with bursts over time.
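The in-consumer parallelism point above can be sketched roughly as follows. This is a minimal illustration, not the Kafka API: the batch of records is faked with plain strings standing in for the results of a real poll() call, and the worker-pool size is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

def process(record):
    # Stand-in for real per-record work (deserialize, transform, write out).
    return record.upper()

def handle_batch(records, pool):
    # Fan a polled batch out across worker threads, then wait for all of
    # them to finish before committing offsets, so a crash mid-batch
    # doesn't silently drop records.
    futures = [pool.submit(process, r) for r in records]
    return [f.result() for f in futures]

pool = ThreadPoolExecutor(max_workers=8)
batch = ["a", "b", "c"]  # stands in for consumer.poll() results
results = handle_batch(batch, pool)
pool.shutdown()
```

The key point is that one consumer (one partition assignment) can still use many threads internally, so adding consumers is not the only way to add processing parallelism.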

For example, I've processed 2 TB of records with 10 consumers in about 15
minutes in stress tests, and I've generally provisioned one 64 GB server
for every 20K records/s sustained.
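As a rough sanity check on those numbers (my arithmetic, not part of the original mail), that works out to a couple hundred MB/s per consumer:

```python
# 2 TB across 10 consumers in ~15 minutes:
total_bytes = 2 * 10**12
consumers = 10
seconds = 15 * 60

# Sustained throughput each consumer must achieve, in MB/s.
per_consumer_mb_s = total_bytes / consumers / seconds / 10**6
# roughly 222 MB/s per consumer
```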

This of course varies wildly depending on your use case, but I just want to
call out that you don't necessarily need a lot of consumers to process huge
amounts of data.

Ryanne

On Dec 18, 2018 10:25 AM, "Manoj Khangaonkar" <khangaon...@gmail.com> wrote:

Rebalancing of partitions among consumers does not necessarily mean loss of
messages.

But I understand it can be annoying.

If Kafka is rebalancing between consumers frequently, it means your
consumer code is not polling within the expected timeout, as a result of
which Kafka thinks the consumer is gone. You should tune your consumer
implementation to keep the polling loop duration reasonable. See the
heartbeat.interval.ms, session.timeout.ms, and max.poll.interval.ms
configuration params in the documentation.
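For reference, those knobs appear in the consumer configuration along these lines. This is an illustrative sketch using a Python-style config dict; the broker address, group name, and all values are examples, not recommendations:

```python
# Illustrative consumer settings (all values are examples, not recommendations).
consumer_config = {
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "group.id": "my-group",                 # hypothetical group name
    # How long without heartbeats before the broker declares the consumer dead:
    "session.timeout.ms": 30000,
    # How often the client sends heartbeats (typically ~1/3 of the session timeout):
    "heartbeat.interval.ms": 10000,
    # Max allowed time between poll() calls before the consumer is kicked
    # from the group and a rebalance is triggered:
    "max.poll.interval.ms": 300000,
}
```

If your per-batch processing can exceed max.poll.interval.ms, raise that value or move the heavy work off the polling thread.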

regards



On Tue, Dec 18, 2018 at 3:34 AM Parth Gandhi <
parth.gan...@excellenceinfonet.com> wrote:

> Team,
> We want to build a scalable Kafka system for pub/sub messaging and want to
> run consumers (500+) on Docker. We want the system to scale up the
> consumers based on the message inflow. However, in Kafka this triggers a
> rebalancing, and we fear loss of messages.
> What is the best practice/way to achieve this with no or the least message
> failure?


-- 
http://khangaonkar.blogspot.com/
