Parth, I am skeptical that you actually need 500+ consumers. A well-tuned consumer can process hundreds of thousands of records per second.
Some notes to consider:

- You'll need at least 500 partitions if you have 500 consumers.
- You almost never need more consumers than you have brokers in your cluster. If you can store N bps to disk on a broker, you can usually process at least N bps in a consumer.
- If your consumers can't process fast enough, add parallelism within each consumer, e.g. process records asynchronously (see the sketch at the bottom of this thread). You don't necessarily need more consumers.
- It might not make sense to auto-scale consumers, since scaling up triggers a rebalance, which can cause even more consumer lag on bursty streams.
- Unless you have a real-time use case, you can generally under-provision your consumers and let them catch up with bursts over time.

For example, I've processed 2 TB of records with 10 consumers in about 15 minutes in stress tests, and I've generally provisioned one 64 GB server for every 20K records/s sustained. This of course varies wildly depending on your use case, but I just want to call out that you don't necessarily need a lot of consumers to process huge amounts of data.

Ryanne

On Dec 18, 2018 10:25 AM, "Manoj Khangaonkar" <khangaon...@gmail.com> wrote:

Rebalancing partitions between consumers does not necessarily mean loss of messages, but I understand it can be annoying.

If Kafka is rebalancing between consumers frequently, it means your consumer code is not polling within the expected timeout, so Kafka thinks the consumer is gone. You should tune your consumer implementation to keep the polling loop duration reasonable. See the heartbeat.interval.ms and session.timeout.ms configuration params in the documentation (illustrative settings at the bottom of this thread).

regards

On Tue, Dec 18, 2018 at 3:34 AM Parth Gandhi <parth.gan...@excellenceinfonet.com> wrote:

> Team,
> We want to build a scalable Kafka system for pub/sub messaging and want to
> run consumers (500+) on Docker. We want the system to scale the consumers
> up based on message inflow. However, in Kafka this triggers a rebalance,
> and we fear loss of messages.
> What is the best practice/way to achieve this with no or minimal message
> loss?

--
http://khangaonkar.blogspot.com/
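
A minimal sketch of the in-consumer parallelism Ryanne describes above: a single consumer fans record processing out to a thread pool and commits offsets only after the whole batch has been processed. The broker address, group id, topic name ("events"), pool size, and process() body are placeholders, not details from this thread.

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ParallelConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "example-group");              // placeholder
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");            // commit only after processing

        ExecutorService pool = Executors.newFixedThreadPool(8);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));  // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                List<Future<?>> futures = new ArrayList<>();
                for (ConsumerRecord<String, String> rec : records) {
                    // Fan the work out so one slow record doesn't stall the poll loop.
                    futures.add(pool.submit(() -> process(rec)));
                }
                // Wait for the batch so offsets are committed only for processed records.
                for (Future<?> f : futures) {
                    f.get();
                }
                consumer.commitSync();
            }
        }
    }

    // Placeholder for whatever per-record work the application does.
    static void process(ConsumerRecord<String, String> rec) {
        // ... business logic ...
    }
}

One trade-off with this approach: because the loop waits for the whole batch before committing and polling again, max.poll.interval.ms has to comfortably exceed the time the slowest batch takes.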
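
And for the configuration parameters Manoj points to, an illustrative set of consumer settings; the values below are examples, not recommendations. Note that on recent clients heartbeats are sent from a background thread, so a slow poll loop is bounded by max.poll.interval.ms, while session.timeout.ms covers a consumer that stops heartbeating entirely.

import java.util.Properties;

public class ConsumerTuning {
    // Example values only; tune them to how long one batch of records takes to process.
    static Properties tunedProps() {
        Properties props = new Properties();
        props.put("heartbeat.interval.ms", "3000");   // how often the consumer heartbeats to the group coordinator
        props.put("session.timeout.ms", "30000");     // coordinator evicts the consumer after this much heartbeat silence
        props.put("max.poll.interval.ms", "300000");  // maximum allowed gap between calls to poll()
        props.put("max.poll.records", "500");         // smaller batches keep each poll-loop iteration short
        return props;
    }
}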