For a very large number of consumers, you can manually manage offsets and/or assign partitions to each consumer yourself (consumer.assign() instead of subscribe()) to avoid rebalancing, for example:
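Here's a minimal sketch of that approach with the Java client. The broker address, topic name, group id and partition index are placeholders; in practice each instance would derive its partition(s) from something like a container index, and the group id is only used as a namespace for committed offsets since assign() bypasses group membership entirely.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AssignedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        // group.id is only a namespace for committed offsets here; assign()
        // means no group membership, so no rebalancing ever happens.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "assigned-consumers");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // This instance owns exactly one partition, chosen by the deployment.
        TopicPartition partition = new TopicPartition("my-topic", 0);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(partition));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // your own processing logic
                }
                // Commit only after processing, so a crash replays records
                // instead of dropping them.
                if (!records.isEmpty()) {
                    int count = records.records(partition).size();
                    long nextOffset = records.records(partition).get(count - 1).offset() + 1;
                    consumer.commitSync(Collections.singletonMap(
                            partition, new OffsetAndMetadata(nextOffset)));
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}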
On Dec 18, 2018 9:58 AM, "Ryanne Dolan" <ryannedo...@gmail.com> wrote:

> Parth, I am skeptical that you actually need 500+ consumers. A well-tuned
> consumer can process hundreds of thousands of records per second.
>
> Some notes to consider:
>
> - You'll need at least 500 partitions if you have 500 consumers.
> - You almost never need more consumers than you have brokers in your
>   cluster. If you can store N bps to disk on a broker, you can usually
>   process at least N bps in a consumer.
> - If your consumers can't process fast enough, add parallelism within each
>   consumer, e.g. process records asynchronously. You don't necessarily need
>   more consumers.
> - It might not make sense to auto-scale consumers, since scaling up
>   triggers a rebalance, which can cause even more consumer lag on bursty
>   streams.
> - Unless you have a real-time use case, you can generally under-provision
>   your consumers and let them catch up with bursts over time.
>
> For example, I've processed 2 TB of records with 10 consumers in about 15
> minutes in stress tests, and I have generally provisioned one 64 GB server
> for every 20K records/s sustained.
>
> This of course varies wildly depending on your use case, but I just want to
> call out that you don't necessarily need a lot of consumers to process huge
> amounts of data.
>
> Ryanne
>
> On Dec 18, 2018 10:25 AM, "Manoj Khangaonkar" <khangaon...@gmail.com> wrote:
>
> Rebalancing of partitions across consumers does not necessarily mean loss
> of messages.
>
> But I understand it can be annoying.
>
> If Kafka is rebalancing between consumers frequently, it means your
> consumer code is not polling within the expected timeout, as a result of
> which Kafka thinks the consumer is gone. You should tune your consumer
> implementation to keep the polling loop duration reasonable. See the
> heartbeat.interval.ms and session.timeout.ms configuration params in the
> documentation.
>
> regards
>
> On Tue, Dec 18, 2018 at 3:34 AM Parth Gandhi <
> parth.gan...@excellenceinfonet.com> wrote:
>
> > Team,
> > We want to build a scalable Kafka system for pub/sub messaging and want
> > to run consumers (500+) on Docker. We want the system to scale up the
> > consumers based on the message inflow. However, in Kafka this triggers a
> > rebalance, and we fear loss of messages.
> > What are the best practices/ways to achieve this with no, or the least,
> > message failure?
>
> --
> http://khangaonkar.blogspot.com/
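P.S. On the heartbeat/session timeout tuning Manoj mentions above, a rough sketch of the relevant consumer properties is below. The broker address, group id and all values are illustrative placeholders, not recommendations; tune them against your actual per-record processing time.

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerTuning {
    // Illustrative values only.
    static Properties pollLoopFriendlyConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Heartbeats come from a background thread; session.timeout.ms is how long
        // the broker waits for them before declaring the consumer dead.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "10000");
        // max.poll.interval.ms bounds the time between poll() calls; exceeding it
        // (e.g. by processing a batch too slowly) triggers a rebalance.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");
        // Smaller batches keep each iteration of the poll loop short.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "200");
        return props;
    }
}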