So as I noted, it really does depend on what you need. With a small number of topics, I would make the number of partitions a multiple of the number of brokers. That keeps the partitions balanced across the cluster, while still giving you some freedom to have larger partition counts for larger topics.
-Todd

On Wed, Apr 8, 2015 at 9:29 AM, Akshat Aranya <aara...@gmail.com> wrote:

> Thanks for the info, Todd. This is very useful. Please see my question
> inline:
>
> On Mon, Apr 6, 2015 at 10:24 AM, Todd Palino <tpal...@gmail.com> wrote:
>
> > - Partition count (leader and follower combined) on each broker should
> >   stay under 4000
> >
> > As far as topic volume goes, it varies widely. We have topics that only see
> > a single message per minute (or less). Our largest topic by bytes has a
> > peak rate of about 290 Mbits/sec. Our largest topic by messages has a peak
> > rate of about 225k messages/sec. Note that those are in the same cluster.
> >
> > When we are sizing topics (number of partitions), we use the following
> > guidelines:
> > - Have at least as many partitions as there are consumers in the
> >   largest group
> > - Keep partition size on disk under 50GB per partition (better balance)
> > - Take into account any other application requirements (keyed messages,
> >   specific topic counts required, etc.)
>
> What would you say is a recommended configuration when you don't have too
> many topics? It seems like having too many partitions is not recommended,
> but at the same time, you need more partitions to be able to utilize all
> the disks and handle the data rate, especially for high volume topics.
>
> > I hope this helps. I'll be covering some of this at my ApacheCon talk
> > (Kafka at Scale: Multi-Tier Architectures) and at the meet up that Jun has
> > set up at ApacheCon. If you have any questions, just ask!
> >
> > -Todd
> >
> > On Mon, Apr 6, 2015 at 9:35 AM, Rama Ramani <rama.ram...@live.com> wrote:
> >
> > > Hello,
> > > I am trying to understand some of the common Kafka deployment
> > > sizes ("small", "medium", "large") and configuration to come up with a set
> > > of common templates for deployment on Linux. Some of the Qs to answer are:
> > >
> > > - Number of nodes in the cluster
> > > - Machine Specs (cpu, memory, number of disks, network etc.)
> > > - Speeds & Feeds of messages
> > > - What are some of the best practices to consider when laying out the
> > >   clusters?
> > > - Is there a sizing calculator for coming up with this?
> > >
> > > If you can please share pointers to existing materials or specific details
> > > of your deployment, that will be great.
> > >
> > > Regards
> > > Rama
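
The partition-sizing guidelines given in this thread (at least as many partitions as consumers in the largest group, under 50GB on disk per partition, a multiple of the broker count, and fewer than 4000 partition replicas per broker) can be combined into a small calculation. The following is only a rough sketch, not a tool mentioned in the thread: the function names and parameters are hypothetical, and the per-broker figure assumes replicas are spread evenly across brokers, which a real cluster may not achieve.

```python
import math


def suggest_partition_count(num_brokers, max_consumers_in_group,
                            expected_topic_size_gb, max_partition_size_gb=50.0):
    """Suggest a partition count for one topic (hypothetical helper).

    Applies the guidelines from the thread:
      - at least as many partitions as consumers in the largest group
      - each partition stays under max_partition_size_gb on disk
      - the result is a multiple of the broker count
    """
    if num_brokers < 1:
        raise ValueError("num_brokers must be at least 1")

    # Smallest count that keeps every partition strictly under the size limit.
    by_size = math.floor(expected_topic_size_gb / max_partition_size_gb) + 1

    minimum = max(1, max_consumers_in_group, by_size)

    # Round up to the next multiple of the broker count for balance.
    return math.ceil(minimum / num_brokers) * num_brokers


def average_replicas_per_broker(partition_counts, replication_factor, num_brokers):
    """Average leader + follower replicas per broker.

    Assumes an even spread across brokers (an assumption, not a guarantee).
    """
    total_replicas = sum(partition_counts) * replication_factor
    return total_replicas / num_brokers


if __name__ == "__main__":
    # Example: 6 brokers, largest consumer group has 10 consumers,
    # topic expected to hold about 400GB on disk.
    count = suggest_partition_count(6, 10, 400)
    print("suggested partitions:", count)

    load = average_replicas_per_broker([count], replication_factor=2, num_brokers=6)
    print("replicas per broker:", load, "(guideline: stay under 4000)")
```

Application requirements such as keyed messages or specific topic counts are not modeled here and would still need to be considered by hand.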