I am using Kafka as a buffer for data streaming in from various sources. Since it's time series data, I generate each message key by combining the source ID with the minute of the timestamp. Because each source has its own topic, a topic sees at most 60 distinct keys, so at most 60 partitions per topic can ever receive data. I have set num.partitions to 30 (60/2) in the broker config, so each topic gets 30 partitions by default. I don't have a very good reason for picking 30, but I wanted a high number so that I can achieve high parallelism during in-stream processing.

I am worried that a number as high as 30 (the default configuration had it at 2) could hurt Kafka performance in terms of message throughput or memory consumption. I understand that more partitions means many more files on disk, since each partition keeps its own log segment files, but I am thinking of dealing with that by using multiple directories on the same disk if I ever run into issues.
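A minimal sketch of the keying scheme described above. It assumes a hypothetical key format `<source id>-<minute>` and uses CRC32 as a stand-in hash; Kafka's default partitioner actually hashes key bytes with murmur2, so the exact partition numbers will differ. It only illustrates the bound: with one topic per source there are 60 distinct keys, so no more than 60 partitions receive data, and because of hash collisions typically fewer.

```python
import zlib
from collections import Counter


def make_key(source_id: str, minute: int) -> str:
    # Hypothetical key format: "<source id>-<minute of hour, 00-59>".
    return f"{source_id}-{minute:02d}"


def stand_in_partition(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32), NOT Kafka's murmur2-based default partitioner.
    # Like Kafka's, it is deterministic: the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partitions_used(source_id: str, num_partitions: int) -> Counter:
    # One topic per source, so only the minute (0-59) varies within a topic.
    keys = {make_key(source_id, m) for m in range(60)}
    # Count how many distinct keys land on each partition that receives data.
    return Counter(stand_in_partition(k, num_partitions) for k in keys)


if __name__ == "__main__":
    for n in (5, 30, 60, 120):
        used = partitions_used("sensor-a", n)
        print(f"{n} partitions: {len(used)} receive data")
```

Raising the partition count past 60 adds partitions that can never receive data under this keying scheme.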
My question to the community: am I prematurely optimizing the partition count, given that even 5 partitions currently seems sufficient, and will that cause unwanted issues? Or is 30 an OK number of partitions to use?
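On the parallelism side: within one consumer group, each partition is consumed by exactly one member, so the partition count caps how many consumers can do useful work. The sketch below illustrates that rule with hypothetical helpers and a plain round-robin assignment; it is a simplification, not Kafka's actual assignor code.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, num_consumers: int) -> Dict[int, List[int]]:
    # Hypothetical, simplified round-robin assignment of one topic's partitions
    # to the members of a single consumer group. Each partition goes to exactly one consumer.
    assignment: Dict[int, List[int]] = {c: [] for c in range(num_consumers)}
    for p in range(num_partitions):
        assignment[p % num_consumers].append(p)
    return assignment


def active_consumers(num_partitions: int, num_consumers: int) -> int:
    # Consumers with no assigned partition sit idle.
    assignment = assign_partitions(num_partitions, num_consumers)
    return sum(1 for parts in assignment.values() if parts)


if __name__ == "__main__":
    for partitions in (5, 30):
        print(f"{partitions} partitions, 8 consumers: "
              f"{active_consumers(partitions, 8)} active")
```

Under this model, 5 partitions keep at most 5 consumers busy, while 30 leave headroom to scale out to 30 consumers later.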
