Hey all, are there any guidelines or documentation on capacity planning for Kafka Streams?
We have a Streams application that consumes messages from a topic with 400 partitions. At peak, around 20K messages per second arrive on that topic. The Streams app consumes these messages to do some aggregations. Currently we have 3 VMs, each running 10 stream threads. The consumer lag keeps fluctuating between a few tens and a few lakhs (a few hundred thousand) messages. We have also noticed that the lag on the partitions consumed by one particular machine is much higher than on the other two machines. Has anybody faced similar issues? How did you guys resolve it?
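
For reference, here is a rough back-of-the-envelope sketch of what those numbers imply per thread. It is only an estimate under stated assumptions: one Streams task per input partition (repartition topics created by the aggregations would add more tasks), tasks spread evenly over all threads, and messages spread evenly over partitions. The function name `streams_capacity` and its parameters are made up for illustration; they are not part of any Kafka API.

```python
def streams_capacity(partitions, instances, threads_per_instance, peak_msgs_per_sec):
    """Estimate per-thread load for a Kafka Streams app (hypothetical helper).

    Assumes one task per input partition, an even task assignment across
    threads, and an even message distribution across partitions.
    """
    total_threads = instances * threads_per_instance
    # Threads beyond the number of tasks would sit idle.
    active_threads = min(total_threads, partitions)
    # Some threads get one extra task when the division is not exact.
    base, extra = divmod(partitions, active_threads)
    return {
        "total_threads": total_threads,
        "min_tasks_per_thread": base,
        "max_tasks_per_thread": base + (1 if extra else 0),
        "msgs_per_sec_per_partition": peak_msgs_per_sec / partitions,
        "msgs_per_sec_per_thread": peak_msgs_per_sec / active_threads,
    }


# Numbers from the setup described above.
plan = streams_capacity(partitions=400, instances=3,
                        threads_per_instance=10, peak_msgs_per_sec=20_000)
for name, value in plan.items():
    print(f"{name}: {value}")
```

With these inputs each of the 30 threads handles 13 or 14 partitions and roughly 670 messages per second at peak, so every thread has to sustain that rate, including state store and changelog writes, for the lag to stay flat. If the keys are skewed, or one VM is slower than the others, the real per-thread load will differ from this even split.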