I'm using Kafka. In my cluster, I have 4 nodes running Kafka, 3 of which
also run Zookeeper. I have a few producer processes that publish to Kafka
and multiple consumer processes: a streaming engine (Spark) that ingests
from Kafka and also publishes data back to Kafka, and a distributed data
store (Druid) that reads all messages from Kafka and stores them in its
database. Druid also uses the same Zookeeper cluster that Kafka uses for
cluster state management.

*Kafka Configs:*
1) No replication
2) Number of network threads: 30
3) Number of IO threads: 8
4) Machines have 64 GB RAM and 16 cores
5) 3 topics with 64 partitions per topic


*1) Partitions going offline:*
I frequently see partitions going offline, which increases the scheduling
delay of the Spark application and makes the input rate jittery. I also
tried enabling replication to see whether it helped, but it made little
difference. What could be causing this: a lack of resources or a cluster
misconfiguration? Could a large number of receiver processes be the cause?
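
To pin down which partitions are offline at a given moment, the output of
`kafka-topics.sh --describe` can be scanned for partitions without a leader.
The following is only a minimal sketch: it assumes the describe output
reports a missing leader as `-1` or `none` (the exact text may differ between
Kafka versions), and the topic name `events` in the usage note is
hypothetical.

```python
import re

# Matches one per-partition line of `kafka-topics.sh --describe` output.
# The topic summary line ("PartitionCount:...") does not match this pattern.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+"
    r"Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)"
)

# Assumption: a partition with no live leader is printed with one of these.
NO_LEADER_MARKERS = ("-1", "none")


def offline_partitions(describe_output):
    """Return (topic, partition) pairs whose leader is reported as missing."""
    offline = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # summary line, blank line, or unrelated output
        if match.group("leader") in NO_LEADER_MARKERS:
            offline.append((match.group("topic"), int(match.group("partition"))))
    return offline
```

Feeding it the saved describe output for a topic such as `events` at regular
intervals would show whether the same partitions (and therefore the same
broker) go offline each time, or whether the problem is spread across all
nodes.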

*2) Colocation of Zookeeper and Kafka:*
As mentioned above, 3 of my nodes run both Zookeeper and Kafka. Both
components are containerized and run inside Docker containers. I found a
few blogs that advise against colocating them for performance reasons.
Is it necessary to run them on dedicated

*3) Using the same Zookeeper cluster across different components:*
In my cluster, I use the same Zookeeper cluster for state management of the
Kafka cluster and the Druid cluster. Could this cause instability of the
overall system?
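
One way to see whether the shared Zookeeper ensemble is under strain is to
poll each server with the `mntr` four-letter command and watch the average
latency and outstanding request counts. This is a rough sketch, not a
recommended monitoring setup: the thresholds are arbitrary placeholders I
picked for illustration, and on newer Zookeeper releases `mntr` may have to
be allowed through the `4lw.commands.whitelist` setting before it answers.

```python
import socket


def fetch_mntr(host, port=2181, timeout=5.0):
    """Send the `mntr` four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"mntr")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn tab-separated `key<TAB>value` lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines that are not key/value pairs
            stats[key.strip()] = value.strip()
    return stats


def looks_overloaded(stats, max_avg_latency_ms=50.0, max_outstanding=10):
    """Flag a server whose latency or request backlog exceeds the thresholds.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    avg_latency = float(stats.get("zk_avg_latency", 0))
    outstanding = float(stats.get("zk_outstanding_requests", 0))
    return avg_latency > max_avg_latency_ms or outstanding > max_outstanding
```

Calling `looks_overloaded(parse_mntr(fetch_mntr(host)))` for each of the 3
Zookeeper nodes around the time partitions go offline would indicate whether
Zookeeper latency spikes coincide with the Kafka problems.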

