For #2 and #3, you would get better stability if ZooKeeper and Kafka each
ran on dedicated machines.

Have you profiled the nodes where multiple processes (ZooKeeper / Kafka /
Druid) run together? What did disk and network IO look like?
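
If you don't have numbers yet, here is a rough sketch of one way to sample
them. It assumes the brokers run Linux and reads the kernel counters in
/proc/net/dev and /proc/diskstats twice, a few seconds apart. The function
names (parse_net_dev, parse_diskstats, rates, sample) are made up for this
example and are not part of Kafka or any library; iostat and sar from the
sysstat package report the same kind of figures if you would rather not
script it.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, fields = line.split(":", 1)
        values = fields.split()
        # Column 0 is received bytes, column 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def parse_diskstats(text):
    """Return {device: (bytes_read, bytes_written, io_ms)} from /proc/diskstats text."""
    counters = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # not a full statistics line
        # f[5] = sectors read, f[9] = sectors written, f[12] = ms spent doing IO.
        counters[f[2]] = (int(f[5]) * SECTOR_BYTES, int(f[9]) * SECTOR_BYTES, int(f[12]))
    return counters


def rates(before, after, seconds):
    """Per-second deltas for every key present in both samples."""
    return {
        key: tuple((a - b) / seconds for a, b in zip(after[key], before[key]))
        for key in after
        if key in before
    }


def _read(path):
    with open(path) as fh:
        return fh.read()


def sample(interval=5.0):
    """Print disk and network throughput over one interval (Linux only)."""
    net0, disk0 = parse_net_dev(_read("/proc/net/dev")), parse_diskstats(_read("/proc/diskstats"))
    time.sleep(interval)
    net1, disk1 = parse_net_dev(_read("/proc/net/dev")), parse_diskstats(_read("/proc/diskstats"))
    for dev, (rd, wr, busy_ms) in rates(disk0, disk1, interval).items():
        # busy_ms is ms of IO per second of wall time, so /10 gives a percentage.
        print(f"{dev}: read {rd / 1e6:.1f} MB/s, write {wr / 1e6:.1f} MB/s, busy {busy_ms / 10:.0f}%")
    for nic, (rx, tx) in rates(net0, net1, interval).items():
        print(f"{nic}: rx {rx / 1e6:.1f} MB/s, tx {tx / 1e6:.1f} MB/s")
```

Calling sample() on each of the three colocated nodes while the Spark job is
running, and comparing the output with the fourth Kafka-only node, should
show whether the shared disks or NICs are close to saturation around the time
partitions go offline.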


> Hi,
> I'm using Kafka version In my cluster, I've 4 nodes running Kafka
> of which 3 nodes also running Zookeeper. I've a few producer processes that
> publish to Kafka and multiple consumer processes, a streaming engine
> (Spark) that ingests from Kafka and also publishes data to Kafka, and a
> distributed data store (Druid) which reads all messages from Kafka and
> stores in the DB. Druid also uses the same Zookeeper cluster being used by
> Kafka for cluster state management.
> *Kafka Configs:*
> 1) No replication being used
> 2) Number of network threads 30
> 3) Number of IO threads 8
> 4) Machines have 64GB RAM and 16 cores
> 5) 3 topics with 64 partitions per topic
> *Questions:*
> 1) *Partitions going offline*
> I frequently see partitions going offline because of which the scheduling
> delay of the Spark application increases and input rate gets jittery. I
> tried enabling replication too to see if it helped with the problem. It
> didn't quite make a difference. What could be the cause of this issue? Lack
> of resources or cluster misconfigurations? Can the cause be large number of
> receiver processes?
> *2) Colocation of Zookeeper and Kafka:*
> As I mentioned above, I'm running 3 nodes with both Zookeeper and Kafka
> colocated. Both the components are containerized, so they are running
> inside docker containers. I found a few blogs that suggested not colocating
> them for performance reasons. Is it necessary to run them on dedicated
> machines?
> *3) Using same Zookeeper cluster across different components*
> In my cluster, I use the same Zookeeper cluster for state management of the
> Kafka cluster and the Druid cluster. Could this cause instability of the
> overall system?
> Hope I've covered all the necessary information needed. Please let me know
> if more information about my cluster is needed.
> Thanks in advance,
> Avinash
