I'm using Kafka. My cluster has 4 nodes running Kafka, 3 of which also run
Zookeeper. I have a few producer processes that publish to Kafka and
multiple consumer processes: a streaming engine (Spark) that ingests from
Kafka and also publishes data back to Kafka, and a distributed data store
(Druid) that reads all messages from Kafka and stores them in its database.
Druid also uses the same Zookeeper cluster that Kafka uses for cluster
state management.

*Kafka Configs:*
1) Replication: not used
2) Number of network threads: 30
3) Number of IO threads: 8
4) Machines: 64 GB RAM and 16 cores each
5) Topics: 3, with 64 partitions per topic
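
To put the configuration above in numbers, here is a rough sketch of how many partitions each broker ends up hosting. It assumes partitions are spread evenly across the 4 brokers, which I haven't verified; the function name is only illustrative.

```python
def partitions_per_broker(topics, partitions_per_topic, replication_factor, brokers):
    """Average number of partition replicas hosted by each broker.

    Assumes an even spread across brokers (an assumption, not a measurement).
    """
    total_replicas = topics * partitions_per_topic * replication_factor
    return total_replicas / brokers


# 3 topics x 64 partitions, no replication (factor 1), 4 brokers
print(partitions_per_broker(3, 64, 1, 4))
```

With no replication, each partition lives on a single broker, so under this assumption one unavailable broker would leave roughly 48 partitions without a leader.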


*1) Partitions going offline:*
I frequently see partitions going offline. When that happens, the
scheduling delay of the Spark application increases and the input rate
becomes jittery. I also tried enabling replication to see whether it would
help, but it made little difference. What could be causing this: a lack of
resources, or a cluster misconfiguration? Could a large number of receiver
processes be the cause?
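
In case it helps with diagnosis, this is a minimal sketch of how the offline partitions could be listed from the output of `kafka-topics.sh --describe`. It assumes a partition without a leader is printed with `Leader: -1` (older releases) or `Leader: none` (newer ones); the topic name `events` and the sample text are made up for illustration and are not taken from my cluster.

```python
import re

# Matches one per-partition line of `kafka-topics.sh --describe` output, e.g.
# "Topic: events  Partition: 3  Leader: -1  Replicas: 2  Isr: "
PARTITION_LINE = re.compile(r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)")


def offline_partitions(describe_output):
    """Return (topic, partition) pairs whose leader is reported as missing."""
    offline = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # topic summary line or unrelated text
        topic, partition, leader = match.groups()
        if leader in ("-1", "none"):  # assumed markers for "no leader"
            offline.append((topic, int(partition)))
    return offline


# Illustrative sample only (hypothetical topic and broker ids).
sample = (
    "Topic:events\tPartitionCount:3\tReplicationFactor:1\tConfigs:\n"
    "\tTopic: events\tPartition: 0\tLeader: 1\tReplicas: 1\tIsr: 1\n"
    "\tTopic: events\tPartition: 1\tLeader: -1\tReplicas: 2\tIsr: \n"
    "\tTopic: events\tPartition: 2\tLeader: none\tReplicas: 3\tIsr: \n"
)
print(offline_partitions(sample))
```

Running `kafka-topics.sh --describe` with the `--unavailable-partitions` option should give a similar view directly from the command line.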

*2) Colocation of Zookeeper and Kafka:*
As mentioned above, 3 of my nodes run both Zookeeper and Kafka. Both
components are containerized and run inside Docker containers. I found a
few blog posts that advise against colocating them for performance
reasons. Is it necessary to run them on dedicated nodes?

*3) Using the same Zookeeper cluster across different components:*
In my cluster, the same Zookeeper cluster handles state management for
both the Kafka cluster and the Druid cluster. Could this make the overall
system unstable?
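
For context, one way the two systems could be kept apart inside a shared ensemble is a chroot suffix on Kafka's `zookeeper.connect`, with Druid keeping its own base path (`druid.zk.paths.base`, `/druid` by default). Below is a minimal sketch of building such a connection string. The hostnames and the `/kafka` chroot are hypothetical, and I'm not claiming my cluster is set up this way.

```python
def zk_connect_string(hosts, chroot=""):
    """Build a Zookeeper connection string, optionally scoped to a chroot path.

    hosts:  list of "host:port" strings (hypothetical names in the example)
    chroot: path such as "/kafka"; empty means the ensemble root
    """
    if chroot and not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    return ",".join(hosts) + chroot


# Hypothetical ensemble of three Zookeeper nodes, Kafka scoped under /kafka
print(zk_connect_string(["zk1:2181", "zk2:2181", "zk3:2181"], "/kafka"))
```

A chroot only separates the znode namespaces; both systems would still share the same Zookeeper servers and their load.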

I hope I've covered all the necessary information. Please let me know if
you need more details about my cluster.

Thanks in advance,
