Hi,
I have an environment with a Kafka cluster of 3 brokers and a Kafka Streams
application that processes data from a Kafka topic.
Both Kafka and Kafka Streams are version 2.7.0.
This works fine for some time, but later Kafka Streams runs into issues and
the logs show the errors below:

   - Execution error java.util.concurrent.ExecutionException:
   org.apache.kafka.common.errors.TimeoutException
   - Rebalance failed. org.apache.kafka.common.errors.DisconnectException
   - Rebalance failed.
   org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The
   coordinator is loading and hence can't process requests.

Whenever I restart the Kafka cluster and Kafka Streams, everything works
fine again for some time.
I am not sure exactly where the problem is, but I looked at the Kafka
metrics, which are below.
I think the metrics marked with **** have a high error rate, and it
looks like the *kafka cluster is unstable*.

network<type=RequestMetrics, name=ErrorsPerSec, request=ApiVersions,
error=NONE><>Count)
# TYPE kafka_network_requestmetrics_errors_total untyped
kafka_network_requestmetrics_errors_total{request="ApiVersions,
error=NONE",} 510.0
kafka_network_requestmetrics_errors_total{request="Fetch,
error=NOT_LEADER_OR_FOLLOWER",} 53.0
**** kafka_network_requestmetrics_errors_total{request="Fetch,
error=FENCED_LEADER_EPOCH",} 334507.0
kafka_network_requestmetrics_errors_total{request="JoinGroup, error=NONE",}
9.0
kafka_network_requestmetrics_errors_total{request="JoinGroup,
error=COORDINATOR_LOAD_IN_PROGRESS",} 252.0
kafka_network_requestmetrics_errors_total{request="OffsetForLeaderEpoch,
error=UNKNOWN_LEADER_EPOCH",} 42.0
kafka_network_requestmetrics_errors_total{request="JoinGroup,
error=NOT_COORDINATOR",} 2.0
**** kafka_network_requestmetrics_errors_total{request="LeaderAndIsr,
error=NONE",} 346.0
kafka_network_requestmetrics_errors_total{request="OffsetForLeaderEpoch,
error=NONE",} 62.0
**** kafka_network_requestmetrics_errors_total{request="FindCoordinator,
error=COORDINATOR_NOT_AVAILABLE",} 104.0
kafka_network_requestmetrics_errors_total{request="ListOffsets,
error=NOT_LEADER_OR_FOLLOWER",} 15.0
kafka_network_requestmetrics_errors_total{request="SyncGroup,
error=UNKNOWN_MEMBER_ID",} 3.0
**** kafka_network_requestmetrics_errors_total{request="OffsetCommit,
error=NONE",} 1883.0
kafka_network_requestmetrics_errors_total{request="Heartbeat,
error=NOT_COORDINATOR",} 2.0
**** kafka_network_requestmetrics_errors_total{request="Metadata,
error=NONE",} 1091.0
kafka_network_requestmetrics_errors_total{request="Heartbeat,
error=UNKNOWN_MEMBER_ID",} 5.0
kafka_network_requestmetrics_errors_total{request="ListOffsets,
error=FENCED_LEADER_EPOCH",} 5.0
**** kafka_network_requestmetrics_errors_total{request="DeleteRecords,
error=NONE",} 756.0
kafka_network_requestmetrics_errors_total{request="OffsetFetch,
error=NONE",} 134.0
kafka_network_requestmetrics_errors_total{request="FindCoordinator,
error=NONE",} 19.0
kafka_network_requestmetrics_errors_total{request="ListOffsets,
error=NONE",} 321.0
kafka_network_requestmetrics_errors_total{request="SyncGroup, error=NONE",}
6.0
kafka_network_requestmetrics_errors_total{request="JoinGroup,
error=MEMBER_ID_REQUIRED",} 9.0
kafka_network_requestmetrics_errors_total{request="UpdateMetadata,
error=NONE",} 13.0
kafka_network_requestmetrics_errors_total{request="Fetch,
error=UNKNOWN_LEADER_EPOCH",} 88.0
**** kafka_network_requestmetrics_errors_total{request="Fetch,
error=NONE",} 16927.0
**** kafka_network_requestmetrics_errors_total{request="Heartbeat,
error=NONE",} 18353.0
kafka_network_requestmetrics_errors_total{request="OffsetForLeaderEpoch,
error=UNKNOWN_TOPIC_OR_PARTITION",} 24.0
kafka_network_requestmetrics_errors_total{request="OffsetForLeaderEpoch,
error=NOT_LEADER_OR_FOLLOWER",} 17.0
**** kafka_network_requestmetrics_errors_total{request="Produce,
error=NONE",} 4450.0
# HELP jmx_scrape_error Non-zero if this scrape failed.
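
One way to read these counters is to group them by request type and keep
error=NONE apart from the other error codes. The Kafka monitoring
documentation describes error=NONE under ErrorsPerSec as successful
responses, so those rows may not be failures at all. The following is only a
minimal sketch: the names `summarize` and `SAMPLE` are hypothetical, the
sample is a small excerpt of the output above, and the regex assumes the label
layout shown there (request and error fused into one label value, possibly
wrapped across lines).

```python
import re
from collections import defaultdict

# Assumed layout, copied from the exporter output above:
#   kafka_network_requestmetrics_errors_total{request="Fetch, error=NONE",} 16927.0
# \s* and \s+ also match the line breaks introduced by email wrapping.
PATTERN = re.compile(
    r'kafka_network_requestmetrics_errors_total'
    r'\{request="(\w+),\s*error=(\w+)",\}\s+([\d.]+)'
)


def summarize(text):
    """Split counters into successful (error=NONE) and failed, per request type."""
    ok = defaultdict(float)
    failed = defaultdict(dict)
    for request, error, value in PATTERN.findall(text):
        if error == "NONE":
            ok[request] += float(value)
        else:
            failed[request][error] = failed[request].get(error, 0.0) + float(value)
    return ok, failed


# Hypothetical sample: a few lines taken from the metrics listed above.
SAMPLE = '''
kafka_network_requestmetrics_errors_total{request="Fetch,
error=FENCED_LEADER_EPOCH",} 334507.0
kafka_network_requestmetrics_errors_total{request="Fetch,
error=NONE",} 16927.0
kafka_network_requestmetrics_errors_total{request="JoinGroup, error=NONE",}
9.0
kafka_network_requestmetrics_errors_total{request="JoinGroup,
error=COORDINATOR_LOAD_IN_PROGRESS",} 252.0
'''

ok, failed = summarize(SAMPLE)
for request in sorted(set(ok) | set(failed)):
    print(request, "ok:", ok.get(request, 0.0), "failed:", failed.get(request, {}))
```

Read this way, the rows that stand out are the ones with a real error code
(for example Fetch with FENCED_LEADER_EPOCH), not the error=NONE rows.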

These are the Kafka configurations I used:

rm -f /var/lib/kafka/kafka-0/.lock;
rm -f /var/lib/kafka/kafka-0/meta.properties;
exec kafka-server-start.sh /opt/kafka/config/server.properties
--override unclean.leader.election.enable=true
--override broker.id=0
--override listeners=PLAINTEXT://\${LOCAL_POD_IP}:9093
--override host.name=#[[${HOSTNAME}]]#
--override advertised.listeners=PLAINTEXT://\${LOCAL_POD_IP}:9093
--override log.dirs=/var/lib/kafka/kafka-0
--override auto.create.topics.enable=true
--override auto.leader.rebalance.enable=true
--override compression.type=producer
--override delete.topic.enable=false
--override offsets.topic.replication.factor=2
--override broker.id.generation.enable=true
--override default.replication.factor=2
--override num.partitions=10
--override log.retention.bytes=536870912000
--override socket.request.max.bytes=1195725856
--override log.retention.hours=360
--override log.roll.hours=360
--override max.message.bytes=5242880
--override zookeeper.ssl.endpoint.identification.algorithm
--override zookeeper.ssl.client.enable=true
--override zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
--override zookeeper.ssl.keystore.type=PEM
--override zookeeper.ssl.truststore.type=PEM
--override zookeeper.ssl.keystore.location=/zoo/cert_key
--override zookeeper.ssl.truststore.location=/zoo/caBundle
--override zookeeper.set.acl=false
--override zookeeper.connect=zookeeper-svc:4095/kafka-brokers/kafka
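
Because the command is long, a small script can help double-check which
overrides are actually passed. This is only a sketch under stated
assumptions: `parse_overrides` and `SAMPLE_COMMAND` are hypothetical names,
and the sample is a shortened copy of the command above. It collects every
`--override` key and reports keys that were given without `=value`.

```python
def parse_overrides(command):
    """Return {key: value} for each '--override key=value' in the command text.

    A key given without '=' is stored with the value None.
    """
    tokens = command.split()  # splits on spaces and line breaks alike
    overrides = {}
    for flag, arg in zip(tokens, tokens[1:]):
        if flag == "--override" and not arg.startswith("--"):
            key, sep, value = arg.partition("=")
            overrides[key] = value if sep else None
    return overrides


# Hypothetical sample: a shortened copy of the start command above.
SAMPLE_COMMAND = """
exec kafka-server-start.sh /opt/kafka/config/server.properties
--override broker.id=0
--override broker.id.generation.enable=true
--override offsets.topic.replication.factor=2
--override zookeeper.ssl.endpoint.identification.algorithm
--override
zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
"""

overrides = parse_overrides(SAMPLE_COMMAND)
missing = [key for key, value in overrides.items() if value is None]
print("overrides:", overrides)
print("keys without a value:", missing)
```

On the sample, the only key reported without a value is
zookeeper.ssl.endpoint.identification.algorithm, which matches how it
appears in the command above.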

Please have a look at the Kafka configuration and metrics, and let me know
what changes I should make to stabilize the Kafka cluster.
Thanks in advance for looking into this.
--


Thanks & Regards,
Prasad.
