Sampath Kumar created KAFKA-7017:
------------------------------------

             Summary: GroupCoordinator response error: Broker: Group coordinator not available
                 Key: KAFKA-7017
                 URL: https://issues.apache.org/jira/browse/KAFKA-7017
             Project: Kafka
          Issue Type: Bug
          Components: consumer, controller, core, offset manager
    Affects Versions: 1.1.0
         Environment: Our setup details are as follows:

Confluent Kafka image: confluentinc/cp-enterprise-kafka:4.1.0

In our test setup, we use a single-broker deployment in a K8s cluster.

After a fresh deployment of our application (including the broker) to the K8s cluster, we observed the following issue for the first time. As a result, our applications failed to come up.
            Reporter: Sampath Kumar
             Fix For: 1.1.0


1. Most of the consumers got stuck while reading data from the Kafka topic. The stack trace of a stuck consumer is given below. After a timeout, the application restarted and tried to connect again with the same consumer group, but it got stuck at the same point again:
 
 "main" #1 prio=5 os_prio=0 tid=0x0000000001811800 nid=0x194 runnable 
[0x00007ffe513bd000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104)
        at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:122)
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
        at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
        at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
        at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:557)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:495)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228)
        - locked <0x00000000ae7acf08> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205)
        - locked <0x00000000ae7acf08> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.fetchCommittedOffsets(ConsumerCoordinator.java:465)
        at org.apache.kafka.clients.consumer.KafkaConsumer.committed(KafkaConsumer.java:1461)
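
In the trace above, the main thread blocks inside `KafkaConsumer.committed()` while `ensureCoordinatorReady()` keeps waiting for a coordinator, so the application only recovers when an external timeout restarts it. As a stopgap, a caller can bound how long it waits. The sketch below is a generic, stdlib-only illustration under stated assumptions: `TimeoutGuard` and `callWithTimeout` are hypothetical names, and the sleeping lambda merely stands in for the stuck Kafka call. Note that `KafkaConsumer` is not safe for concurrent use by multiple threads; its documented way to abort a blocking call from another thread is `wakeup()`, so a real version would call that rather than rely on thread interruption. This only bounds the wait; it does not make the coordinator available.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not a Kafka API): runs a blocking call on a worker
// thread and stops waiting after a deadline instead of hanging the caller.
final class TimeoutGuard {
    private TimeoutGuard() {}

    // Returns the call's result, or Optional.empty() if it did not finish in time
    // (a null result is also reported as empty).
    static <T> Optional<T> callWithTimeout(Callable<T> call, long timeoutMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<T> future = worker.submit(call);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupt the worker; whether the blocked call honours it depends on the call.
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            // The call itself failed: surface its cause to the caller.
            throw new RuntimeException(e.getCause());
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a call such as consumer.committed(partition) that never returns.
        Optional<Long> stuck = callWithTimeout(() -> {
            Thread.sleep(60_000);
            return 0L;
        }, 200);
        System.out.println(stuck.isPresent() ? "finished" : "timed out"); // timed out

        Optional<Long> fast = callWithTimeout(() -> 42L, 200);
        System.out.println(fast.orElse(-1L)); // 42
    }
}
```

The single-thread executor keeps the blocking call off the caller's thread, and `shutdownNow()` in the `finally` block ensures the worker does not outlive the helper.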
 
 
2.  To debug further, we installed kafkacat and tried to consume the data, first with the consumer group that was stuck and then with a new consumer group. With the stuck consumer group we were not able to consume any data, whereas the new consumer group consumed the data successfully. The error seen for the stuck consumer group is as follows:
 
%7|1528304675.172|COMMIT|rdkafka#consumer-1| OffsetCommit for -1 partition(s) returned: Local: No offset stored
%7|1528304675.172|UNASSIGN|rdkafka#consumer-1| Group "agent.defaultagent": unassign done in state wait-broker (join state init): without new assignment: OffsetCommit done (__NO_OFFSET)
%7|1528304675.223|CGRPQUERY|rdkafka#consumer-1| broker:9092/bootstrap: Group "agent.defaultagent": querying for coordinator: intervaled in state wait-broker
%7|1528304675.244|SEND|rdkafka#consumer-1| broker:9092/bootstrap: Sent GroupCoordinatorRequest (v0, 41 bytes @ 0, CorrId 25)
%7|1528304675.255|RECV|rdkafka#consumer-1| broker:9092/bootstrap: Received GroupCoordinatorResponse (v0, 12 bytes, CorrId 25, rtt 10.91ms)
%7|1528304675.326|CGRPCOORD|rdkafka#consumer-1| broker:9092/bootstrap: Group "agent.defaultagent" GroupCoordinator response error: Broker: Group coordinator not available
%7|1528304676.226|CGRPQUERY|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Group "agent.defaultagent": querying for coordinator: intervaled in state wait-broker
%7|1528304676.330|SEND|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Sent GroupCoordinatorRequest (v0, 41 bytes @ 0, CorrId 33)
%7|1528304676.350|RECV|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Received GroupCoordinatorResponse (v0, 12 bytes, CorrId 33, rtt 19.93ms)
*%7|1528304676.430|CGRPCOORD|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Group "agent.defaultagent" GroupCoordinator response error: Broker: Group coordinator not available*
%7|1528304677.226|CGRPQUERY|rdkafka#consumer-1| broker:9092/bootstrap: Group "agent.defaultagent": querying for coordinator: intervaled in state wait-broker
 
 
3. We tried to delete the stuck consumer group, but the deletion fails with the same highlighted error:
 
Error: Deletion of some consumer groups failed:
* Group 'agent.defaultagent' could not be deleted due to: COORDINATOR_NOT_AVAILABLE
 
4. According to the documentation at
[http://home.apache.org/~ewencp/kafka-0.10.2.0-rc1/javadoc/org/apache/kafka/common/errors/GroupCoordinatorNotAvailableException.html],
this is a temporary error that should resolve itself once the offsets topic has been created. In our case, however, it never recovered, even though consumption from the same topic with a different consumer group works.
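
Because the documentation describes this error as temporary, a client is expected to back off and retry. Capping the number of attempts turns a coordinator that never becomes available, as in our case, into a visible failure instead of an indefinite hang. This is a minimal sketch assuming a hypothetical `CoordinatorNotAvailable` exception that stands in for the real error; `BoundedRetry`, `retry`, and its parameters are illustrative names, not a Kafka API.

```java
import java.util.function.Supplier;

// Hypothetical exception standing in for a retriable "coordinator not available" error.
class CoordinatorNotAvailable extends RuntimeException {
    CoordinatorNotAvailable(String msg) {
        super(msg);
    }
}

// Hypothetical helper: retries an action with capped exponential backoff
// and gives up after maxAttempts instead of waiting forever.
final class BoundedRetry {
    private BoundedRetry() {}

    static <T> T retry(Supplier<T> action, int maxAttempts, long initialBackoffMs, long maxBackoffMs) {
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (CoordinatorNotAvailable e) {
                if (attempt >= maxAttempts) {
                    // Surface the persistent failure to the caller.
                    throw new IllegalStateException("still failing after " + attempt + " attempts", e);
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", ie);
                }
                // Double the wait each time, up to the cap.
                backoff = Math.min(backoff * 2, maxBackoffMs);
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation that fails twice, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) throw new CoordinatorNotAvailable("attempt " + calls[0]);
            return "ok";
        }, 5, 10, 100);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

Only the assumed retriable exception is caught; any other failure propagates immediately, so genuine errors are not masked by the retry loop.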
 
 
Can you let me know how to recover the system without restarting the broker or ZooKeeper? How can we avoid this race condition? Also, is this a bug in Kafka?
 
Let me know if any other details are required.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
