Hi Pierre,

Do your brokers remain responsive? In other words, do you see any other
symptoms, such as decreased write or read throughput, that might indicate
long GC pauses? Or signs of heavy load on your ZooKeeper cluster, such as
SocketTimeoutExceptions on the Kafka and/or ZooKeeper side?
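
If it helps, here is a rough sketch of how one might scan the broker and
ZooKeeper logs for that exception. The function name and the file path in
the usage note are hypothetical, and it only does plain substring matching,
so treat it as a starting point rather than a diagnostic tool.

```python
from collections import Counter
from typing import Iterable, Sequence

# Substrings to look for. Only SocketTimeoutException comes from the
# question above; extend the tuple with whatever else is relevant to you.
SYMPTOMS = ("SocketTimeoutException",)


def count_symptoms(lines: Iterable[str],
                   symptoms: Sequence[str] = SYMPTOMS) -> Counter:
    """Count how many log lines mention each symptom substring."""
    counts = Counter()
    for line in lines:
        for symptom in symptoms:
            if symptom in line:
                counts[symptom] += 1
    return counts
```

Usage would be something like `count_symptoms(open("server.log"))`, where
`server.log` stands in for wherever your broker or ZooKeeper logs live.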

--John

On Tue, Jul 11, 2017 at 6:15 AM, Pierre Coquentin <
pierre.coquen...@gmail.com> wrote:

> Hi,
>
> We are using Kafka 0.10.2 with 2 brokers and 2 application nodes, each
> running 6 consumers (all in one group). Recently, both nodes disconnected
> simultaneously and then retried connecting to the coordinator
> indefinitely. For now, restarting the nodes solves the problem, but it
> recurs a few hours later.
> In the application log we see a lot of:
> 11.07.2017 06:47:08,905 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:631]
> Marking the coordinator kafka-2:9092 (id: 2147483646 rack: null) dead for
> group ABC
> 11.07.2017 06:47:09,007 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:586]
> Discovered coordinator kafka-2:9092 (id: 2147483646 rack: null) for group
> ABC.
> 11.07.2017 06:47:09,008 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:420]
> (Re-)joining group ABC
> 11.07.2017 06:47:09,274 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:631]
> Marking the coordinator kafka-2:9092 (id: 2147483646 rack: null) dead for
> group ABC
> 11.07.2017 06:47:09,375 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:586]
> Discovered coordinator kafka-2:9092 (id: 2147483646 rack: null) for group
> ABC.
> 11.07.2017 06:47:09,375 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:420]
> (Re-)joining group ABC
> 11.07.2017 06:47:10,820 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:631]
> Marking the coordinator kafka-2:9092 (id: 2147483646 rack: null) dead for
> group ABC
> 11.07.2017 06:47:10,921 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:586]
> Discovered coordinator kafka-2:9092 (id: 2147483646 rack: null) for group
> ABC.
> 11.07.2017 06:47:10,922 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:420]
> (Re-)joining group ABC
>
> There is nothing in the broker logs.
> We have no problem contacting the coordinator from either node. Could
> periodic network instability lead to these infinite retries?
> Could this problem be related to
> https://issues.apache.org/jira/browse/KAFKA-5464 ?
>
> Here is the Streams configuration (many options are at their defaults):
>         application.id = ABC
>         application.server =
>         bootstrap.servers = [kafka-1:9092, kafka-2:9092]
>         buffered.records.per.partition = 1000
>         cache.max.bytes.buffering = 10485760
>         client.id =
>         commit.interval.ms = 30000
>         connections.max.idle.ms = 540000
>         key.serde = class
> org.apache.kafka.common.serialization.Serdes$StringSerde
>         metadata.max.age.ms = 300000
>         num.standby.replicas = 0
>         num.stream.threads = 6
>         partition.grouper = class
> org.apache.kafka.streams.processor.DefaultPartitionGrouper
>         poll.ms = 100
>         receive.buffer.bytes = 32768
>         reconnect.backoff.ms = 50
>         replication.factor = 1
>         request.timeout.ms = 40000
>         retry.backoff.ms = 100
>         rocksdb.config.setter = null
>         security.protocol = PLAINTEXT
>         send.buffer.bytes = 131072
>         state.cleanup.delay.ms = 60000
>         state.dir = null
>         timestamp.extractor = class
> org.apache.kafka.streams.processor.FailOnInvalidTimestamp
>         value.serde = class com.sigfox.kafka.serde.AvroStreamRecordSerde
>         windowstore.changelog.additional.retention.ms = 86400000
>         zookeeper.connect =
> zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
>
>
> Any thoughts?
> Regards,
>
> Pierre
>
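To quantify the loop in the quoted log, a small sketch like the following
could pull out the "Marking the coordinator ... dead" events and the time
between them. The regular expression assumes the timestamp and message layout
shown in the excerpt above; the function names are hypothetical. The broker id
decoding relies on my understanding that the Java consumer registers the
coordinator connection under Integer.MAX_VALUE minus the broker id, so treat
that part as an assumption.

```python
import re
from datetime import datetime

INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE

# Matches e.g. "11.07.2017 06:47:08,905 INFO [...] Marking the coordinator
# kafka-2:9092 (id: 2147483646 rack: null) dead". \s+ tolerates line wraps.
DEAD = re.compile(
    r"(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2},\d{3})\s+\w+\s+\[[^\]]+\]\s+"
    r"Marking\s+the\s+coordinator\s+(\S+)\s+"
    r"\(id:\s+(\d+)\s+rack:\s+\S+\)\s+dead"
)


def dead_events(text):
    """Return (timestamp, host:port, assumed broker id) per 'dead' event."""
    events = []
    for ts, host, node_id in DEAD.findall(text):
        when = datetime.strptime(ts, "%d.%m.%Y %H:%M:%S,%f")
        # Assumption: coordinator node id = Integer.MAX_VALUE - broker id.
        events.append((when, host, INT_MAX - int(node_id)))
    return events


def gaps_seconds(events):
    """Seconds elapsed between consecutive 'dead' events."""
    times = [when for when, _, _ in events]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Under that assumption, id 2147483646 corresponds to broker id 1. Very short
gaps between events, as in the excerpt, mean the consumer drops the
coordinator almost immediately after each rejoin attempt.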
