Hi,
We are running ZooKeeper 3.3.6 with Kafka 0.7.2. We have a topic with 8
partitions on each of 3 brokers, which we consume with a multi-threaded
consumer group. Our consumers use the following settings:
zk.connectiontimeout.ms=12000000
fetch_size=52428800
queuedchunks.max=6
consumer.timeout.ms=5000
Our brokers have the following configuration:
socket.send.buffer=1048576
socket.receive.buffer=1048576
max.socket.request.bytes=104857600
log.flush.interval=10000
log.default.flush.interval.ms=1000
log.default.flush.scheduler.interval.ms=1000
log.retention.hours=4
log.file.size=536870912
enable.zookeeper=true
zk.connectiontimeout.ms=6000
zk.sessiontimeout.ms=6000
max.message.size=52428800
We are noticing that after the consumer runs for a short while, some threads
stop consuming and start throwing the following timeout exceptions:
kafka.consumer.ConsumerTimeoutException
at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:66)
at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:32)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)
When this happens, message consumption on the affected partitions does not
recover: it stalls, and the consumer offset stays frozen. The exceptions keep
appearing in the logs, because the thread logs the error and then tries to
create another iterator from the stream and consume from it. We also notice
that consumption tends to freeze on two of the three brokers, while one broker
always seems to keep the consumers fed. Are there settings or logic we can use
to avoid or recover from such exceptions?
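
For reference, the retry pattern described above looks roughly like the
sketch below. It is a self-contained illustration, not our actual code and not
the real Kafka API: TimeoutStream and StreamTimeoutException are hypothetical
stand-ins for the Kafka message stream and kafka.consumer.ConsumerTimeoutException,
and the real iterator may behave differently after a timeout.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ConsumerRetrySketch {

    /** Hypothetical stand-in for kafka.consumer.ConsumerTimeoutException. */
    static class StreamTimeoutException extends RuntimeException {}

    /** Hypothetical stand-in for a message stream backed by a queue of fetched messages. */
    static class TimeoutStream<T> implements Iterable<T> {
        private final BlockingQueue<T> queue;
        private final long timeoutMs;

        TimeoutStream(BlockingQueue<T> queue, long timeoutMs) {
            this.queue = queue;
            this.timeoutMs = timeoutMs;
        }

        @Override
        public Iterator<T> iterator() {
            return new Iterator<T>() {
                private T buffered;

                @Override
                public boolean hasNext() {
                    if (buffered != null) return true;
                    try {
                        // Wait up to timeoutMs for a message (plays the role of consumer.timeout.ms).
                        buffered = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new StreamTimeoutException();
                    }
                    if (buffered == null) throw new StreamTimeoutException();
                    return true;
                }

                @Override
                public T next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    T item = buffered;
                    buffered = null;
                    return item;
                }
            };
        }
    }

    /**
     * Consumes messages, logging each timeout and creating a fresh iterator.
     * Gives up after maxTimeouts consecutive timeouts and returns the number
     * of messages consumed.
     */
    static int consume(TimeoutStream<String> stream, int maxTimeouts) {
        int consumed = 0;
        int consecutiveTimeouts = 0;
        while (consecutiveTimeouts < maxTimeouts) {
            Iterator<String> it = stream.iterator(); // new iterator after each timeout
            try {
                while (it.hasNext()) {
                    it.next();
                    consumed++;
                    consecutiveTimeouts = 0; // progress resets the counter
                }
            } catch (StreamTimeoutException e) {
                consecutiveTimeouts++;
                System.err.println("timeout #" + consecutiveTimeouts + ", recreating iterator");
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("a");
        queue.add("b");
        queue.add("c");
        System.out.println(consume(new TimeoutStream<>(queue, 50), 2)); // prints 3
    }
}
```

In this sketch the loop stops after a fixed number of consecutive timeouts. Our
threads keep retrying indefinitely, which is why the exception repeats in the
logs while the offset stays frozen.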
-drew