[ https://issues.apache.org/jira/browse/KAFKA-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismael Juma resolved KAFKA-9213.
--------------------------------
    Resolution: Duplicate

Duplicate of KAFKA-9156.

> BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1
> -----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9213
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9213
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 2.3.1
>         Environment: Ubuntu 16.04, AWS instance d2.8xlarge.
> JAVA Options:
> -Xms16G
> -Xmx16G
> -XX:G1HeapRegionSize=16M
> -XX:MetaspaceSize=96m
> -XX:MinMetaspaceFreeRatio=50
>            Reporter: Daniyar
>            Priority: Blocker
>
> We updated our Kafka cluster from version 1.1.0 to 2.3.1, following the [upgrade instructions|https://kafka.apache.org/documentation/#upgrade] up to step 2.
> The message format and inter-broker protocol versions were left unchanged:
> inter.broker.protocol.version=1.1
> log.message.format.version=1.1
>
> After the upgrade, we started to see occasional exceptions:
> {code:java}
> 2019/11/19 05:30:53 INFO [ProducerStateManager partition=matchmaker_retry_clicks_15m-2] Writing producer snapshot at offset 788532 (kafka.log.ProducerStateManager)
> 2019/11/19 05:30:53 INFO [Log partition=matchmaker_retry_clicks_15m-2, dir=/mnt/kafka] Rolled new log segment at offset 788532 in 1 ms. (kafka.log.Log)
> 2019/11/19 05:31:01 ERROR [ReplicaManager broker=0] Error processing append operation on partition matchmaker_retry_clicks_15m-2 (kafka.server.ReplicaManager)
> 2019/11/19 05:31:01 java.nio.BufferOverflowException
> 2019/11/19 05:31:01 at java.nio.Buffer.nextPutIndex(Buffer.java:527)
> 2019/11/19 05:31:01 at java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:797)
> 2019/11/19 05:31:01 at kafka.log.TimeIndex.$anonfun$maybeAppend$1(TimeIndex.scala:134)
> 2019/11/19 05:31:01 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 2019/11/19 05:31:01 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> 2019/11/19 05:31:01 at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:114)
> 2019/11/19 05:31:01 at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:520)
> 2019/11/19 05:31:01 at kafka.log.Log.$anonfun$roll$8(Log.scala:1690)
> 2019/11/19 05:31:01 at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1690)
> 2019/11/19 05:31:01 at scala.Option.foreach(Option.scala:407)
> 2019/11/19 05:31:01 at kafka.log.Log.$anonfun$roll$2(Log.scala:1690)
> 2019/11/19 05:31:01 at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
> 2019/11/19 05:31:01 at kafka.log.Log.roll(Log.scala:1654)
> 2019/11/19 05:31:01 at kafka.log.Log.maybeRoll(Log.scala:1639)
> 2019/11/19 05:31:01 at kafka.log.Log.$anonfun$append$2(Log.scala:966)
> 2019/11/19 05:31:01 at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
> 2019/11/19 05:31:01 at kafka.log.Log.append(Log.scala:850)
> 2019/11/19 05:31:01 at kafka.log.Log.appendAsLeader(Log.scala:819)
> 2019/11/19 05:31:01 at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
> 2019/11/19 05:31:01 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> 2019/11/19 05:31:01 at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
> 2019/11/19 05:31:01 at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
> 2019/11/19 05:31:01 at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
> 2019/11/19 05:31:01 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
> 2019/11/19 05:31:01 at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
> 2019/11/19 05:31:01 at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> 2019/11/19 05:31:01 at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> 2019/11/19 05:31:01 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> 2019/11/19 05:31:01 at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
> 2019/11/19 05:31:01 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
> 2019/11/19 05:31:01 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
> 2019/11/19 05:31:01 at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> 2019/11/19 05:31:01 at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:751)
> 2019/11/19 05:31:01 at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:492)
> 2019/11/19 05:31:01 at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:544)
> 2019/11/19 05:31:01 at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
> 2019/11/19 05:31:01 at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
> 2019/11/19 05:31:01 at java.lang.Thread.run(Thread.java:748)
> {code}
> The error persists until the broker is restarted (or leadership is moved to another broker).
>
> Broker config:
> {code:java}
> advertised.host.name={{ hostname }}
> port=9092
> # Default number of partitions if a value isn't set when the topic is created.
> num.partitions=3
> auto.create.topics.enable=false
> delete.topic.enable=false
> # Prevent a not-in-sync replica from becoming a leader.
> unclean.leader.election.enable=false
> # The number of threads per data directory to be used for log recovery at
> # startup and flushing at shutdown.
> num.recovery.threads.per.data.dir=36
> log.flush.interval.messages=10000
> log.flush.interval.ms=2000
> # 1 week
> log.retention.hours=168
> log.retention.check.interval.ms=300000
> log.cleaner.enable=false
> # Use broker time for message timestamps.
> log.message.timestamp.type=LogAppendTime
> zookeeper.connect={{ zookeeper_host }}:2181
> zookeeper.connection.timeout.ms=6000
> controller.socket.timeout.ms=30000
> controller.message.queue.size=10
> # Replication configuration
> num.replica.fetchers=10
> # Socket server configuration
> num.io.threads=32
> num.network.threads=16
> socket.request.max.bytes=104857600
> socket.receive.buffer.bytes=1048576
> socket.send.buffer.bytes=1048576
> queued.max.requests=32
> fetch.purgatory.purge.interval.requests=100
> producer.purgatory.purge.interval.requests=100
> inter.broker.protocol.version=1.1
> log.message.format.version=1.1
> {code}


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
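
Editor's note on the exception at the top of the stack trace: `java.nio.BufferOverflowException` is what the JDK throws when a relative put (here, `DirectByteBuffer.putLong` called from `TimeIndex.maybeAppend`) is attempted past a buffer's limit. A minimal, self-contained sketch of that JDK behavior — illustrative only, not Kafka code; the buffer size and offset values are invented for the example:

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class TimeIndexOverflowDemo {
    public static void main(String[] args) {
        // A direct buffer with room for exactly one 8-byte long, standing in
        // for a memory-mapped index file whose limit has already been reached.
        ByteBuffer index = ByteBuffer.allocateDirect(Long.BYTES);

        index.putLong(788532L); // fits: position advances 0 -> 8 (== limit)

        try {
            // No room for a second long: the relative put exceeds the limit
            // and throws, as in DirectByteBuffer.putLong in the stack trace.
            index.putLong(788533L);
        } catch (BufferOverflowException e) {
            System.out.println("caught BufferOverflowException");
        }
    }
}
```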