[jira] [Comment Edited] (KAFKA-6649) ReplicaFetcher stopped after non fatal exception is thrown
[ https://issues.apache.org/jira/browse/KAFKA-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440047#comment-16440047 ]

Srinivas Dhruvakumar edited comment on KAFKA-6649 at 4/16/18 9:27 PM:
--

The OffsetOutOfRangeException has been fixed, but we hit the exception below in the middle of our release, while rolling out the latest patch on one of the clusters. Is this expected?

{noformat}
2018-04-13 00:14:18,344 [ReplicaFetcherThread-2-348] ERROR (kafka.server.ReplicaFetcherThread) - [ReplicaFetcherThread-2-348]: Error due to
kafka.common.KafkaException: error processing data for partition [topic1-0,0] offset 325231
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:208)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:174)
    at scala.Option.foreach(Option.scala:257)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:174)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:171)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:171)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:171)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:171)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:169)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
Caused by: java.lang.IllegalArgumentException: High watermark offset should be non-negative
    at kafka.cluster.Replica.highWatermark_$eq(Replica.scala:144)
    at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:109)
    at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:186)
    ... 13 more
{noformat}

was (Author: srinivas.d...@gmail.com):

The Offset Out Of Range Exception has been fixed but we hit the below when we were in the middle of our release. We got an exception while releasing the latest patch on one of the clusters.
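For reference, the failure above happens in the high-watermark setter on the follower replica. Below is a minimal, hypothetical Scala sketch of that guard (ReplicaSketch and its members are illustrative names, not the actual kafka.cluster.Replica code); it only illustrates how a negative high watermark computed from the fetch response would surface as this IllegalArgumentException:

{noformat}
// Sketch only, NOT the real kafka.cluster.Replica: it illustrates the guard
// that produces "High watermark offset should be non-negative".
class ReplicaSketch {
  @volatile private var highWatermarkValue: Long = 0L

  // The setter rejects negative offsets, which corresponds to the
  // "Caused by" frame at Replica.highWatermark_$eq in the trace above.
  def highWatermark_=(newHighWatermark: Long): Unit = {
    if (newHighWatermark < 0)
      throw new IllegalArgumentException("High watermark offset should be non-negative")
    highWatermarkValue = newHighWatermark
  }

  def highWatermark: Long = highWatermarkValue
}

object ReplicaSketchDemo extends App {
  val replica = new ReplicaSketch
  replica.highWatermark = 325231  // fine
  replica.highWatermark = -1      // throws IllegalArgumentException, as in the log above
}
{noformat}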
[jira] [Comment Edited] (KAFKA-6649) ReplicaFetcher stopped after non fatal exception is thrown
[ https://issues.apache.org/jira/browse/KAFKA-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439951#comment-16439951 ]

Srinivas Dhruvakumar edited comment on KAFKA-6649 at 4/16/18 7:51 PM:
--

[~hachikuji] - Sorry for the miscommunication. We had an internal bug. I can confirm that the fix works and this is no longer an issue. This bug is fixed as part of the KAFKA-3978 patch.

was (Author: srinivas.d...@gmail.com):

[~hachikuji] - Sorry for the miscommunication. We had an internal bug. I can confirm that the fix works and is no longer an issue. This bug is fixed as part of KAFKA-3978.

> ReplicaFetcher stopped after non fatal exception is thrown
> --
>
> Key: KAFKA-6649
> URL: https://issues.apache.org/jira/browse/KAFKA-6649
> Project: Kafka
> Issue Type: Bug
> Components: replication
> Affects Versions: 1.0.0, 0.11.0.2, 1.1.0, 1.0.1
> Reporter: Julio Ng
> Priority: Major
>
> We have seen several under-replicated partitions, usually triggered by topic creation. After digging in the logs, we see the below:
> {noformat}
> [2018-03-12 22:40:17,641] ERROR [ReplicaFetcher replicaId=12, leaderId=0, fetcherId=1] Error due to (kafka.server.ReplicaFetcherThread)
> kafka.common.KafkaException: Error processing data for partition [[TOPIC_NAME_REMOVED]]-84 offset 2098535
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:204)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:169)
> at scala.Option.foreach(Option.scala:257)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:169)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:166)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:166)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
> at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:164)
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> Caused by: org.apache.kafka.common.errors.OffsetOutOfRangeException: Cannot increment the log start offset to 2098535 of partition [[TOPIC_NAME_REMOVED]]-84 since it is larger than the high watermark -1
> [2018-03-12 22:40:17,641] INFO [ReplicaFetcher replicaId=12, leaderId=0, fetcherId=1] Stopped (kafka.server.ReplicaFetcherThread)
> {noformat}
> It looks like after the ReplicaFetcherThread is stopped, the replicas start to lag behind, presumably because we are not fetching from the leader anymore.
> Further examining the ShutdownableThread.scala object:
> {noformat}
> override def run(): Unit = {
>   info("Starting")
>   try {
>     while (isRunning)
>       doWork()
>   } catch {
>     case e: FatalExitError =>
>       shutdownInitiated.countDown()
>       shutdownComplete.countDown()
>       info("Stopped")
>       Exit.exit(e.statusCode())
>     case e: Throwable =>
>       if (isRunning)
>         error("Error due to", e)
>   } finally {
>     shutdownComplete.countDown()
>   }
>   info("Stopped")
> }
> {noformat}
> For the Throwable (non-fatal) case, it just exits the while loop and the thread stops doing work. I am not sure whether this is the intended behavior of ShutdownableThread, or whether the exception should be caught and we should keep calling doWork().

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
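To make the question at the end of the quoted description concrete, below is a minimal sketch of the alternative it describes: catching non-fatal exceptions inside the loop so a single failing doWork() call does not permanently stop the fetcher thread. It assumes the same surrounding ShutdownableThread members as the snippet above (isRunning, doWork, info, error, shutdownInitiated, shutdownComplete, Exit) and is only an illustration of the suggestion, not the actual Kafka fix.

{noformat}
// Sketch only: catch non-fatal exceptions per iteration so doWork() keeps
// being called; FatalExitError still shuts the thread down exactly as before.
override def run(): Unit = {
  info("Starting")
  try {
    while (isRunning) {
      try {
        doWork()
      } catch {
        case e: FatalExitError => throw e   // escalate fatal errors unchanged
        case e: Throwable =>
          if (isRunning)
            error("Error due to", e)        // log and keep looping
      }
    }
  } catch {
    case e: FatalExitError =>
      shutdownInitiated.countDown()
      shutdownComplete.countDown()
      info("Stopped")
      Exit.exit(e.statusCode())
  } finally {
    shutdownComplete.countDown()
  }
  info("Stopped")
}
{noformat}

The trade-off is that a genuinely unrecoverable per-partition error would then be retried forever and only show up as repeated log lines, which is presumably why the current code lets the thread stop instead.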
[jira] [Comment Edited] (KAFKA-6649) ReplicaFetcher stopped after non fatal exception is thrown
[ https://issues.apache.org/jira/browse/KAFKA-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416382#comment-16416382 ]

Srinivas Dhruvakumar edited comment on KAFKA-6649 at 3/28/18 2:31 AM:
--

I am trying out the KAFKA-3978 patch ("high watermark could be incorrectly set to -1"), but I am unable to reproduce the scenario above:

{noformat}
org.apache.kafka.common.errors.OffsetOutOfRangeException: Cannot increment the log start offset to 2098535 of partition [[TOPIC_NAME_REMOVED]]-84 since it is larger than the high watermark -1
{noformat}

Does anyone know how to reproduce the above error?

was (Author: srinivas.d...@gmail.com):

I am trying out the patch "high watermark could be incorrectly set to -1". But I am unable to reproduce the above scenario "org.apache.kafka.common.errors.OffsetOutOfRangeException: Cannot increment the log start offset to 2098535 of partition [[TOPIC_NAME_REMOVED]]-84 since it is larger than the high watermark -1". Does anyone know how to reproduce the above error?
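As far as I can tell, the quoted precondition only fails when the follower's high watermark has already been left at -1 (the KAFKA-3978 symptom), so reproducing it presumably means getting a replica into that state first and then letting the log start offset advance past it. A simplified, hypothetical Scala sketch of the check (LogSketch and its members are illustrative names, not the actual kafka.log.Log code):

{noformat}
import org.apache.kafka.common.errors.OffsetOutOfRangeException

// Simplified sketch of the guard that produces the quoted error message.
// LogSketch is a hypothetical stand-in, not the real Kafka Log class.
class LogSketch(topicPartition: String) {
  @volatile var highWatermark: Long = -1L   // the bad state described in KAFKA-3978
  @volatile var logStartOffset: Long = 0L

  def maybeIncrementLogStartOffset(newLogStartOffset: Long): Unit = {
    if (newLogStartOffset > highWatermark)
      throw new OffsetOutOfRangeException(
        s"Cannot increment the log start offset to $newLogStartOffset of partition " +
          s"$topicPartition since it is larger than the high watermark $highWatermark")
    logStartOffset = math.max(logStartOffset, newLogStartOffset)
  }
}

object Repro extends App {
  val log = new LogSketch("[[TOPIC_NAME_REMOVED]]-84")
  log.maybeIncrementLogStartOffset(2098535L)   // throws while highWatermark is still -1
}
{noformat}

On a real cluster that would presumably translate to first hitting the KAFKA-3978 condition that leaves the high watermark at -1, and then having the leader advance the log start offset (for example through retention or a DeleteRecords request) before the watermark recovers.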