[jira] [Updated] (KAFKA-4384) ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted message
[ https://issues.apache.org/jira/browse/KAFKA-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-4384: --- Assignee: Jun He > ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted > message > > > Key: KAFKA-4384 > URL: https://issues.apache.org/jira/browse/KAFKA-4384 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1 > Environment: Ubuntu 12.04, AWS D2 instance >Reporter: Jun He >Assignee: Jun He > Fix For: 0.10.1.1, 0.10.2.0 > > Attachments: KAFKA-4384.patch > > > We recently discovered an issue in Kafka 0.9.0.1 (), where > ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted > message. As the same logic exists also in Kafka 0.10.0.0 and 0.10.0.1, they > may have the similar issue. > Here are system logs related to this issue. > 2016-10-30 - 02:33:05,606 ERROR ReplicaFetcherThread-5-1590174474 > ReplicaFetcherThread.apply - Found invalid messages during fetch for > partition [logs,41] offset 39021512238 error Message is corrupt (stored crc = > 2028421553, computed crc = 3577227678) > 2016-10-30 - 02:33:06,582 ERROR ReplicaFetcherThread-5-1590174474 > ReplicaFetcherThread.error - [ReplicaFetcherThread-5-1590174474], Error due > to kafka.common.KafkaException: - error processing data for partition > [logs,41] offset 39021512301 Caused - by: java.lang.RuntimeException: Offset > mismatch: fetched offset = 39021512301, log end offset = 39021512238. > First, ReplicaFetcherThread got a corrupted message (offset 39021512238) due > to some blip. > Line > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L138 > threw exception > Then, Line > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L145 > caught it and logged this error. > Because > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L134 > updated the topic partition offset to the fetched latest one in > partitionMap. So ReplicaFetcherThread skipped the batch with corrupted > messages. > Based on > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L84, > the ReplicaFetcherThread then directly fetched the next batch of messages > (with offset 39021512301) > Next, ReplicaFetcherThread stopped because the log end offset (still > 39021512238) didn't match the fetched message (offset 39021512301). > A quick fix is to move line 134 to be after line 138. > Would be great to have your comments and please let me know if a Jira issue > is needed. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4384) ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted message
[ https://issues.apache.org/jira/browse/KAFKA-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-4384: --- Resolution: Fixed Fix Version/s: 0.10.2.0 Status: Resolved (was: Patch Available) Issue resolved by pull request 2127 [https://github.com/apache/kafka/pull/2127] > ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted > message > > > Key: KAFKA-4384 > URL: https://issues.apache.org/jira/browse/KAFKA-4384 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1 > Environment: Ubuntu 12.04, AWS D2 instance >Reporter: Jun He > Fix For: 0.10.2.0, 0.10.1.1 > > Attachments: KAFKA-4384.patch > > > We recently discovered an issue in Kafka 0.9.0.1 (), where > ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted > message. As the same logic exists also in Kafka 0.10.0.0 and 0.10.0.1, they > may have the similar issue. > Here are system logs related to this issue. > 2016-10-30 - 02:33:05,606 ERROR ReplicaFetcherThread-5-1590174474 > ReplicaFetcherThread.apply - Found invalid messages during fetch for > partition [logs,41] offset 39021512238 error Message is corrupt (stored crc = > 2028421553, computed crc = 3577227678) > 2016-10-30 - 02:33:06,582 ERROR ReplicaFetcherThread-5-1590174474 > ReplicaFetcherThread.error - [ReplicaFetcherThread-5-1590174474], Error due > to kafka.common.KafkaException: - error processing data for partition > [logs,41] offset 39021512301 Caused - by: java.lang.RuntimeException: Offset > mismatch: fetched offset = 39021512301, log end offset = 39021512238. > First, ReplicaFetcherThread got a corrupted message (offset 39021512238) due > to some blip. > Line > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L138 > threw exception > Then, Line > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L145 > caught it and logged this error. > Because > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L134 > updated the topic partition offset to the fetched latest one in > partitionMap. So ReplicaFetcherThread skipped the batch with corrupted > messages. > Based on > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L84, > the ReplicaFetcherThread then directly fetched the next batch of messages > (with offset 39021512301) > Next, ReplicaFetcherThread stopped because the log end offset (still > 39021512238) didn't match the fetched message (offset 39021512301). > A quick fix is to move line 134 to be after line 138. > Would be great to have your comments and please let me know if a Jira issue > is needed. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4384) ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted message
[ https://issues.apache.org/jira/browse/KAFKA-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun He updated KAFKA-4384: -- Reviewer: Jiangjie Qin Affects Version/s: 0.10.1.0 Status: Patch Available (was: Open) Patch for KAFKA-4384 (ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted message) > ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted > message > > > Key: KAFKA-4384 > URL: https://issues.apache.org/jira/browse/KAFKA-4384 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.0.1, 0.10.0.0, 0.9.0.1, 0.10.1.0 > Environment: Ubuntu 12.04, AWS D2 instance >Reporter: Jun He > Fix For: 0.10.1.1 > > Attachments: KAFKA-4384.patch > > > We recently discovered an issue in Kafka 0.9.0.1 (), where > ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted > message. As the same logic exists also in Kafka 0.10.0.0 and 0.10.0.1, they > may have the similar issue. > Here are system logs related to this issue. > 2016-10-30 - 02:33:05,606 ERROR ReplicaFetcherThread-5-1590174474 > ReplicaFetcherThread.apply - Found invalid messages during fetch for > partition [logs,41] offset 39021512238 error Message is corrupt (stored crc = > 2028421553, computed crc = 3577227678) > 2016-10-30 - 02:33:06,582 ERROR ReplicaFetcherThread-5-1590174474 > ReplicaFetcherThread.error - [ReplicaFetcherThread-5-1590174474], Error due > to kafka.common.KafkaException: - error processing data for partition > [logs,41] offset 39021512301 Caused - by: java.lang.RuntimeException: Offset > mismatch: fetched offset = 39021512301, log end offset = 39021512238. > First, ReplicaFetcherThread got a corrupted message (offset 39021512238) due > to some blip. > Line > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L138 > threw exception > Then, Line > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L145 > caught it and logged this error. > Because > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L134 > updated the topic partition offset to the fetched latest one in > partitionMap. So ReplicaFetcherThread skipped the batch with corrupted > messages. > Based on > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L84, > the ReplicaFetcherThread then directly fetched the next batch of messages > (with offset 39021512301) > Next, ReplicaFetcherThread stopped because the log end offset (still > 39021512238) didn't match the fetched message (offset 39021512301). > A quick fix is to move line 134 to be after line 138. > Would be great to have your comments and please let me know if a Jira issue > is needed. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4384) ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted message
[ https://issues.apache.org/jira/browse/KAFKA-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun He updated KAFKA-4384: -- Attachment: KAFKA-4384.patch Patch for KAFKA-4384 (ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted message) > ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted > message > > > Key: KAFKA-4384 > URL: https://issues.apache.org/jira/browse/KAFKA-4384 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1 > Environment: Ubuntu 12.04, AWS D2 instance >Reporter: Jun He > Fix For: 0.10.1.1 > > Attachments: KAFKA-4384.patch > > > We recently discovered an issue in Kafka 0.9.0.1 (), where > ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted > message. As the same logic exists also in Kafka 0.10.0.0 and 0.10.0.1, they > may have the similar issue. > Here are system logs related to this issue. > 2016-10-30 - 02:33:05,606 ERROR ReplicaFetcherThread-5-1590174474 > ReplicaFetcherThread.apply - Found invalid messages during fetch for > partition [logs,41] offset 39021512238 error Message is corrupt (stored crc = > 2028421553, computed crc = 3577227678) > 2016-10-30 - 02:33:06,582 ERROR ReplicaFetcherThread-5-1590174474 > ReplicaFetcherThread.error - [ReplicaFetcherThread-5-1590174474], Error due > to kafka.common.KafkaException: - error processing data for partition > [logs,41] offset 39021512301 Caused - by: java.lang.RuntimeException: Offset > mismatch: fetched offset = 39021512301, log end offset = 39021512238. > First, ReplicaFetcherThread got a corrupted message (offset 39021512238) due > to some blip. > Line > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L138 > threw exception > Then, Line > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L145 > caught it and logged this error. > Because > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L134 > updated the topic partition offset to the fetched latest one in > partitionMap. So ReplicaFetcherThread skipped the batch with corrupted > messages. > Based on > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L84, > the ReplicaFetcherThread then directly fetched the next batch of messages > (with offset 39021512301) > Next, ReplicaFetcherThread stopped because the log end offset (still > 39021512238) didn't match the fetched message (offset 39021512301). > A quick fix is to move line 134 to be after line 138. > Would be great to have your comments and please let me know if a Jira issue > is needed. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4384) ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted message
[ https://issues.apache.org/jira/browse/KAFKA-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-4384: --- Fix Version/s: 0.10.1.1 > ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted > message > > > Key: KAFKA-4384 > URL: https://issues.apache.org/jira/browse/KAFKA-4384 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1 > Environment: Ubuntu 12.04, AWS D2 instance >Reporter: Jun He > Fix For: 0.10.1.1 > > > We recently discovered an issue in Kafka 0.9.0.1 (), where > ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted > message. As the same logic exists also in Kafka 0.10.0.0 and 0.10.0.1, they > may have the similar issue. > Here are system logs related to this issue. > 2016-10-30 - 02:33:05,606 ERROR ReplicaFetcherThread-5-1590174474 > ReplicaFetcherThread.apply - Found invalid messages during fetch for > partition [logs,41] offset 39021512238 error Message is corrupt (stored crc = > 2028421553, computed crc = 3577227678) > 2016-10-30 - 02:33:06,582 ERROR ReplicaFetcherThread-5-1590174474 > ReplicaFetcherThread.error - [ReplicaFetcherThread-5-1590174474], Error due > to kafka.common.KafkaException: - error processing data for partition > [logs,41] offset 39021512301 Caused - by: java.lang.RuntimeException: Offset > mismatch: fetched offset = 39021512301, log end offset = 39021512238. > First, ReplicaFetcherThread got a corrupted message (offset 39021512238) due > to some blip. > Line > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L138 > threw exception > Then, Line > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L145 > caught it and logged this error. > Because > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L134 > updated the topic partition offset to the fetched latest one in > partitionMap. So ReplicaFetcherThread skipped the batch with corrupted > messages. > Based on > https://github.com/apache/kafka/blob/0.9.0.1/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L84, > the ReplicaFetcherThread then directly fetched the next batch of messages > (with offset 39021512301) > Next, ReplicaFetcherThread stopped because the log end offset (still > 39021512238) didn't match the fetched message (offset 39021512301). > A quick fix is to move line 134 to be after line 138. > Would be great to have your comments and please let me know if a Jira issue > is needed. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)