[jira] [Commented] (KAFKA-8724) log cleaner thread dies when attempting to clean a __consumer_offsets partition after upgrade from 2.0->2.3
[ https://issues.apache.org/jira/browse/KAFKA-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923578#comment-16923578 ]

ASF GitHub Bot commented on KAFKA-8724:
---

hachikuji commented on pull request #7264: KAFKA-8724; Improve range checking when computing cleanable partitions
URL: https://github.com/apache/kafka/pull/7264

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> log cleaner thread dies when attempting to clean a __consumer_offsets
> partition after upgrade from 2.0->2.3
> ---
>
> Key: KAFKA-8724
> URL: https://issues.apache.org/jira/browse/KAFKA-8724
> Project: Kafka
> Issue Type: Bug
> Components: log cleaner
> Affects Versions: 2.3.0
> Environment: Linux 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Keith So
> Assignee: Jason Gustafson
> Priority: Critical
> Fix For: 2.3.1
>
> Attachments: KAFKA-308-stack-trace.txt
>
>
> We are attempting an upgrade from Kafka 2.0 to 2.3 on a single-cluster setup.
> We have a mixture of Java/C++ and Python clients (the Python clients use
> the kafka-python library).
> After the upgrade, the log cleaner occasionally dies with the attached stack
> trace. Using timestamp correlation, we pinned it down to the cleaning of a
> __consumer_offsets partition. The config logged at initialization shows that
> inter.broker.protocol.version = 2.3-IV1
> log.message.format.version = 2.3-IV1
> We initially thought this had to do with an unclean upgrade from 2.0 to 2.3, but
> after resetting the consumer offsets topic (via
> [https://medium.com/@nblaye/reset-consumer-offsets-topic-in-kafka-with-zookeeper-5910213284a2])
> this still recurs on initially empty consumer offset partitions.
> At the moment we are working around this by toggling the log.cleaner.threads option
> using dynamic broker configuration to restore the log cleaner threads.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
[jira] [Commented] (KAFKA-8724) log cleaner thread dies when attempting to clean a __consumer_offsets partition after upgrade from 2.0->2.3
[ https://issues.apache.org/jira/browse/KAFKA-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917940#comment-16917940 ]

ASF GitHub Bot commented on KAFKA-8724:
---

hachikuji commented on pull request #7264: KAFKA-8724; Improve range checking when computing cleanable partitions
URL: https://github.com/apache/kafka/pull/7264

This patch contains a few improvements to the offset range handling when computing the cleanable range of offsets.

1. It adds bounds checking to ensure the dirty offset cannot be larger than the log end offset. If it is, we reset to the log start offset.
2. It adds a method to get the non-active segments in the log while holding the lock. This ensures that a truncation cannot lead to an invalid segment range.
3. It improves exception messages in the case that an inconsistent segment range is provided, so that we have more information to find the root cause.

The patch also fixes a few problems in `LogCleanerManagerTest` due to unintended reuse of the underlying log directory.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
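Item 1 of the patch description can be illustrated with a short Python sketch. It is not Kafka's Scala code in `LogCleanerManager`: `first_dirty_offset` and its parameters are hypothetical names, and resetting a checkpoint that falls *below* the log start offset is an assumption added here for symmetry, not something the patch description states. The example input mirrors the scenario raised in Jason Gustafson's comment on this issue, where a checkpointed dirty offset gets well ahead of an empty log.

```python
def first_dirty_offset(checkpointed, log_start_offset, log_end_offset):
    """Hypothetical helper: pick the offset where cleaning should begin.

    Falls back to the log start offset whenever the checkpointed dirty
    offset is missing or lies outside [log_start_offset, log_end_offset].
    """
    if checkpointed is None:
        # No checkpoint yet: clean from the start of the log.
        return log_start_offset
    if checkpointed > log_end_offset:
        # Item 1 of the patch: the dirty offset may not exceed the log end
        # offset (e.g. a stale checkpoint left behind after the log was wiped).
        return log_start_offset
    if checkpointed < log_start_offset:
        # Assumption of this sketch: a checkpoint below the log start offset
        # points at deleted data and is reset the same way.
        return log_start_offset
    return checkpointed


# A checkpoint of 5000 left over from before the reset, against an empty log.
print(first_dirty_offset(5000, 0, 0))  # 0
```

Resetting to the log start offset errs on the side of re-cleaning data rather than skipping it, which is the safer failure mode for a compacted topic.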
[jira] [Commented] (KAFKA-8724) log cleaner thread dies when attempting to clean a __consumer_offsets partition after upgrade from 2.0->2.3
[ https://issues.apache.org/jira/browse/KAFKA-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917085#comment-16917085 ]

Jason Gustafson commented on KAFKA-8724:

There is a race condition here, but I'm not sure I can explain how it could be hit regularly. When the log cleaner collects the non-active segments, it calls {{log.logSegments(firstDirtyOffset, log.activeSegment.baseOffset)}}. Since it does not hold the log lock, a log roll might invalidate the expectation that the active segment's base offset is larger than the dirty offset.

I'm guessing that the fact that the log directory was wiped is also a factor here, breaking normal offset assumptions. For example, the checkpointed dirty offset may have gotten well ahead of an empty log. It wouldn't surprise me to find some unprotected cases when that happens.

I will submit a patch to fix the known race condition and improve error logging a little, so we'll have more to go on if we miss a case.
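The race described in this comment can be modelled in a few lines of Python. This is a hypothetical single-process sketch, not Kafka's Scala `Log` class: `SketchLog`, `truncate_to`, `segments` and `non_active_segments_from` are illustrative names, segments are reduced to their base offsets, and returning an empty list when the dirty offset is past the active segment is this sketch's own choice. It follows the approach described in PR #7264: read the active segment's base offset and the segment list while holding the same lock, and raise an error that reports the offending range when it is inconsistent. Truncation is used to trigger the race, as in the PR description.

```python
import bisect
import threading


class SketchLog:
    """Hypothetical model of a partition log. Each segment is reduced to its
    base offset; the highest base offset belongs to the active segment."""

    def __init__(self, base_offsets):
        # Re-entrant, so non_active_segments_from can call segments while
        # already holding the lock.
        self._lock = threading.RLock()
        self._base_offsets = sorted(base_offsets)

    def active_base_offset(self):
        with self._lock:
            return self._base_offsets[-1]

    def truncate_to(self, offset):
        # Drop every segment whose base offset is above the truncation point.
        # Assumes offset is at or above the first segment's base offset.
        with self._lock:
            self._base_offsets = [b for b in self._base_offsets if b <= offset]

    def segments(self, from_offset, to_offset):
        """Base offsets of the segments overlapping [from_offset, to_offset)."""
        with self._lock:
            if from_offset > to_offset:
                # Include the full context so an inconsistent range is diagnosable.
                raise ValueError(
                    f"Invalid segment range: from offset {from_offset} is larger "
                    f"than to offset {to_offset}; segment base offsets: "
                    f"{self._base_offsets}")
            # Segment containing from_offset (or the first segment).
            start = max(bisect.bisect_right(self._base_offsets, from_offset) - 1, 0)
            # Stop before the first segment starting at or after to_offset.
            end = bisect.bisect_left(self._base_offsets, to_offset)
            return self._base_offsets[start:end]

    def non_active_segments_from(self, from_offset):
        # Read the active segment's base offset and the segment list in one
        # critical section, so a concurrent truncation or roll cannot slip in
        # between computing the upper bound and using it.
        with self._lock:
            active = self._base_offsets[-1]
            if from_offset > active:
                # Sketch's choice: nothing is cleanable, so return nothing
                # instead of raising.
                return []
            return self.segments(from_offset, active)


# Unprotected pattern: the upper bound is read in one step and used in another.
log = SketchLog([0, 100, 200])
stale_bound = log.active_base_offset()    # 200
log.truncate_to(120)                      # segments are now [0, 100]
print(log.segments(150, stale_bound))     # [100], the now-active segment
print(log.non_active_segments_from(150))  # [], nothing to clean
```

With the stale upper bound, the unprotected call hands back segment 100 even though it has just become the active segment; the lock-protected method sees the post-truncation state in one critical section and returns nothing to clean.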
[jira] [Commented] (KAFKA-8724) log cleaner thread dies when attempting to clean a __consumer_offsets partition after upgrade from 2.0->2.3
[ https://issues.apache.org/jira/browse/KAFKA-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896827#comment-16896827 ]

Ismael Juma commented on KAFKA-8724:

cc [~hachikuji] [~mumrah]
[jira] [Commented] (KAFKA-8724) log cleaner thread dies when attempting to clean a __consumer_offsets partition after upgrade from 2.0->2.3
[ https://issues.apache.org/jira/browse/KAFKA-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895679#comment-16895679 ]

Keith So commented on KAFKA-8724:

This is the config we have for __consumer_offsets. If there is a quick workaround via config, we'd very much appreciate it.

{{$ kafka-topics --bootstrap-server localhost:9091 --describe --topic __consumer_offsets}}
{{Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:1 Configs:compression.type=producer,cleanup.policy=compact,segment.bytes=104857600,retention.ms=1000,message.timestamp.type=LogAppendTime,delete.retention.ms=1000,segment.ms=6}}