[jira] [Created] (KAFKA-5142) KIP-145 - Expose Record Headers in Kafka Connect
Michael Andre Pearce (IG) created KAFKA-5142:

Summary: KIP-145 - Expose Record Headers in Kafka Connect
Key: KAFKA-5142
URL: https://issues.apache.org/jira/browse/KAFKA-5142
Project: Kafka
Issue Type: New Feature
Components: clients
Reporter: Michael Andre Pearce (IG)

https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect

As KIP-82 introduced Headers into the core Kafka product, it would be advantageous to expose them in the Kafka Connect framework. Connectors that replicate data between Kafka clusters, or between other messaging products and Kafka, will want to replicate the headers.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804713#comment-15804713 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 3:00 PM:
--
[~ijuma], [~becket_qin], [~junrao] - thanks and sorry again for your time, effort and patience.

was (Author: michael.andre.pearce):
[~ijuma][~becket_qin][~junrao] - thanks and sorry again for your time, effort and patience.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
> Reporter: Robert Schumann
> Assignee: Jiangjie Qin
> Priority: Critical
> Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, log-cleaner.log.1.zip, vrtstokf005_4thJan
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with the latest Kafka 0.10.1 and the log cleaner thread with regards to the timeindex files. From the log of the log cleaner we see after startup that it tries to clean up a topic called xdc_listing-status-v2 [1]. The topic is set up with log compaction [2] and the Kafka cluster configuration has log.cleaner enabled [3]. Looking at the log and the newly created file [4], the cleaner seems to refer to tombstones prior to epoch_time=0 - maybe because it finds messages which don’t have a timestamp at all (?). All producers and consumers are using 0.10.1 and the topics have been created completely new, so I’m not sure where this issue would come from. The original timeindex file [5] seems to show only valid timestamps for the mentioned offsets.
> I would also like to mention that the issue happened in two independent datacenters at the same time, so I would rather expect an application/producer issue instead of random disk failures. We didn’t have the problem with 0.10.0 for around half a year; it appeared shortly after the upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also confuses me a bit. Does that mean it does not find any log segments which can be cleaned up, or that the last timestamp of the last log segment is somehow broken/missing?
> I’m also a bit wondering why the log cleaner thread stops completely after an error with one topic. I would at least expect that it keeps on cleaning up other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning __consumer_offsets anymore.
> Does anybody have the same issues or can explain what’s going on? Thanks for any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 9 no larger than the last offset appended (11832) to /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
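When chasing this class of problem, the broker's `.timeindex` files can be inspected with the DumpLogSegments tool that ships with Kafka. A minimal sketch, under stated assumptions: `KAFKA_HOME` and `LOG_DIR` are illustrative defaults (point them at your install and at the partition directory from the broker's log.dirs), and the helper name is ours.

```shell
#!/bin/sh
# Sketch only: print DumpLogSegments invocations for every time index in a
# partition directory, so they can be reviewed before running.
# KAFKA_HOME and LOG_DIR are assumptions; adjust to your environment.
KAFKA_HOME=${KAFKA_HOME:-/opt/kafka}
LOG_DIR=${LOG_DIR:-/var/lib/kafka/xdc_listing-status-v2-1}

# Build (do not execute) the command line for one time-index file.
dump_timeindex_cmd() {
  printf '%s/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files %s\n' \
    "$KAFKA_HOME" "$1"
}

# One command per .timeindex file in the partition directory:
for f in "$LOG_DIR"/*.timeindex; do
  dump_timeindex_cmd "$f"
done
```

Running the printed commands dumps each index entry (timestamp, offset), which is how one would check whether the original index really holds only valid timestamps, as the reporter describes.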
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804713#comment-15804713 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--
[~ijuma][~becket_qin][~junrao] - thanks and sorry again for your time, effort and patience.
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804705#comment-15804705 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--
[~ijuma] Yes, we all work together.

Re what tripped us up - the un-versioned kafka.jar in the Confluent package: is there a reason this is done? (I know this is the Apache JIRA, but worth the ask.)

We are happy for this to be closed; we cannot see or repeat the issue on 0.10.1.1.
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804660#comment-15804660 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 2:46 PM:
--
[~junrao] Sorry about this - it does indeed seem to be our patching that caused the issue; now that we have removed the jar in question, all seems to be good. We have been testing all morning and have not seen the issue again.

To explain what occurred - our patching of the Confluent package was done by replacing all jars: download the latest from the Apache site (2.11 Scala version), expand and copy all jars from ./kafka_2.11-0.10.1.1/libs into confluent-3.1.0/share/java/kafka, then remove all files in confluent-3.1.0/share/java/kafka matching *-0.10.1.0-cp1*.jar.

What caught us out was that in the Confluent bundle the jar "kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar", without any versioning, unlike all the other Apache jars, which are versioned and appended with -CP. So on doing the above we ended up with both kafka_2.11-0.10.1.1.jar and kafka.jar, as the kafka.jar was not removed. We didn't easily spot this, as obviously we didn't see any 0.10.1.0-versioned jars on the class path, and likewise we did see 0.10.1.1 versions for all the expected jars.

Whilst this is obviously the Apache JIRA, we're aware that the Confluent bundle is by yourselves - is there a reason why this jar is not versioned like all the other jars, as per the Apache release, in the Confluent bundle? This obviously has tripped us up.

was (Author: michael.andre.pearce):
[~junrao] Sorry about this, it does indeed seem to be our patching that caused the issue, now we have removed the jar in question all seems to be good. We have been testing all morning. We have not seen the issue exist again. To explain what occurred - Our patching of the confluent package was done by replacing all jars. download latest from apache site. (2.11 scala version) expand and copy all jars from: ./kafka_2.11.-0.10.1.1/libs into confluent-3.1.0/share/java/kafka then remove all files in confluent-3.1.0/share/java/kafka with *-0.10.1.0-cp1*.jar What has caught us out was that in the confluent bundle the jar "kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar" without any versioning, unlike all the other Apache jars, which are versioned and are appended with -CP so on doing the above we ended up with both kafka_2.11-0.10.1.1.jar and kafka.jar as the kafka.jar was not removed. We didn't easily spot this, as obviously we didn't see any 0.10.1.0 versioned jars on the class path, and like wise we did see 0.10.1.1 versions for all the expected jars. Whilst this is obviously the apache jira, we're aware that confluent bundle is by yourselves, is there a reason why this is not versioned like all the other jars and as per the apache release in the confluent bundle? This obviously has tripped us up.
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804660#comment-15804660 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 2:45 PM:
--
[~junrao] Sorry about this, it does indeed seem to be our patching that caused the issue; now we have removed the jar in question all seems to be good. We have been testing all morning. We have not seen the issue exist again.

To explain what occurred - our patching of the confluent package was done by replacing all jars: download latest from apache site (2.11 scala version), expand and copy all jars from ./kafka_2.11.-0.10.1.1/libs into confluent-3.1.0/share/java/kafka, then remove all files in confluent-3.1.0/share/java/kafka with *-0.10.1.0-cp1*.jar.

What has caught us out was that in the confluent bundle the jar "kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar" without any versioning, unlike all the other Apache jars, which are versioned and are appended with -CP, so on doing the above we ended up with both kafka_2.11-0.10.1.1.jar and kafka.jar, as the kafka.jar was not removed. We didn't easily spot this, as obviously we didn't see any 0.10.1.0 versioned jars on the class path, and likewise we did see 0.10.1.1 versions for all the expected jars.

Whilst this is obviously the apache jira, we're aware that the confluent bundle is by yourselves; is there a reason why this is not versioned like all the other jars and as per the apache release in the confluent bundle? This obviously has tripped us up.

was (Author: michael.andre.pearce):
[~junrao] Sorry about this, it does indeed seem to be our patching that caused the issue, now we have removed the jar in question all seems to be good. We have been testing all morning. We have not seen the issue exist again. To explain what occurred - Our patching of the confluent package was done by replacing all jars. download latest from apache site. (2.11 scala version) expand and copy all jars from: ./kafka_2.11.-0.10.1.1/libs into confluent-3.1-2.1/share/java/kafka then remove all files in confluent-3.1-2.1/share/java/kafka with *-0.10.1.0-cp2*.jar What has caught us out was that in the confluent bundle the jar "kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar" without any versioning, unlike all the other Apache jars, which are versioned and are appended with -CP so on doing the above we ended up with both kafka_2.11-0.10.1.1.jar and kafka.jar as the kafka.jar was not removed. We didn't easily spot this, as obviously we didn't see any 0.10.1.0 versioned jars on the class path, and like wise we did see 0.10.1.1 versions for all the expected jars. Whilst this is obviously the apache jira, we're aware that confluent bundle is by yourselves, is there a reason why this is not versioned like all the other jars and as per the apache release in the confluent bundle? This obviously has tripped us up.
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804660#comment-15804660 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--
[~junrao] Sorry about this, it does indeed seem to be our patching that caused the issue; now we have removed the jar in question all seems to be good. We have been testing all morning. We have not seen the issue exist again.

To explain what occurred - our patching of the confluent package was done by replacing all jars: download latest from apache site (2.11 scala version), expand and copy all jars from ./kafka_2.11.-0.10.1.1/libs into confluent-3.1-2.1/share/java/kafka, then remove all files in confluent-3.1-2.1/share/java/kafka with *-0.10.1.0-cp2*.jar.

What has caught us out was that in the confluent bundle the jar "kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar" without any versioning, unlike all the other Apache jars, which are versioned and are appended with -CP, so on doing the above we ended up with both kafka_2.11-0.10.1.1.jar and kafka.jar, as the kafka.jar was not removed. We didn't easily spot this, as obviously we didn't see any 0.10.1.0 versioned jars on the class path, and likewise we did see 0.10.1.1 versions for all the expected jars.

Whilst this is obviously the apache jira, we're aware that the confluent bundle is by yourselves; is there a reason why this is not versioned like all the other jars and as per the apache release in the confluent bundle? This obviously has tripped us up.
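The jar-swap procedure described in the comments can be sketched as a script. This is illustrative, not the commenter's actual commands; the function name is ours, and the final `rm` of the unversioned kafka.jar is the step whose absence caused the duplicate-jar problem.

```shell
#!/bin/sh
# Illustrative sketch of the jar replacement described in the comments.
# Paths follow the confluent-3.1.0 layout named there; adjust to your own.
patch_kafka_jars() {
  new_libs="$1"    # e.g. ./kafka_2.11-0.10.1.1/libs
  conf_libs="$2"   # e.g. confluent-3.1.0/share/java/kafka
  # Copy the new Apache jars in, then remove the old Confluent-versioned ones.
  cp "$new_libs"/*.jar "$conf_libs"/
  rm -f "$conf_libs"/*-0.10.1.0-cp1*.jar
  # The missed step: the bundle ships the core broker jar as an unversioned
  # kafka.jar, which the versioned glob above does not match.
  rm -f "$conf_libs"/kafka.jar
}
```

Without the last line, both kafka_2.11-0.10.1.1.jar and the old kafka.jar end up on the classpath, which is exactly the state the commenters eventually found.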
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802030#comment-15802030 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:35 AM:
--
When we got the issue on 0.10.1.1 we were just as surprised, and as such that was the very first thing we checked. Our internal mail at the time this occurred in UAT (the first env we saw it in post the 0.10.1.1 upgrade):

> Date: Tuesday, 3 January 2017 at 17:07
> Subject: Re: Kafka Issues fixed in 0.10.1.1 how we can patch until confluent release.

Yeah I already double checked that, all the cp1 jars are gone and the 0.10.1.1’s exist, pasted below, hope I've missed something..?

Class path: :/opt/projects/confluent/bin/../share/java/kafka/jetty-continuation-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/aopalliance-repackaged-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-file-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/argparse4j-0.5.0.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-json-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/avro-1.7.7.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-codec-1.9.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-http-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-collections-3.2.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-compress-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-digester-1.8.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-lang3-3.1.jar:/opt/projects/confluent/bin/../share/java/kafka/log4j-1.2.17.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-logging-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/lz4-1.3.0.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-validator-1.4.1.jar:/opt/projects/conflu
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802030#comment-15802030 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:33 AM: -- When we got the issue on 0.10.1.1 we were just as surprised, and that was the very first thing we checked. Our internal mail at the time, when this occurred in UAT (the first environment where we saw it after the 0.10.1.1 upgrade): > From: Andrew Holford Date: Tuesday, 3 January 2017 at 17:07 Subject: Re: Kafka Issues fixed in 0.10.1.1 how we can patch until confluent release. Yeah I already double checked that, all the cp1 jars are gone and the 0.10.1.1’s exist, pasted below, hope I've missed something..? Class path: :/opt/projects/confluent/bin/../share/java/kafka/jetty-continuation-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/aopalliance-repackaged-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-file-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/argparse4j-0.5.0.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-json-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/avro-1.7.7.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-codec-1.9.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-http-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-collections-3.2.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-compress-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-digester-1.8.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-lang3-3.1.jar:/opt/projects/confluent/bin/../share/java/kafka/log4j-1.2.17.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-logging-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/lz4-1.3.0.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-validator-1.4.1.jar
:/opt/projects/confluent/bin/../share/java/kafka/metrics-core-2.2.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-io-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jopt-simple-4.9.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-runtime-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/guava-18.0.jar:/opt/projects/confluent/bin/../share/java/kafka/osgi-resource-locator-1.0.1.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-api-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/paranamer-2.3.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-locator-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/reflections-0.9.10.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-utils-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-clients-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpclient-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpcore-4.4.3.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-tools-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpmime-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-annotations-2.6.0.jar:/opt/projects/confluent/bin/../share/java/kafka/rocksdbjni-4.9.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/scala-library-2.11.8.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-api-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-databind-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-log4j12-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-base-2.6.3.jar:/opt/projects/confluent/bin/../share/java/
kafka/jackson-jaxrs-json-provider-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-media-jaxb-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/snappy-java-1.1.2.6.jar:/opt/projects/confluent/bin/../share/java/kafka/javassist-3.18.2-GA.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-client-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-1.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-common-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-2.4.0-b34.jar:/opt/proje
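The classpath paste above boils down to one check: are any Kafka jars on the broker's classpath still at a version other than 0.10.1.1? A minimal sketch of that check, assuming a POSIX shell with `grep`; the install directory and jar names are illustrative:

```shell
# check_stale: read jar filenames on stdin and print any kafka jars whose
# name does not contain the expected version string ($1). An empty result
# means all kafka jars match the expected version.
check_stale() {
  grep -E '^kafka' | grep -v -- "$1" || true
}

# e.g. on a broker (path as in the paste above; adjust for your install):
#   ls /opt/projects/confluent/share/java/kafka | check_stale 0.10.1.1
```

This only inspects filenames, so an unversioned or repackaged jar (like the one found later in this thread) would still need a manual look.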
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803984#comment-15803984 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:31 AM: -- [~junrao] As noted, we are patching the Confluent 3.1.0 build with 0.10.1.1; in any case we will deploy 3.1.2 when it's released (we assume you're making sure all bugs really are fixed before releasing). To be safe, is there a SNAPSHOT build/package of 3.1.2 with Kafka 0.10.1.1 that you have? That way we can avoid any possible bad patching or discrepancies on our side. > log cleaner breaks on timeindex > --- > > Key: KAFKA-4497 > URL: https://issues.apache.org/jira/browse/KAFKA-4497 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.0 > Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0 >Reporter: Robert Schumann >Assignee: Jiangjie Qin >Priority: Critical > Labels: reliability > Fix For: 0.10.1.1 > > Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, > com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, > log-cleaner.log.1.zip, vrtstokf005_4thJan > > > _created from ML entry by request of [~ijuma]_ > Hi all, > we are facing an issue with latest kafka 0.10.1 and the log cleaner thread > with regards to the timeindex files. From the log of the log-cleaner we see > after startup that it tries to cleanup a topic called xdc_listing-status-v2 > [1]. The topic is setup with log compaction [2] and the kafka cluster > configuration has log.cleaner enabled [3]. 
Looking at the log and the newly > created file [4], the cleaner seems to refer to tombstones prior to > epoch_time=0 - maybe because he finds messages, which don’t have a timestamp > at all (?). All producers and consumers are using 0.10.1 and the topics have > been created completely new, so I’m not sure, where this issue would come > from. The original timeindex file [5] seems to show only valid timestamps for > the mentioned offsets. I would also like to mention that the issue happened > in two independent datacenters at the same time, so I would rather expect an > application/producer issue instead of random disk failures. We didn’t have > the problem with 0.10.0 for around half a year, it appeared shortly after the > upgrade to 0.10.1. > The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 > CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also > confuses me a bit. Does that mean, it does not find any log segments which > can be cleaned up or the last timestamp of the last log segment is somehow > broken/missing? > I’m also a bit wondering, why the log cleaner thread stops completely after > an error with one topic. I would at least expect that it keeps on cleaning up > other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning > the __consumer_offsets anymore. > Does anybody have the same issues or can explain, what’s going on? Thanks for > any help or suggestions. > Cheers > Robert > [1] > {noformat} > [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner) > [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting > (kafka.log.LogCleaner) > [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log > xdc_listing-status-v2-1. (kafka.log.LogCleaner) > [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for > xdc_listing-status-v2-1... 
(kafka.log.LogCleaner) > [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log > xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log > xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log > xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, > discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log > xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into > 0, retaining deletes. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to > (kafka.log.LogClean
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803919#comment-15803919 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:17 AM: -- We have found one discrepancy: an old unversioned jar on the classpath. We are going to remove it and retest. If you hold off on any further investigation, we will let you know once we have re-deployed and checked. This may indeed be our mistake. was (Author: michael.andre.pearce): we have one discrepancy found, going to remove it and retest > log cleaner breaks on timeindex > --- > > Key: KAFKA-4497 > URL: https://issues.apache.org/jira/browse/KAFKA-4497 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.0 > Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0 >Reporter: Robert Schumann >Assignee: Jiangjie Qin >Priority: Critical > Labels: reliability > Fix For: 0.10.1.1 > > Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, > com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, > log-cleaner.log.1.zip, vrtstokf005_4thJan > > > _created from ML entry by request of [~ijuma]_ > Hi all, > we are facing an issue with latest kafka 0.10.1 and the log cleaner thread > with regards to the timeindex files. From the log of the log-cleaner we see > after startup that it tries to cleanup a topic called xdc_listing-status-v2 > [1]. The topic is setup with log compaction [2] and the kafka cluster > configuration has log.cleaner enabled [3]. Looking at the log and the newly > created file [4], the cleaner seems to refer to tombstones prior to > epoch_time=0 - maybe because he finds messages, which don’t have a timestamp > at all (?). All producers and consumers are using 0.10.1 and the topics have > been created completely new, so I’m not sure, where this issue would come > from. The original timeindex file [5] seems to show only valid timestamps for > the mentioned offsets. 
I would also like to mention that the issue happened > in two independent datacenters at the same time, so I would rather expect an > application/producer issue instead of random disk failures. We didn’t have > the problem with 0.10.0 for around half a year, it appeared shortly after the > upgrade to 0.10.1. > The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 > CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also > confuses me a bit. Does that mean, it does not find any log segments which > can be cleaned up or the last timestamp of the last log segment is somehow > broken/missing? > I’m also a bit wondering, why the log cleaner thread stops completely after > an error with one topic. I would at least expect that it keeps on cleaning up > other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning > the __consumer_offsets anymore. > Does anybody have the same issues or can explain, what’s going on? Thanks for > any help or suggestions. > Cheers > Robert > [1] > {noformat} > [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner) > [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting > (kafka.log.LogCleaner) > [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log > xdc_listing-status-v2-1. (kafka.log.LogCleaner) > [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for > xdc_listing-status-v2-1... (kafka.log.LogCleaner) > [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log > xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log > xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log > xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, > discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner) > [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log > xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into > 0, retaining deletes. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to > (kafka.log.LogCleaner) > kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot > 9 no larger than the last offset appended (11832) to > /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned. > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117) > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107) > at > kafka.log.TimeIndex$$an
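The InvalidOffsetException in the quoted log reflects a simple invariant: each entry appended to a time index must carry an offset strictly larger than the last one appended (here -1 arrives after 11832). A stand-alone sketch of that monotonicity check over a list of offsets — an illustration of the same shape of check `TimeIndex.maybeAppend` enforces, not Kafka's actual code:

```shell
# monotonic: read one offset per line and report any entry that is not
# strictly larger than its predecessor. Exits non-zero if a violation
# is found, zero otherwise.
monotonic() {
  awk 'NR > 1 && $1 <= prev { printf "violation at line %d: %s <= %s\n", NR, $1, prev; bad = 1 }
       { prev = $1 }
       END { exit bad }'
}

# The cleaner's failure corresponds to:  printf '11832\n-1\n' | monotonic
```

The offset of -1 suggests the cleaner produced an index entry for a message with no timestamp, which matches the "discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970" message above.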
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803829#comment-15803829 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:03 AM: -- [~junrao] We are fairly confident the jars are good, based on the process classpath: see above, we double-checked the process class path again when we hit the issue last night, after reducing min.cleanable.dirty.ratio to make the cleaner kick in. Note that all the Kafka jars are 0.10.1.1. We can offer to deploy a custom-built jar with further logging/debugging for you if you wish (as we have this in a testing environment), or we could even host a WebEx or similar so you can look at the system yourself. 
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803919#comment-15803919 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- we have one discrepancy found, going to remove it and retest > log cleaner breaks on timeindex > --- > > Key: KAFKA-4497 > URL: https://issues.apache.org/jira/browse/KAFKA-4497 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.0 > Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0 >Reporter: Robert Schumann >Assignee: Jiangjie Qin >Priority: Critical > Labels: reliability > Fix For: 0.10.1.1 > > Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, > com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, > log-cleaner.log.1.zip, vrtstokf005_4thJan > > > _created from ML entry by request of [~ijuma]_ > Hi all, > we are facing an issue with latest kafka 0.10.1 and the log cleaner thread > with regards to the timeindex files. From the log of the log-cleaner we see > after startup that it tries to cleanup a topic called xdc_listing-status-v2 > [1]. The topic is setup with log compaction [2] and the kafka cluster > configuration has log.cleaner enabled [3]. Looking at the log and the newly > created file [4], the cleaner seems to refer to tombstones prior to > epoch_time=0 - maybe because he finds messages, which don’t have a timestamp > at all (?). All producers and consumers are using 0.10.1 and the topics have > been created completely new, so I’m not sure, where this issue would come > from. The original timeindex file [5] seems to show only valid timestamps for > the mentioned offsets. I would also like to mention that the issue happened > in two independent datacenters at the same time, so I would rather expect an > application/producer issue instead of random disk failures. 
We didn’t have > the problem with 0.10.0 for around half a year, it appeared shortly after the > upgrade to 0.10.1. > The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 > CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also > confuses me a bit. Does that mean, it does not find any log segments which > can be cleaned up or the last timestamp of the last log segment is somehow > broken/missing? > I’m also a bit wondering, why the log cleaner thread stops completely after > an error with one topic. I would at least expect that it keeps on cleaning up > other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning > the __consumer_offsets anymore. > Does anybody have the same issues or can explain, what’s going on? Thanks for > any help or suggestions. > Cheers > Robert > [1] > {noformat} > [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner) > [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting > (kafka.log.LogCleaner) > [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log > xdc_listing-status-v2-1. (kafka.log.LogCleaner) > [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for > xdc_listing-status-v2-1... (kafka.log.LogCleaner) > [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log > xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log > xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log > xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, > discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log > xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into > 0, retaining deletes. 
(kafka.log.LogCleaner) > [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to > (kafka.log.LogCleaner) > kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot > 9 no larger than the last offset appended (11832) to > /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned. > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117) > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107) > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107) > at kafka.log.LogSegment.append(LogSegment.scala:106) > at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518) > at > k
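For context on the puzzling "discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970" message in the description above: one plausible reading (a hypothesis, not a statement of Kafka's actual code path) is that the tombstone horizon is derived by subtracting `delete.retention.ms` from a segment timestamp, and that a record with no timestamp is represented by the sentinel `-1`, which clamps the horizon to epoch 0. A minimal illustrative model, with hypothetical names:

```python
# Illustrative model only -- NOT Kafka's Scala implementation. It shows how
# a missing-timestamp sentinel (-1, mirroring Kafka's "no timestamp" value)
# could surface as an epoch-1970 tombstone horizon in the cleaner log line.
NO_TIMESTAMP = -1

def delete_horizon_ms(segment_largest_timestamp_ms, delete_retention_ms):
    """Tombstones older than the returned horizon may be discarded.

    If the segment carries no timestamp (-1), clamp the horizon to 0
    rather than letting it go negative -- 0 renders as
    'Thu Jan 01 01:00:00 1970' when formatted as a date.
    """
    if segment_largest_timestamp_ms == NO_TIMESTAMP:
        return 0  # epoch: effectively "discard no tombstones"
    return max(0, segment_largest_timestamp_ms - delete_retention_ms)

# A segment with no timestamps yields horizon 0 (the 1970 log message):
print(delete_horizon_ms(NO_TIMESTAMP, 86_400_000))   # 0
# A timestamped segment yields a real cutoff:
print(delete_horizon_ms(1_480_694_150_000, 86_400_000))
```

This would be consistent with the reporter's guess that the cleaner "finds messages which don't have a timestamp at all".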
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803893#comment-15803893 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- we are going to md5 checksum check every file this morning.
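The md5 pass proposed above is easy to script. A minimal sketch (helper names are illustrative; any hashing tool such as `md5sum` would do equally well), producing a name-to-digest map that can be diffed against a known-good host:

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large jars are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checksum_report(directory, pattern="*.jar"):
    """Map each matching file name to its MD5, for diffing between brokers."""
    return {p.name: md5_of(p) for p in sorted(Path(directory).glob(pattern))}
```

Running `checksum_report` on each broker's `share/java/kafka` directory and comparing the dictionaries would flag any jar that differs between hosts.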
[jira] [Issue Comment Deleted] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Andre Pearce (IG) updated KAFKA-4497: - Comment: was deleted (was: [~junrao] We are 99.9% sure the jars are good; see above, we double-checked the process classpath again on hitting the issue last night, after reducing min.cleanable.dirty.ratio to make the cleaner kick in. Note that all Kafka jars are 0.10.1.1. We can deploy a custom-built jar containing further logging/debugging for you if you wish (we have this in a testing environment), or we can host a WebEx or similar so you can look at the system yourself.)
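On the min.cleanable.dirty.ratio tuning mentioned in the comment above: the ratio gates when a compacted log becomes eligible for cleaning, so lowering it makes the cleaner pick the log up sooner. A simplified sketch of the check (0.5 matches Kafka's documented default; the function name is illustrative):

```python
def is_cleanable(clean_bytes, dirty_bytes, min_cleanable_dirty_ratio=0.5):
    """A log qualifies for compaction once the uncompacted ("dirty") portion
    is a large enough fraction of the total log size. Lowering the ratio --
    as done above to reproduce the bug -- lowers that threshold."""
    total = clean_bytes + dirty_bytes
    if total == 0:
        return False
    return dirty_bytes / total >= min_cleanable_dirty_ratio

print(is_cleanable(900, 100))         # False at the default ratio of 0.5
print(is_cleanable(900, 100, 0.05))   # True once the ratio is lowered
```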
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803829#comment-15803829 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- [~junrao] We are 99.9% sure the jars are good; see above, we double-checked the process classpath again on hitting the issue last night, after reducing min.cleanable.dirty.ratio to make the cleaner kick in. Note that all Kafka jars are 0.10.1.1. We can deploy a custom-built jar containing further logging/debugging for you if you wish (we have this in a testing environment), or we can host a WebEx or similar so you can look at the system yourself.
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802412#comment-15802412 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- confirming class path via the process: conflue+ 14435 14401 1 13:21 ?00:05:53 /opt/projects/java/sun-jdk8-1.8.0.66/jre/bin/java -cp :/opt/projects/confluent/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-client-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/argparse4j-0.5.0.jar:/opt/projects/confluent/bin/../share/java/kafka/avro-1.7.7.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-codec-1.9.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-common-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-collections-3.2.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-digester-1.8.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-lang3-3.1.jar:/opt/projects/confluent/bin/../share/java/kafka/lz4-1.3.0.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test.jar:/opt/projects/confluent/bin/../share/java/kafka/xz-1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-api-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-runtime-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-log4j-appender-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/snappy-java-1.1.2.6.jar:/opt/projects/confluent/bin/../share/java/kafka/zkclient-0.9.jar:/opt/projects/confluent/bin/../share/java/kafka/httpclient-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-scaladoc.jar:/opt/projects/confluent/bin/../share/java/kafka/httpcore-4.4.3.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-file-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-clients-0.10.1
.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-examples-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-sources.jar:/opt/projects/confluent/bin/../share/java/kafka/zookeeper-3.4.8.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/httpmime-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-tools-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-json-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-1.jar:/opt/projects/confluent/bin/../share/java/kafka/rocksdbjni-4.9.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.servlet-api-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-logging-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.ws.rs-api-2.0.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-validator-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-javadoc.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test-sources.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-locator-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-utils-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-databind-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-base-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.6.3.jar:/opt/projects/co
nfluent/bin/../share/java/kafka/javassist-3.18.2-GA.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/reflections-0.9.10.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-client-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-common-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jopt-simple-4.9.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-container-servlet-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-container-servlet-core-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-guava-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-media-jaxb-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/scala-library-2.11.8.jar:/opt/projects/confluent/bin/../share/java/kafka/log4j-1.2.17.jar:/opt/projects/confluent/bin/../share/java/kafka/metr
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802394#comment-15802394 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- [~junrao] [~becket_qin] anything else you guys need? As it's just happened, there is more chance of capturing anything extra. [2017-01-05 19:37:07,144] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner) kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 20 no larger than the last offset appended (50346) to /var/kafka/logs/com_ig_trade_v1_position_event--demo--compacted-14/.timeindex.cleaned. at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117) at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107) at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107) at kafka.log.LogSegment.append(LogSegment.scala:106) at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518) at kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404) at kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400) at scala.collection.immutable.List.foreach(List.scala:381) at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400) at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364) at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363) at scala.collection.immutable.List.foreach(List.scala:381) at kafka.log.Cleaner.clean(LogCleaner.scala:363) at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:239) at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:218) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) [2017-01-05 19:37:07,144] INFO [kafka-log-cleaner-thread-0], Stopped (kafka.log.LogCleaner)
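The original description asks why the cleaner thread stops completely after an error in one topic. The stack trace above suggests an answer: the exception escapes `doWork` up to `ShutdownableThread.run`, ending the single cleaner thread, so no other log is cleaned afterwards. A simplified model of that failure mode (names are illustrative, not Kafka's API):

```python
# Simplified model of a cleaner work loop: an exception thrown while
# cleaning one log escapes the loop and stops the whole thread, leaving
# every later log (e.g. __consumer_offsets) permanently uncleaned.
class InvalidOffsetError(Exception):
    pass

def run_cleaner(logs):
    """logs: list of (name, clean_fn) pairs. Returns the names cleaned
    before the first failure; an uncaught error kills the loop."""
    cleaned = []
    for name, clean_fn in logs:
        clean_fn()              # error here propagates out of the loop...
        cleaned.append(name)
    return cleaned              # ...so later entries are never reached

def ok():
    pass

def broken():
    raise InvalidOffsetError("offset (-1) no larger than last appended")

try:
    run_cleaner([("topic-a", ok), ("bad-timeindex", broken),
                 ("__consumer_offsets", ok)])
except InvalidOffsetError as e:
    print(f"cleaner thread stopped: {e}")
```

Isolating per-log failures (catch, mark the log, continue) would let cleaning of healthy topics proceed, which is what the reporter expected.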
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802390#comment-15802390 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--------------------------------------------------

log4j log-cleaner.log with TRACE: https://issues.apache.org/jira/secure/attachment/12845849/log-cleaner.log.1.zip

> log cleaner breaks on timeindex
> -------------------------------
>
>                 Key: KAFKA-4497
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4497
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.10.1.0
>         Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>            Reporter: Robert Schumann
>            Assignee: Jiangjie Qin
>            Priority: Critical
>              Labels: reliability
>             Fix For: 0.10.1.1
>
>         Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, log-cleaner.log.1.zip, vrtstokf005_4thJan
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with the latest kafka 0.10.1 and the log cleaner thread with regards to the timeindex files. From the log of the log-cleaner we see after startup that it tries to clean up a topic called xdc_listing-status-v2 [1]. The topic is set up with log compaction [2] and the kafka cluster configuration has log.cleaner enabled [3]. Looking at the log and the newly created file [4], the cleaner seems to refer to tombstones prior to epoch_time=0 - maybe because it finds messages which don't have a timestamp at all (?). All producers and consumers are using 0.10.1 and the topics have been created completely new, so I'm not sure where this issue would come from. The original timeindex file [5] seems to show only valid timestamps for the mentioned offsets. I would also like to mention that the issue happened in two independent datacenters at the same time, so I would rather expect an application/producer issue instead of random disk failures. We didn't have the problem with 0.10.0 for around half a year; it appeared shortly after the upgrade to 0.10.1.
> The puzzling message from the cleaner "cleaning prior to Fri Dec 02 16:35:50 CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970" also confuses me a bit. Does that mean it does not find any log segments which can be cleaned up, or that the last timestamp of the last log segment is somehow broken/missing?
> I'm also wondering why the log cleaner thread stops completely after an error with one topic. I would at least expect it to keep cleaning up other topics, but apparently it doesn't do that, e.g. it's not even cleaning __consumer_offsets anymore.
> Does anybody have the same issues or can explain what's going on? Thanks for any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 9 no larger than the last offset appended (11832) to /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
>         at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
>         at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
>         at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>         at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
>         at kafka.log.LogSegment.append(LogSegment.scala:106)
>         at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
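[Editor's note] The "discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970" date that confuses the reporter is simply the Unix epoch (0 ms) rendered in a CET time zone: when the retention horizon falls back to a zero/sentinel timestamp, formatting it produces exactly the string in the log. A minimal, hypothetical sketch (plain JDK date formatting, not Kafka code):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class EpochDateDemo {
    public static void main(String[] args) {
        // Format epoch 0 ms the way the LogCleaner log line does, in a CET zone.
        SimpleDateFormat fmt =
            new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ENGLISH);
        fmt.setTimeZone(TimeZone.getTimeZone("Europe/Berlin")); // UTC+1 in winter
        System.out.println(fmt.format(new Date(0L)));
        // Thu Jan 01 01:00:00 CET 1970 -- the date quoted in the cleaner message
    }
}
```

So the 1970 date does not indicate disk corruption; it indicates the cleaner computed its tombstone horizon from a missing/zero timestamp.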
[jira] [Updated] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Andre Pearce (IG) updated KAFKA-4497:
---------------------------------------------
    Attachment: log-cleaner.log.1.zip

log-cleaner log for the error on 0.10.1.1 at 2017-01-05 19:37:07,144
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802365#comment-15802365 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--------------------------------------------------

https://issues.apache.org/jira/secure/attachment/12845847/com_ig_trade_v1_position_event--demo--compacted-14.tar.gz
[jira] [Updated] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Andre Pearce (IG) updated KAFKA-4497:
---------------------------------------------
    Attachment: com_ig_trade_v1_position_event--demo--compacted-14.tar.gz

partition/segment logs for the issue on 0.10.1.1 - 2017-01-05 19:37:07,144
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802326#comment-15802326 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--------------------------------------------------

{noformat}
[2017-01-05 19:37:07,137] TRACE Inserting 88493 bytes at offset 49647 at position 4393661 with largest timestamp 1480518430821 at offset 49859 (kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 49860 in log com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 07 15:06:26 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,138] TRACE Inserting 15951 bytes at offset 49863 at position 4482154 with largest timestamp 1481123186887 at offset 49971 (kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 49972 in log com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 14 14:22:00 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,138] TRACE Inserting 7496 bytes at offset 49972 at position 4498105 with largest timestamp 1481722619382 at offset 50151 (kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 50155 in log com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 21 15:26:33 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,138] TRACE Inserting 12632 bytes at offset 50158 at position 4505601 with largest timestamp 1482321739128 at offset 50346 (kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 50349 in log com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 28 16:07:21 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,140] TRACE Inserting 11743 bytes at offset 50350 at position 4518233 with largest timestamp 1482933300393 at offset -1 (kafka.log.LogSegment)
[2017-01-05 19:37:07,144] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 20 no larger than the last offset appended (50346) to /var/kafka/logs/com_ig_trade_v1_position_event--demo--compacted-14/.timeindex.cleaned.
        at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
        at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
        at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
        at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
        at kafka.log.LogSegment.append(LogSegment.scala:106)
        at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
        at kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
        at kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400)
        at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364)
        at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at kafka.log.Cleaner.clean(LogCleaner.scala:363)
        at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:239)
        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:218)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
{noformat}
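[Editor's note] The failure pattern in the trace above is consistent: a segment whose "largest timestamp ... at offset -1" is fed to the time index, and the append fails because each time-index entry must carry an offset strictly larger than the last one appended; -1 is Kafka's sentinel for "no timestamped message found", so it can never satisfy that invariant. A simplified, hypothetical model of the check (this is not the actual kafka.log.TimeIndex implementation, and it throws IllegalStateException where Kafka throws kafka.common.InvalidOffsetException):

```java
import java.util.ArrayList;
import java.util.List;

/** Toy time index that enforces monotonically increasing offsets, like the real one. */
class ToyTimeIndex {
    private final List<long[]> entries = new ArrayList<>(); // {timestamp, offset} pairs

    long lastOffset() {
        return entries.isEmpty() ? -1L : entries.get(entries.size() - 1)[1];
    }

    /** Mirrors the invariant behind the InvalidOffsetException in the stack trace. */
    void maybeAppend(long timestamp, long offset) {
        if (!entries.isEmpty() && offset <= lastOffset()) {
            throw new IllegalStateException(
                "Attempt to append an offset (" + offset +
                ") no larger than the last offset appended (" + lastOffset() + ")");
        }
        entries.add(new long[]{timestamp, offset});
    }
}

public class CleanerRepro {
    public static void main(String[] args) {
        ToyTimeIndex index = new ToyTimeIndex();
        index.maybeAppend(1482321739128L, 50346L); // a normal entry, as in the log above
        try {
            // A segment with no timestamped messages reports its largest timestamp
            // at the sentinel offset -1; appending it violates the invariant.
            index.maybeAppend(1482933300393L, -1L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This also explains why the error is deterministic for the affected partition: every cleaning pass hits the same untimestamped segment and dies at the same slot.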
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802319#comment-15802319 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--------------------------------------------------

Perfect, that triggered it to run, and it has now occurred again; just going to capture all the bits for you again.
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802233#comment-15802233 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 7:14 PM:
--------------------------------------------------------------------------

[~junrao], we took it from the website, via the links. (Edited to remove the exact mirror link, as I'm actually not 100% sure we downloaded from the website; I will confirm exactly which mirror tomorrow if it makes any difference.)

Will look to reduce that down then.

was (Author: michael.andre.pearce):
[~junrao], we took it from the website, via the links, downloading from the first mirror site, http://apache.mirror.anlx.net/kafka/0.10.1.1/kafka_2.11-0.10.1.1.tgz

Will look to reduce that down then.
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802233#comment-15802233 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- [~junrao] , we took it from the website, via the links, downloading from the first mirror site, http://apache.mirror.anlx.net/kafka/0.10.1.1/kafka_2.11-0.10.1.1.tgz Will look to reduce that down then. > log cleaner breaks on timeindex > --- > > Key: KAFKA-4497 > URL: https://issues.apache.org/jira/browse/KAFKA-4497 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.0 > Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0 >Reporter: Robert Schumann >Assignee: Jiangjie Qin >Priority: Critical > Labels: reliability > Fix For: 0.10.1.1 > > Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, > vrtstokf005_4thJan > > > _created from ML entry by request of [~ijuma]_ > Hi all, > we are facing an issue with latest kafka 0.10.1 and the log cleaner thread > with regards to the timeindex files. From the log of the log-cleaner we see > after startup that it tries to cleanup a topic called xdc_listing-status-v2 > [1]. The topic is setup with log compaction [2] and the kafka cluster > configuration has log.cleaner enabled [3]. Looking at the log and the newly > created file [4], the cleaner seems to refer to tombstones prior to > epoch_time=0 - maybe because he finds messages, which don’t have a timestamp > at all (?). All producers and consumers are using 0.10.1 and the topics have > been created completely new, so I’m not sure, where this issue would come > from. The original timeindex file [5] seems to show only valid timestamps for > the mentioned offsets. I would also like to mention that the issue happened > in two independent datacenters at the same time, so I would rather expect an > application/producer issue instead of random disk failures. 
We didn’t have > the problem with 0.10.0 for around half a year, it appeared shortly after the > upgrade to 0.10.1. > The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 > CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also > confuses me a bit. Does that mean, it does not find any log segments which > can be cleaned up or the last timestamp of the last log segment is somehow > broken/missing? > I’m also a bit wondering, why the log cleaner thread stops completely after > an error with one topic. I would at least expect that it keeps on cleaning up > other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning > the __consumer_offsets anymore. > Does anybody have the same issues or can explain, what’s going on? Thanks for > any help or suggestions. > Cheers > Robert > [1] > {noformat} > [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner) > [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting > (kafka.log.LogCleaner) > [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log > xdc_listing-status-v2-1. (kafka.log.LogCleaner) > [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for > xdc_listing-status-v2-1... (kafka.log.LogCleaner) > [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log > xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log > xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log > xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, > discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log > xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into > 0, retaining deletes. 
(kafka.log.LogCleaner) > [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to > (kafka.log.LogCleaner) > kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot > 9 no larger than the last offset appended (11832) to > /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned. > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117) > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107) > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107) > at kafka.log.LogSegment.append(LogSegment.scala:106) > at kafka.log
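The InvalidOffsetException in the trace above comes from an invariant the time index enforces: each appended entry's offset must be strictly larger than the last one appended, and an entry whose offset resolves to -1 (e.g. when no timestamped message was found for a segment) violates it. A minimal Python sketch of that invariant — a toy model for illustration only, not Kafka's actual Scala code:

```python
class InvalidOffsetError(Exception):
    pass

class TimeIndexSketch:
    """Toy model of a time index: (timestamp, offset) entries whose
    offsets must be strictly increasing, as in the error above."""
    def __init__(self):
        self.entries = []

    def maybe_append(self, timestamp, offset):
        if self.entries and offset <= self.entries[-1][1]:
            last = self.entries[-1][1]
            raise InvalidOffsetError(
                f"Attempt to append an offset ({offset}) no larger than "
                f"the last offset appended ({last}).")
        self.entries.append((timestamp, offset))

idx = TimeIndexSketch()
idx.maybe_append(1480688150000, 11832)  # a normal entry
try:
    # an entry derived from a segment with no valid timestamp can carry
    # offset -1, which breaks the strictly-increasing invariant
    idx.maybe_append(-1, -1)
except InvalidOffsetError as e:
    print(e)
```

Under this model, appending (timestamp -1, offset -1) after offset 11832 fails in the same way as the cleaner log above.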
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802034#comment-15802034 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- By the way, since re-rolling 0.10.1.1 forward into a testing environment, we are still waiting for a re-occurrence.
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802030#comment-15802030 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- When we hit the issue on 0.10.1.1 we were just as surprised, so that was the very first thing we checked. Our internal mail at the time this occurred in UAT (the first environment where we saw it after the 0.10.1.1 upgrade): > From: Andrew Holford Date: Tuesday, 3 January 2017 at 17:07 To: Michael Pearce , Thomas Brown Cc: William Hargrove , Anton Goldenfarb , Neil Laurance , Gavin Sandie Subject: Re: Kafka Issues fixed in 0.10.1.1 how we can patch until confluent release. Yeah, I already double-checked that: all the cp1 jars are gone and the 0.10.1.1 ones exist, pasted below. Hope I've missed something? Class path: :/opt/projects/confluent/bin/../share/java/kafka/jetty-continuation-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/aopalliance-repackaged-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-file-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/argparse4j-0.5.0.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-json-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/avro-1.7.7.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-codec-1.9.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-http-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-collections-3.2.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-compress-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-digester-1.8.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-lang3-3.1.jar:/opt/projects/confluent/bin/../share/java/kafka/log4j-1.2.17.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-logging-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/lz4-1.3
.0.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-validator-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/metrics-core-2.2.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-io-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jopt-simple-4.9.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-runtime-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/guava-18.0.jar:/opt/projects/confluent/bin/../share/java/kafka/osgi-resource-locator-1.0.1.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-api-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/paranamer-2.3.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-locator-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/reflections-0.9.10.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-utils-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-clients-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpclient-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpcore-4.4.3.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-tools-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpmime-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-annotations-2.6.0.jar:/opt/projects/confluent/bin/../share/java/kafka/rocksdbjni-4.9.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/scala-library-2.11.8.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-api-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-databind-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-log4j12-1.7.21.jar:/opt/projects/confluent/bin/../share/
java/kafka/jackson-jaxrs-base-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-media-jaxb-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/snappy-java-1.1.2.6.jar:/opt/projects/confluent/bin/../share/java/kafka/javassist-3.18.2-GA.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-client-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-1.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-common-3.1.0.jar:/opt/projects/confluent
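Eyeballing a pasted classpath like the one above for leftover old jars is error-prone. A small Python sketch that collects the version strings of kafka/connect jars from a ':'-separated Java classpath — a heuristic helper for illustration, not an official tool, and the regex is an assumption about jar naming:

```python
import re

def kafka_jar_versions(classpath):
    """Collect version strings of kafka*/connect* jars on a
    ':'-separated Java classpath; more than one distinct version
    suggests a mixed deployment."""
    versions = set()
    for entry in classpath.split(":"):
        jar = entry.rsplit("/", 1)[-1]
        m = re.match(r"(?:kafka|connect)[\w.-]*?-(\d+(?:\.\d+){2,3})", jar)
        if m:
            versions.add(m.group(1))
    return versions

cp = ("/opt/kafka/libs/kafka-clients-0.10.1.1.jar:"
      "/opt/kafka/libs/connect-runtime-0.10.1.1.jar:"
      "/opt/kafka/libs/slf4j-api-1.7.21.jar")
print(kafka_jar_versions(cp))  # → {'0.10.1.1'}
```

A single reported version, as in the classpath pasted in the mail above, is consistent with the cp1 jars being fully replaced by 0.10.1.1.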
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:37 PM: --- [~becket_qin] we have managed to recover the log4j cleaner log from yesterday, from the same node from which we supplied the log segments, when we had it occur on 0.10.1.1. We have now rolled that testing environment forward onto 0.10.1.1 again and are waiting for it to re-occur. (Bear in mind we had been running for 6 weeks before we saw the initial issue on the 0.10.1.0 version; although we have subsequently seen it re-occur much faster, we cannot tell how long we will have to wait.) was (Author: michael.andre.pearce): [~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments, when we had it occur on 0.10.1.1. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have seen it re-occur much faster, we cannot tell how long we will have to wait)
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:36 PM: --- [~becket_qin] we have managed to recover the log4j cleaner log from yesterday, from the same node from which we supplied the log segments, when we had it occur on 0.10.1.1. We have now rolled that testing environment forward onto 0.10.1.1 again and are waiting for it to re-occur. (Bear in mind we had been running for 6 weeks before we saw the initial issue on 0.10.1.0; although we have subsequently seen it re-occur much faster, we cannot tell how long we will have to wait.) was (Author: michael.andre.pearce): [~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have seen it re-occur much faster, we cannot tell how long we will have to wait)
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:34 PM: --- [~becket_qin] we have managed to recover the log4j cleaner log from yesterday, from the same node from which we supplied the log segments. We have now rolled that testing environment forward onto 0.10.1.1 again and are waiting for it to re-occur. (Bear in mind we had been running for 6 weeks before we saw the initial issue on 0.10.1.0; although we have subsequently seen it re-occur much faster, we cannot tell how long we will have to wait.) was (Author: michael.andre.pearce): [~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have seen it much faster, we cannot tell how long we will have to wait)
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:33 PM: --- [~becket_qin] we have managed to recover the log4j cleaner log from yesterday, from the same node from which we supplied the log segments. We have now rolled that testing environment forward onto 0.10.1.1 again and are waiting for it to re-occur. (Bear in mind we had been running for 6 weeks before we saw the initial issue on 0.10.1.0; although we have subsequently seen it much faster, we cannot tell how long we will have to wait.) was (Author: michael.andre.pearce): [~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the issue, we cannot tell how long we will have to wait)
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801244#comment-15801244 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- https://issues.apache.org/jira/secure/attachment/12845787/vrtstokf005_4thJan > log cleaner breaks on timeindex > --- > > Key: KAFKA-4497 > URL: https://issues.apache.org/jira/browse/KAFKA-4497 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.0 > Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0 >Reporter: Robert Schumann >Assignee: Jiangjie Qin >Priority: Critical > Labels: reliability > Fix For: 0.10.1.1 > > Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, > vrtstokf005_4thJan > > > _created from ML entry by request of [~ijuma]_ > Hi all, > we are facing an issue with latest kafka 0.10.1 and the log cleaner thread > with regards to the timeindex files. From the log of the log-cleaner we see > after startup that it tries to cleanup a topic called xdc_listing-status-v2 > [1]. The topic is setup with log compaction [2] and the kafka cluster > configuration has log.cleaner enabled [3]. Looking at the log and the newly > created file [4], the cleaner seems to refer to tombstones prior to > epoch_time=0 - maybe because he finds messages, which don’t have a timestamp > at all (?). All producers and consumers are using 0.10.1 and the topics have > been created completely new, so I’m not sure, where this issue would come > from. The original timeindex file [5] seems to show only valid timestamps for > the mentioned offsets. I would also like to mention that the issue happened > in two independent datacenters at the same time, so I would rather expect an > application/producer issue instead of random disk failures. We didn’t have > the problem with 0.10.0 for around half a year, it appeared shortly after the > upgrade to 0.10.1. 
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 > CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also > confuses me a bit. Does that mean, it does not find any log segments which > can be cleaned up or the last timestamp of the last log segment is somehow > broken/missing? > I’m also a bit wondering, why the log cleaner thread stops completely after > an error with one topic. I would at least expect that it keeps on cleaning up > other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning > the __consumer_offsets anymore. > Does anybody have the same issues or can explain, what’s going on? Thanks for > any help or suggestions. > Cheers > Robert > [1] > {noformat} > [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner) > [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting > (kafka.log.LogCleaner) > [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log > xdc_listing-status-v2-1. (kafka.log.LogCleaner) > [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for > xdc_listing-status-v2-1... (kafka.log.LogCleaner) > [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log > xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log > xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log > xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, > discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log > xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into > 0, retaining deletes. 
(kafka.log.LogCleaner) > [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to > (kafka.log.LogCleaner) > kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot > 9 no larger than the last offset appended (11832) to > /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned. > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117) > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107) > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107) > at kafka.log.LogSegment.append(LogSegment.scala:106) > at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518) > at > kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
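The InvalidOffsetException above comes from the invariant that entries appended to a .timeindex file must carry strictly increasing offsets; the cleaner is trying to append the sentinel offset -1 after 11832. The following is a minimal Python sketch of that invariant, not Kafka's actual Scala implementation (class and method names here are illustrative):

```python
# Illustrative sketch of the monotonic-offset invariant that
# kafka.log.TimeIndex.maybeAppend enforces. This is NOT the real Kafka code;
# it only demonstrates why appending offset -1 after 11832 must fail.

class InvalidOffsetException(Exception):
    pass

class TimeIndexSketch:
    def __init__(self):
        self.entries = []  # list of (timestamp_ms, offset) pairs

    def last_offset(self):
        # -1 as "no entries yet", mirroring Kafka's sentinel convention
        return self.entries[-1][1] if self.entries else -1

    def maybe_append(self, timestamp_ms, offset):
        # Each appended offset must be larger than the previous one; the
        # report shows this check firing with offset=-1, last=11832.
        if self.entries and offset <= self.last_offset():
            raise InvalidOffsetException(
                f"Attempt to append an offset ({offset}) no larger than the "
                f"last offset appended ({self.last_offset()})")
        self.entries.append((timestamp_ms, offset))

idx = TimeIndexSketch()
idx.maybe_append(1480689350000, 11832)   # fine: offsets increasing
try:
    idx.maybe_append(-1, -1)             # reproduces the reported failure mode
except InvalidOffsetException as e:
    print("rejected:", e)
```

This is consistent with the reporter's guess that messages without a timestamp are involved: a -1 sentinel flowing into the cleaned index would trip exactly this check.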
[jira] [Updated] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Andre Pearce (IG) updated KAFKA-4497: - Attachment: vrtstokf005_4thJan [~becket_qin] we have managed to recover the log4j cleaner log from yesterday, from the same node we supplied the log segments from. We have now rolled that testing environment forward again onto 0.10.1.1 and are waiting for the issue to re-occur. (Bear in mind we had been running for 6 weeks before we saw the issue, so we cannot tell how long we will have to wait.)
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800869#comment-15800869 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- Re: configs, I can give these now.
log.cleaner.backoff.ms = 15000
log.cleaner.dedupe.buffer.size = 134217728
log.cleaner.delete.retention.ms = 8640
log.cleaner.enable = true
log.cleaner.io.buffer.load.factor = 0.9
log.cleaner.io.buffer.size = 524288
log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
log.cleaner.min.cleanable.ratio = 0.5
log.cleaner.min.compaction.lag.ms = 0
log.cleaner.threads = 1
log.cleanup.policy = [delete]
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800617#comment-15800617 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- Hi [~becket_qin], sorry, we thought [~junrao] only asked for the segment logs, so we just packaged up that partition's logs, not the log4j logs as well. We've now rolled back that environment. I will discuss today whether we can roll it forward again and try to re-create the issue on 0.10.1.1. So we're 100% clear: if we do, we will probably roll back as soon as we detect the issue again, so can you list precisely what else you will need or want, so that we capture everything. Cheers Mike
[jira] [Comment Edited] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799732#comment-15799732 ] Michael Andre Pearce (IG) edited comment on KAFKA-4477 at 1/4/17 11:44 PM: --- We haven't seen this re-occur, though we have been running 0.10.1.1 for <1 week, only in testing and uat envs. was (Author: michael.andre.pearce): We haven't seen this re-occur, though running <1 week only running in testing and uat envs. > Node reduces its ISR to itself, and doesn't recover. Other nodes do not take > leadership, cluster remains sick until node is restarted. > -- > > Key: KAFKA-4477 > URL: https://issues.apache.org/jira/browse/KAFKA-4477 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.0 > Environment: RHEL7 > java version "1.8.0_66" > Java(TM) SE Runtime Environment (build 1.8.0_66-b17) > Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode) >Reporter: Michael Andre Pearce (IG) >Assignee: Apurva Mehta >Priority: Critical > Labels: reliability > Attachments: 2016_12_15.zip, issue_node_1001.log, > issue_node_1001_ext.log, issue_node_1002.log, issue_node_1002_ext.log, > issue_node_1003.log, issue_node_1003_ext.log, kafka.jstack, > state_change_controller.tar.gz > > > We have encountered a critical issue that has re-occurred in different > physical environments. We haven't worked out what is going on, though we do > have a nasty workaround to keep the service alive. > We have not had this issue on clusters still running 0.9.0.1. > We have noticed a node randomly shrinking the ISRs for the partitions it > owns down to itself; moments later we see other nodes having disconnects, > followed finally by app issues, where producing to these partitions is > blocked. > It seems only restarting the Kafka Java process resolves the > issue. 
> We have had this occur multiple times, and from all network and machine > monitoring the machine never left the network or had any other glitches. > Below are logs seen during the issue. > Node 7: > [2016-12-01 07:01:28,112] INFO Partition > [com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking > ISR for partition [com_ig_trade_v1_position_event--demo--compacted,10] from > 1,2,7 to 7 (kafka.cluster.Partition) > All other nodes: > [2016-12-01 07:01:38,172] WARN [ReplicaFetcherThread-0-7], Error in fetch > kafka.server.ReplicaFetcherThread$FetchRequest@5aae6d42 > (kafka.server.ReplicaFetcherThread) > java.io.IOException: Connection to 7 was disconnected before the response was > read > All clients: > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. > After this occurs, we then suddenly see on the sick machine an increasing > number of CLOSE_WAITs and open file descriptors. > As a workaround to keep the service alive we are currently putting in an > automated process that tails the logs and matches the regex below; where > new_partitions is just the node itself, we restart the node. > "\[(?P<date>.+)\] INFO Partition \[.*\] on broker .* Shrinking ISR for > partition \[.*\] from (?P<old_partitions>.+) to (?P<new_partitions>.+) > \(kafka.cluster.Partition\)" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
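The watchdog described above can be sketched in Python. Note the named groups in the quoted pattern appear to have lost their labels to HTML escaping (`(?P.+)` is not valid regex syntax); only `new_partitions` is attested in the surrounding text, so the other group names here, and the `broker` capture, are assumptions:

```python
import re

# Hedged reconstruction of the "Shrinking ISR" watchdog pattern quoted above.
# Group names other than "new_partitions" (and the broker capture) are
# assumptions; the original escaped pattern lost its angle-bracketed names.
SHRINK_RE = re.compile(
    r"\[(?P<date>.+)\] INFO Partition \[.*\] on broker (?P<broker>\d+): "
    r"Shrinking ISR for partition \[.*\] from (?P<old_partitions>.+) "
    r"to (?P<new_partitions>.+) \(kafka\.cluster\.Partition\)"
)

def shrunk_to_self(log_line, broker_id):
    """Return True if this log line shows the broker shrinking an ISR
    down to just itself, the trigger condition for the restart workaround."""
    m = SHRINK_RE.search(log_line)
    if not m:
        return False
    return m.group("new_partitions").strip() == str(broker_id)

line = ("[2016-12-01 07:01:28,112] INFO Partition "
        "[com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: "
        "Shrinking ISR for partition "
        "[com_ig_trade_v1_position_event--demo--compacted,10] from 1,2,7 to 7 "
        "(kafka.cluster.Partition)")
print(shrunk_to_self(line, 7))  # prints: True
```

A real deployment would feed this from a log tailer and trigger the broker restart; the sketch only shows the match-and-decide step.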
[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799732#comment-15799732 ] Michael Andre Pearce (IG) commented on KAFKA-4477: -- We haven't seen this re-occur, though running <1 week only running in testing and uat envs.
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799599#comment-15799599 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- Btw, we are actually normally using the Confluent build, though we decided to patch 3.1.x with 0.10.1.1 manually to see if it solved the issue in our testing and uat envs; that is how we found it didn't resolve the issue. If you find the issue and have a fix, would you expect it to be patched into the upcoming Confluent build based on 0.10.1.1, so we wouldn't have to hot-patch that immediately?
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799594#comment-15799594 ] Michael Andre Pearce (IG) commented on KAFKA-4497: -- [~junrao] just attached, with everything. As it's a testing env it's all non-real data, so we can share.
[jira] [Updated] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Andre Pearce (IG) updated KAFKA-4497: - Attachment: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2 [~junrao] files (logs and everything) from the partition testing env, where we had it occur at 15:52 today (4th Jan 2017) . > log cleaner breaks on timeindex > --- > > Key: KAFKA-4497 > URL: https://issues.apache.org/jira/browse/KAFKA-4497 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.0 > Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0 >Reporter: Robert Schumann >Assignee: Jiangjie Qin >Priority: Critical > Labels: reliability > Fix For: 0.10.1.1 > > Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2 > > > _created from ML entry by request of [~ijuma]_ > Hi all, > we are facing an issue with latest kafka 0.10.1 and the log cleaner thread > with regards to the timeindex files. From the log of the log-cleaner we see > after startup that it tries to cleanup a topic called xdc_listing-status-v2 > [1]. The topic is setup with log compaction [2] and the kafka cluster > configuration has log.cleaner enabled [3]. Looking at the log and the newly > created file [4], the cleaner seems to refer to tombstones prior to > epoch_time=0 - maybe because he finds messages, which don’t have a timestamp > at all (?). All producers and consumers are using 0.10.1 and the topics have > been created completely new, so I’m not sure, where this issue would come > from. The original timeindex file [5] seems to show only valid timestamps for > the mentioned offsets. I would also like to mention that the issue happened > in two independent datacenters at the same time, so I would rather expect an > application/producer issue instead of random disk failures. We didn’t have > the problem with 0.10.0 for around half a year, it appeared shortly after the > upgrade to 0.10.1. 
> The puzzling message from the cleaner "cleaning prior to Fri Dec 02 16:35:50 CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970" also confuses me a bit. Does that mean it does not find any log segments which can be cleaned up, or that the last timestamp of the last log segment is somehow broken/missing?
> I'm also a bit wondering why the log cleaner thread stops completely after an error with one topic. I would at least expect that it keeps on cleaning up other topics, but apparently it doesn't do that, e.g. it's not even cleaning the __consumer_offsets anymore.
> Does anybody have the same issues or can explain what's going on? Thanks for any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 0, retaining deletes.
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 9 no larger than the last offset appended (11832) to /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
>         at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
>         at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
>         at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>         at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
>         at kafka.log.LogSegment.append(LogSegment.scala:106)
>         at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
>         at kafka.log.Cleaner$$anonfun$clea
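[Editor's note] The InvalidOffsetException in the stack trace above comes from the invariant that a time index only accepts strictly increasing offsets, so appending the sentinel offset -1 after offset 11832 must fail. The following is a minimal illustrative sketch of that invariant, not Kafka's actual TimeIndex implementation (the class and method names here merely mirror the trace):

```java
import java.util.ArrayList;
import java.util.List;

// Thrown when an append would violate the monotonic-offset invariant,
// mirroring kafka.common.InvalidOffsetException in the trace above.
class InvalidOffsetException extends RuntimeException {
    InvalidOffsetException(String msg) { super(msg); }
}

// Simplified stand-in for a time index: (timestamp, offset) entries
// where each appended offset must exceed the last one.
class SimpleTimeIndex {
    private final List<long[]> entries = new ArrayList<>(); // {timestamp, offset}

    void maybeAppend(long timestamp, long offset) {
        if (!entries.isEmpty()) {
            long lastOffset = entries.get(entries.size() - 1)[1];
            if (offset <= lastOffset)
                throw new InvalidOffsetException("Attempt to append an offset (" + offset
                        + ") no larger than the last offset appended (" + lastOffset + ")");
        }
        entries.add(new long[]{timestamp, offset});
    }

    int size() { return entries.size(); }
}
```

Under this model, appending offset 11832 and then -1 (the "no offset for this message set" sentinel the later comments describe) reproduces the failure mode seen in the log, which is what kills the cleaner thread.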
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799490#comment-15799490 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/4/17 10:09 PM:
---------------------------------------------------------------------------

[~junrao] We have now had this occur in our testing env also (so 4 envs are affected now), so we can get the log segments from there. We are currently sorting out another environment with our operations team tonight, trying to handle the fact that disks are filling rapidly; once that is done we will get the segments out of the testing env and onto this ticket tonight, so you have them to look at during your day/our night.

RE: our patch - apart from it being naive, and obviously a plaster rather than a cure, since the time index code is new and I assume you have better knowledge of that code: would you foresee any knock-on issues with applying it? As far as we can tell it doesn't affect anything else, but we don't use the time index feature yet.

was (Author: michael.andre.pearce):
[~junrao] We have now had this occur in our testing env also (so 4 envs are affected now), so we can get the log segments from there. We are currently sorting out another environment with our operations team tonight, trying to handle the fact that disks are filling rapidly; once that is done we will get the segments out of the testing env and onto this ticket tonight, so you have them to look at during your day/our night.
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799490#comment-15799490 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--------------------------------------------------

[~junrao] We have now had this occur in our testing env also (so 4 envs are affected now), so we can get the log segments from there. We are currently sorting out another environment with our operations team tonight, trying to handle the fact that disks are filling rapidly; once that is done we will get the segments out of the testing env and onto this ticket tonight, so you have them to look at during your day/our night.
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798847#comment-15798847 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/4/17 5:43 PM:
--------------------------------------------------------------------------

[~junrao] This is causing us a production issue on the current version; we are also looking now at rolling back to 0.10.0.1, as eBay has, unless we get this fixed asap. This is a terminal issue. While we agree with the purist view that we should fix the root cause, and are happy to add logging later to help find it, is there any reason not to accept this PR, which fixes/treats the symptom to avoid the production issue? I would imagine others will start hitting the issue too - we're not alone - and as such it would be a blocker for anyone upgrading to 0.10.1.0/0.10.1.1 now.

was (Author: michael.andre.pearce):
[~junrao] This is causing us a production issue on the current version; we are also looking now at rolling back to 0.10.0.1, as eBay has, unless we get this fixed asap. While we agree with the purist view that we should fix the root cause, and are happy to add logging later to help find it, is there any reason not to accept this PR, which fixes/treats the symptom to avoid the production issue? I would imagine others will start hitting the issue too - we're not alone - and as such it would be a blocker for anyone upgrading to 0.10.1.0/0.10.1.1 now.
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798847#comment-15798847 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--------------------------------------------------

[~junrao] This is causing us a production issue on the current version; we are also looking now at rolling back to 0.10.0.1, as eBay has, unless we get this fixed asap. While we agree with the purist view that we should fix the root cause, and are happy to add logging later to help find it, is there any reason not to accept this PR, which fixes/treats the symptom to avoid the production issue? I would imagine others will start hitting the issue too - we're not alone - and as such it would be a blocker for anyone upgrading to 0.10.1.0/0.10.1.1 now.
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798007#comment-15798007 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--------------------------------------------------

[~ijuma] It seems exactly the same. We note that it's blowing up because -1 is still being sent to the TimeIndex; as noted above, -1 means there is no offset for the message set (which, as noted, can be empty). Also as noted earlier, a naive approach is simply to avoid throwing the exception so the compaction thread isn't killed. As such, can you review and comment on our PR, which simply makes the maybeAppend method do nothing if the offset is -1 (special case)? While naive, this seems to avoid the issue. Can we confirm nothing is adversely missed with this approach?
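[Editor's note] The workaround described in the comment above - make maybeAppend a no-op when the offset is -1, so the cleaner thread survives - can be sketched as follows. This is an illustrative model only, a plaster rather than a cure as the commenter says, and not Kafka's actual Scala TimeIndex code; the class name is hypothetical:

```java
// Illustrative model of the proposed guard: offset -1 means "no offset for
// this (possibly empty) message set", so the entry is skipped rather than
// appended, and the monotonic-offset check never fires for it.
class TolerantTimeIndex {
    private long lastOffset = -1L;
    private int size = 0;

    void maybeAppend(long timestamp, long offset) {
        if (offset == -1L)
            return; // special case: nothing to index, do not throw
        if (size > 0 && offset <= lastOffset)
            throw new IllegalStateException("Attempt to append a non-increasing offset (" + offset + ")");
        lastOffset = offset;
        size++;
    }

    int size() { return size; }
}
```

With this guard, the sequence that crashed the cleaner (append 11832, then -1) leaves the index untouched by the -1 entry and lets cleaning continue, at the cost of silently dropping the sentinel rather than fixing whatever produced it.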
[jira] [Issue Comment Deleted] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Andre Pearce (IG) updated KAFKA-4497:
---------------------------------------------
    Comment: was deleted

(was: [~ijuma] It seems exactly the same. We note that it's blowing up because -1 is still being sent to the TimeIndex; as noted above, -1 means there is no offset for the message set (which, as noted, can be empty). Also as noted earlier, a naive approach is simply to avoid throwing the exception so the compaction thread isn't killed. As such, can you review and comment on our PR, which simply makes the maybeAppend method do nothing if the offset is -1 (special case)? While naive, this seems to avoid the issue. Can we confirm nothing is adversely missed with this approach?)
[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798006#comment-15798006 ]

Michael Andre Pearce (IG) commented on KAFKA-4497:
--------------------------------------------------

[~ijuma] It seems exactly the same. We note that it's blowing up because -1 is still being sent to the TimeIndex; as noted above, -1 means there is no offset for the message set (which, as noted, can be empty). Also as noted earlier, a naive approach is simply to avoid throwing the exception so the compaction thread isn't killed. As such, can you review and comment on our PR, which simply makes the maybeAppend method do nothing if the offset is -1 (special case)? While naive, this seems to avoid the issue. Can we confirm nothing is adversely missed with this approach?
The original timeindex file [5] seems to show only valid timestamps for > the mentioned offsets. I would also like to mention that the issue happened > in two independent datacenters at the same time, so I would rather expect an > application/producer issue instead of random disk failures. We didn’t have > the problem with 0.10.0 for around half a year, it appeared shortly after the > upgrade to 0.10.1. > The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 > CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also > confuses me a bit. Does that mean, it does not find any log segments which > can be cleaned up or the last timestamp of the last log segment is somehow > broken/missing? > I’m also a bit wondering, why the log cleaner thread stops completely after > an error with one topic. I would at least expect that it keeps on cleaning up > other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning > the __consumer_offsets anymore. > Does anybody have the same issues or can explain, what’s going on? Thanks for > any help or suggestions. > Cheers > Robert > [1] > {noformat} > [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner) > [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting > (kafka.log.LogCleaner) > [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log > xdc_listing-status-v2-1. (kafka.log.LogCleaner) > [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for > xdc_listing-status-v2-1... (kafka.log.LogCleaner) > [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log > xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log > xdc_listing-status-v2-1 complete. 
(kafka.log.LogCleaner) > [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log > xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, > discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log > xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into > 0, retaining deletes. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to > (kafka.log.LogCleaner) > kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot > 9 no larger than the last offset appended (11832) to > /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned. > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117) > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:1
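The special case proposed in the comment above can be sketched as follows. This is a minimal Python illustration (Kafka's actual TimeIndex is Scala, and the class and names below are hypothetical stand-ins, not Kafka's API): when the offset for a message set is -1 (no offset, e.g. an empty set), skip the append instead of raising the invalid-offset error that kills the cleaner thread.

```python
NO_OFFSET = -1  # sentinel meaning "this message set has no offset"

class InvalidOffsetError(Exception):
    pass

class TimeIndexSketch:
    """Toy stand-in for a time index: entries of (timestamp, offset)."""

    def __init__(self):
        self.entries = []

    def last_offset(self):
        return self.entries[-1][1] if self.entries else 0

    def maybe_append(self, timestamp, offset):
        # Proposed guard: silently ignore the -1 special case instead of
        # letting it fall through to the InvalidOffsetError below, which
        # is what kills kafka-log-cleaner-thread-0 in the log above.
        if offset == NO_OFFSET:
            return
        if offset <= self.last_offset():
            raise InvalidOffsetError(
                "Attempt to append an offset (%d) no larger than the "
                "last offset appended (%d)" % (offset, self.last_offset()))
        self.entries.append((timestamp, offset))
```

Without the `NO_OFFSET` guard, appending -1 after offset 11832 would raise exactly the kind of exception shown in the stack trace above; with it, the cleaner simply skips the entry.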
[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755064#comment-15755064 ] Michael Andre Pearce (IG) commented on KAFKA-4477: -- Agreed, this one indeed does seem very similar to that. > Node reduces its ISR to itself, and doesn't recover. Other nodes do not take > leadership, cluster remains sick until node is restarted. > -- > > Key: KAFKA-4477 > URL: https://issues.apache.org/jira/browse/KAFKA-4477 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.0 > Environment: RHEL7 > java version "1.8.0_66" > Java(TM) SE Runtime Environment (build 1.8.0_66-b17) > Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode) >Reporter: Michael Andre Pearce (IG) >Assignee: Apurva Mehta >Priority: Critical > Labels: reliability > Attachments: 2016_12_15.zip, issue_node_1001.log, > issue_node_1001_ext.log, issue_node_1002.log, issue_node_1002_ext.log, > issue_node_1003.log, issue_node_1003_ext.log, kafka.jstack, > state_change_controller.tar.gz > > > We have encountered a critical issue that has reoccurred in different > physical environments. We haven't worked out what is going on. We do though > have a nasty workaround to keep service alive. > We have not had this issue on clusters still running 0.9.0.1. > We have noticed a node randomly shrinking the ISRs, for the partitions it owns, > down to itself; moments later we see other nodes having disconnects, > followed finally by app issues, where producing to these partitions is > blocked. > It seems only restarting the Kafka Java process resolves the > issue. > We have had this occur multiple times, and from all network and machine > monitoring the machine never left the network or had any other glitches. > Below are the logs seen from the issue. 
> Node 7: > [2016-12-01 07:01:28,112] INFO Partition > [com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking > ISR for partition [com_ig_trade_v1_position_event--demo--compacted,10] from > 1,2,7 to 7 (kafka.cluster.Partition) > All other nodes: > [2016-12-01 07:01:38,172] WARN [ReplicaFetcherThread-0-7], Error in fetch > kafka.server.ReplicaFetcherThread$FetchRequest@5aae6d42 > (kafka.server.ReplicaFetcherThread) > java.io.IOException: Connection to 7 was disconnected before the response was > read > All clients: > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. > After this occurs, we then suddenly see on the sick machine an increasing > number of close_waits and open file descriptors. > As a workaround to keep service, we are currently putting in an automated > process that tails the broker log and matches the regex below; where new_partitions is just the node itself, > we restart the node. > "\[(?P.+)\] INFO Partition \[.*\] on broker .* Shrinking ISR for > partition \[.*\] from (?P.+) to (?P.+) > \(kafka.cluster.Partition\)" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
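The tail-and-restart workaround described above can be sketched as below. Note that the named groups in the posted regex were stripped by the mail archive (they appear as `(?P.+)`), so the group names here (`broker`, `old_isr`, `new_isr`) are hypothetical stand-ins, and the pattern is a reconstruction from the log line shown, not the exact regex the poster used.

```python
import re

# Reconstructed pattern: capture the broker id and the old/new ISR lists
# from a "Shrinking ISR" log line; group names are hypothetical.
SHRINK_RE = re.compile(
    r"INFO Partition \[.*\] on broker (?P<broker>\d+): Shrinking ISR for "
    r"partition \[.*\] from (?P<old_isr>[\d,]+) to (?P<new_isr>[\d,]+)")

def needs_restart(line):
    """True when a broker has shrunk a partition's ISR down to only itself."""
    m = SHRINK_RE.search(line)
    return bool(m) and m.group("new_isr") == m.group("broker")
```

In use, lines from `tail -F` of the broker log would be fed through `needs_restart`, and a match where the new ISR equals the broker itself triggers the automated restart.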
[jira] [Issue Comment Deleted] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Andre Pearce (IG) updated KAFKA-4477: - Comment: was deleted (was: IG ISR issue of 2016-12-15 04:27 (this time we see deadlock) attached.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Andre Pearce (IG) updated KAFKA-4477: - Attachment: 2016_12_15.zip IG ISR issue of 2016-12-15 04:27 (this time we see deadlock) attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751019#comment-15751019 ] Michael Andre Pearce (IG) commented on KAFKA-4477: -- Hi [~junrao], We had a similar issue this morning; this time we DO see deadlock. I've attached all the logs and the stacks we gathered from all the processes. Logs are in CSV; the original log line is in column _raw. I've also attached screenshots of our monitoring graphs of the follower stats: we do see a spike, but this seems to be post restart of the process (I think we should expect that?). All are in 2016_12_15.zip. Cheers Mike -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750994#comment-15750994 ] Michael Andre Pearce (IG) commented on KAFKA-4477: -- Do we have a timeline on RC1? It would be good to have this available asap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747271#comment-15747271 ] Michael Andre Pearce (IG) edited comment on KAFKA-4477 at 12/14/16 4:59 AM: Hi [~apurva], Whilst I await the issue occurring again to provide some further logs for you, just reading the above comment, I have a query on it. Whilst, by the sounds of it, there is a possible deadlock causing the ISR not to re-expand (though some stacks we have captured don't show this), the question is why the ISRs are shrinking in the first place at all? Re the 0.10.1.1 RC: unfortunately, in the environments we see this in, we will only be able to deploy once 0.10.1.1 is GA/tagged, as they're UAT and PROD environments. Maybe it's worth pushing to get 0.10.1.1 tagged and released now, without waiting for additional fixes; as I understand it, this version is just fixes anyhow, and if issues are still detected we get a 0.10.1.2 with further hot fixes. On a note, it seems 0.10.0.0 doesn't, according to others, contain this issue (we can only confirm 0.9.0.1 doesn't; we didn't run for a long period on 0.10.0.0 before upgrading some brokers to 0.10.1.0). Is there any possible way to downgrade from 0.10.1.0 to 0.10.0.0, and is there a doc for this? Obviously all docs are for upgrade paths, not downgrade. Cheers Mike -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747271#comment-15747271 ] Michael Andre Pearce (IG) edited comment on KAFKA-4477 at 12/14/16 4:55 AM: Hi [~apurva], Whilst i await the issue to occur again to provide some further logs for you. Just reading the above comment, and a query on this. Whilst obviously theres by the sounds of it a possible deadlock causing the ISR not to re-expand (though some stacks we have captured don't show this). The question in the first place is why even are the ISR's shrinking in the first place? Re 0.10.1.1 RC unfortunately in the environments we see it in, we will only be able to deploy it once 0.10.1.1 is GA/Tagged as they're UAT and PROD environments. On a note it seems 0.10.0.0 doesn't seem according to others to contain this issue (we can only confirm 0.9.0.1 doesnt), is there any possible way to downgrade from 0.10.1.0 to 0.10.0.0 , is there a doc for this? Obviously all docs are for upgrade paths not downgrade. Cheers Mike was (Author: michael.andre.pearce): Hi Apurva, Whilst i await the issue to occur again to provide some further logs for you. Just reading the above comment, and a query on this. Whilst obviously theres by the sounds of it a possible deadlock causing the ISR not to re-expand (though some stacks we have captured don't show this). The question in the first place is why even are the ISR's shrinking in the first place? Re 0.10.1.1 RC unfortunately in the environments we see it in, we will only be able to deploy it once 0.10.1.1 is GA/Tagged as they're UAT and PROD environments. On a note it seems 0.10.0.0 doesn't seem according to others to contain this issue (we can only confirm 0.9.0.1 doesnt), is there any possible way to downgrade from 0.10.1.0 to 0.10.0.0 , is there a doc for this? Obviously all docs are for upgrade paths not downgrade. Cheers Mike > Node reduces its ISR to itself, and doesn't recover. 
Other nodes do not take > leadership, cluster remains sick until node is restarted. > -- > > Key: KAFKA-4477 > URL: https://issues.apache.org/jira/browse/KAFKA-4477 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.0 > Environment: RHEL7 > java version "1.8.0_66" > Java(TM) SE Runtime Environment (build 1.8.0_66-b17) > Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode) >Reporter: Michael Andre Pearce (IG) >Assignee: Apurva Mehta >Priority: Critical > Labels: reliability > Attachments: issue_node_1001.log, issue_node_1001_ext.log, > issue_node_1002.log, issue_node_1002_ext.log, issue_node_1003.log, > issue_node_1003_ext.log, kafka.jstack, state_change_controller.tar.gz > > > We have encountered a critical issue that has re-occured in different > physical environments. We haven't worked out what is going on. We do though > have a nasty work around to keep service alive. > We do have not had this issue on clusters still running 0.9.01. > We have noticed a node randomly shrinking for the partitions it owns the > ISR's down to itself, moments later we see other nodes having disconnects, > followed by finally app issues, where producing to these partitions is > blocked. > It seems only by restarting the kafka instance java process resolves the > issues. > We have had this occur multiple times and from all network and machine > monitoring the machine never left the network, or had any other glitches. > Below are seen logs from the issue. 
> Node 7: > [2016-12-01 07:01:28,112] INFO Partition > [com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking > ISR for partition [com_ig_trade_v1_position_event--demo--compacted,10] from > 1,2,7 to 7 (kafka.cluster.Partition) > All other nodes: > [2016-12-01 07:01:38,172] WARN [ReplicaFetcherThread-0-7], Error in fetch > kafka.server.ReplicaFetcherThread$FetchRequest@5aae6d42 > (kafka.server.ReplicaFetcherThread) > java.io.IOException: Connection to 7 was disconnected before the response was > read > All clients: > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. > After this occurs, we then suddenly see on the sick machine an increasing > number of CLOSE_WAITs and open file descriptors. > As a workaround to keep the service up, we are currently putting in an automated > process that tails the log and matches the regex below; where the new partition > set is just the node itself, we restart the node. > "\[(?P.+)\] INFO Partition \[.*\] on broker .* Shrinking ISR for > partition \[.*\] from (?P.+) to (?P.+) > \(kafka.cluster.Partition\)"
[jira] [Comment Edited] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745749#comment-15745749 ] Michael Andre Pearce (IG) edited comment on KAFKA-4477 at 12/13/16 5:49 PM: It is worth noting that we see the open file descriptors increase, as mentioned by someone else, if we leave the process in a sick state (now that we restart quickly, we don't get to observe this). was (Author: michael.andre.pearce): It is worth noting we see the open file descriptors if we leave the process in a sick mode (now we restart quickly we don't get to observe this). > Node reduces its ISR to itself, and doesn't recover. Other nodes do not take > leadership, cluster remains sick until node is restarted. > -- > > Key: KAFKA-4477 > URL: https://issues.apache.org/jira/browse/KAFKA-4477 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745734#comment-15745734 ] Michael Andre Pearce (IG) commented on KAFKA-4477: -- Hi Jun, The stack was taken by the automated restart script we've had to put in place, which picked up the issue 20 seconds after it started, before it restarted the node. The broker during that period was not under high load. We do not see any GC issues, nor do we see any ZK issues. The logs we are seeing match those of other people; we have had this occur three further times, all with very similar logs, i.e. nothing new is showing up. On a side note, we are looking to upgrade to 0.10.1.1 as soon as it's released and we see it released by Confluent also; we use that as a measure that further sanity checks have occurred and no critical issues remain. We will aim to push it to UAT quickly (where we also see this issue; weirdly we haven't had it occur in TEST or DEV) to see if this is resolved. What is the expected timeline for this? Are we still expecting it to be released today? And when would Confluent be likely to complete their testing and release? Cheers Mike > Node reduces its ISR to itself, and doesn't recover. Other nodes do not take > leadership, cluster remains sick until node is restarted. 
> -- > > Key: KAFKA-4477 > URL: https://issues.apache.org/jira/browse/KAFKA-4477 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15721875#comment-15721875 ] Michael Andre Pearce (IG) commented on KAFKA-4477: -- This occurred again in a prod environment just after 2am on Saturday. I've attached the stack trace that was captured before the node was restarted by our platform operations team. Looking at the stack trace, there are no deadlocks, unlike the JIRA ticket you mentioned. > Node reduces its ISR to itself, and doesn't recover. Other nodes do not take > leadership, cluster remains sick until node is restarted. > -- > > Key: KAFKA-4477 > URL: https://issues.apache.org/jira/browse/KAFKA-4477 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Andre Pearce (IG) updated KAFKA-4477: - Attachment: kafka.jstack > Node reduces its ISR to itself, and doesn't recover. Other nodes do not take > leadership, cluster remains sick until node is restarted. > -- > > Key: KAFKA-4477 > URL: https://issues.apache.org/jira/browse/KAFKA-4477 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715205#comment-15715205 ] Michael Andre Pearce (IG) commented on KAFKA-4477: -- What we see in the logs is that the node that goes ill first reduces its ISRs; then we see disconnects from everywhere else. It does seem similar, but we never see it re-expand the ISRs, unlike the log files they've submitted. We have had the issue in three environments, five times in the last week and a half. We are trying to push through automated stack-trace gathering. Unfortunately we don't use our APM tools to auto-instrument Kafka (note: one reason for KIP-82). > Node reduces its ISR to itself, and doesn't recover. Other nodes do not take > leadership, cluster remains sick until node is restarted. > -- > > Key: KAFKA-4477 > URL: https://issues.apache.org/jira/browse/KAFKA-4477 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
Michael Andre Pearce (IG) created KAFKA-4477: Summary: Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted. Key: KAFKA-4477 URL: https://issues.apache.org/jira/browse/KAFKA-4477 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.10.1.0 Environment: RHEL7 java version "1.8.0_66" Java(TM) SE Runtime Environment (build 1.8.0_66-b17) Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode) Reporter: Michael Andre Pearce (IG) Priority: Critical We have encountered a critical issue that has re-occurred in different physical environments. We haven't worked out what is going on, though we do have a nasty workaround to keep the service alive. We have not had this issue on clusters still running 0.9.0.1. We have noticed a node randomly shrinking the ISRs for the partitions it owns down to itself; moments later we see other nodes having disconnects, followed finally by application issues, where producing to these partitions is blocked. It seems that only restarting the Kafka Java process resolves the issue. We have had this occur multiple times, and from all network and machine monitoring the machine never left the network or had any other glitches. Below are logs from the issue. Node 7: [2016-12-01 07:01:28,112] INFO Partition [com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking ISR for partition [com_ig_trade_v1_position_event--demo--compacted,10] from 1,2,7 to 7 (kafka.cluster.Partition) All other nodes: [2016-12-01 07:01:38,172] WARN [ReplicaFetcherThread-0-7], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@5aae6d42 (kafka.server.ReplicaFetcherThread) java.io.IOException: Connection to 7 was disconnected before the response was read All clients: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. 
After this occurs, we then suddenly see on the sick machine an increasing number of CLOSE_WAITs and open file descriptors. As a workaround to keep the service up, we are currently putting in an automated process that tails the log and matches the regex below; where the new partition set is just the node itself, we restart the node. "\[(?P.+)\] INFO Partition \[.*\] on broker .* Shrinking ISR for partition \[.*\] from (?P.+) to (?P.+) \(kafka.cluster.Partition\)" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
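The restart watchdog described above can be sketched as follows. Note this is an illustrative sketch, not the reporter's actual script: the named-group names in the quoted regex were lost in formatting (only `(?P.+)` survives), so the group names `date`, `broker`, `oldIsr`, and `newIsr` below are hypothetical, and Java's `(?<name>...)` named-group syntax is used in place of the Python-style `(?P<name>...)` of the original.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IsrShrinkWatch {
    // Hypothetical group names; the names in the original regex were lost in
    // formatting. Java named-group syntax (?<name>...) replaces Python's (?P<name>...).
    private static final Pattern SHRINK = Pattern.compile(
        "\\[(?<date>.+)\\] INFO Partition \\[.*\\] on broker (?<broker>\\d+): Shrinking ISR for "
        + "partition \\[.*\\] from (?<oldIsr>.+) to (?<newIsr>.+) \\(kafka\\.cluster\\.Partition\\)");

    /** True when the new ISR is just the broker itself, i.e. the condition that triggers a restart. */
    static boolean shrunkToSelf(String logLine) {
        Matcher m = SHRINK.matcher(logLine);
        return m.matches() && m.group("newIsr").trim().equals(m.group("broker"));
    }

    public static void main(String[] args) {
        // The sample line from the issue: broker 7 shrinks the ISR from 1,2,7 to just 7.
        String line = "[2016-12-01 07:01:28,112] INFO Partition "
            + "[com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking ISR for "
            + "partition [com_ig_trade_v1_position_event--demo--compacted,10] from 1,2,7 to 7 "
            + "(kafka.cluster.Partition)";
        System.out.println(shrunkToSelf(line)); // prints "true"
    }
}
```

In a real deployment this check would run over a tailed broker log and invoke the restart only when `shrunkToSelf` matches.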
[jira] [Comment Edited] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691817#comment-15691817 ] Michael Andre Pearce (IG) edited comment on KAFKA-4424 at 11/24/16 1:03 AM: The link you reference about virtual calls is really about monomorphic versus polymorphic call sites. Making a class that implements an interface final, where the method is invoked through the interface, does not change this; what matters is the number of loaded/invoked classes that implement the interface. So with a single implementation loaded in your JVM, you have a monomorphic call site for the interface and the JVM will inline it (final or not). If two implementations are loaded and used, the JVM can still inline but will create a branch; the second loaded implementation will be slower when invoked because of the branch. If more than two implementations are loaded, the JVM will, on loading them, perform on-stack replacement of the previously inlined code and move to virtual (jump) tables. You can see this happen if you turn on -XX:+PrintCompilation. A classic implementation/test and write-up showing this is: http://mechanical-sympathy.blogspot.co.uk/2012/04/invoke-interface-optimisations.html You'll note that taking the code in the blog and running it with or without final implementations makes no difference. I've also adapted this test for your final and non-final cases (attached to this JIRA); note I've uploaded two versions, one with the final implementation being declared and loaded by the JVM first, and vice versa. As you'll note, in both cases the implementation loaded first is more performant, thanks to the inlined branch. 
On checking your original test case, we noted that the FinalByteArraySerializer version runs first (due to the alphabetical order in which tests run), so it would always be first in the inline branch and benefit from this; that would explain why final always appeared negligibly faster when running your benchmark test case. was (Author: michael.andre.pearce): The link you reference re virtual calls. This is much more about monomorphic call or polymorphic calls. Making a class that implements an interface final, where the method invocation is by interface methods, does not change this. This is more to do with the number of class's loaded/invoked that implement the interface. So in case of single implementation being used and loaded your jvm you have a monomorphic case for the interface, the JVM will inline this (final or not). If you happen to have two implementations being used and loaded the jvm will still be able to inline but will create a branch case, the second loaded implementation will be slower if invoked due to the branch. If you have more than two implementations loaded the JVM will on loading these do on stack replacement of the previously loaded inlined, and move to using virtual tables. You'll see this occur if you turn on -XX:+PrintCompilation A classical implementation/test and write up showing this is: http://mechanical-sympathy.blogspot.co.uk/2012/04/invoke-interface-optimisations.html You'll note taking the code in the blog, and running it with or without final implementations makes no difference. Also i've taken this test from the above blog, for your final and non final cases (i've attached to this jira), if you note I've uploaded two versions, one with the final being declared and loaded by the JVM first and vice versa. As you note in both the implementation loaded first due to the inlined branch will be more performant. 
On checking your original test case we noted that the FinalByteArraySerializer version runs first (due to alphabetic ordering that test are run in) , as such it would be always the first in the inline branch benefitting from this, this would explain why it seems always final was negligible faster when running your benchmark test case. > Make serializer classes final > - > > Key: KAFKA-4424 > URL: https://issues.apache.org/jira/browse/KAFKA-4424 > Project: Kafka > Issue Type: Improvement > Components: clients >Reporter: Matthias Bechtold >Priority: Minor > Attachments: FinalTest.java, FinalTestReversed.java > > > Implementations of simple serializers / deserializers should be final to > prevent JVM method call overhead. > See also: > https://wiki.openjdk.java.net/display/HotSpot/VirtualCalls > This breaks the API slightly, inheritors must change to generic interfaces > Serializer / Deserializer. But architecture-wise final serialization classes > make the most sense to me. > So what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
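The call-site behaviour described in the comment above can be sketched with a toy harness. This is a hypothetical `Encoder` interface for illustration, not Kafka's `Serializer` nor the `FinalTest.java` attached to the JIRA; the point is that the morphism of the interface call site in `drive` depends only on how many receiver types reach it, not on whether the implementations are `final`.

```java
// Toy sketch of monomorphic / bimorphic / megamorphic interface call sites.
// With one implementation seen, the call in drive() is monomorphic and the JIT
// inlines it; with two it becomes bimorphic (an inlined two-way branch, with
// the first-seen type on the fast path); with three or more it goes megamorphic
// and falls back to vtable-style dispatch. Observe the compiler's decisions with:
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Morphism
public class Morphism {
    interface Encoder { int encode(int v); }

    // final here does not change the call-site morphism, which is the comment's point.
    static final class PlusOne  implements Encoder { public int encode(int v) { return v + 1; } }
    static final class TimesTwo implements Encoder { public int encode(int v) { return v * 2; } }
    static final class Negate   implements Encoder { public int encode(int v) { return -v; } }

    // The interface call site whose shape depends on how many Encoder types reach it.
    static long drive(Encoder e, int rounds) {
        long acc = 0;
        for (int i = 0; i < rounds; i++) acc += e.encode(i);
        return acc;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long a = drive(new PlusOne(), n);   // call site monomorphic so far
        long b = drive(new TimesTwo(), n);  // now bimorphic: two receiver types seen
        long c = drive(new Negate(), n);    // now megamorphic: three types, vtable dispatch
        System.out.println(a + " " + b + " " + c);
    }
}
```

Running with `-XX:+PrintCompilation` (or `-XX:+PrintInlining`) should show the earlier inlined compilations being replaced as additional `Encoder` types are loaded, matching the on-stack-replacement behaviour described above.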
[jira] [Comment Edited] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691817#comment-15691817 ] Michael Andre Pearce (IG) edited comment on KAFKA-4424 at 11/24/16 12:59 AM: -

The link you reference re virtual calls is much more about monomorphic versus polymorphic calls. Making a class that implements an interface final, where the method invocation is via the interface's methods, does not change this. It is instead about the number of classes loaded/invoked that implement the interface. So in the case of a single implementation being used and loaded in your JVM, you have a monomorphic call site for the interface, and the JVM will inline it (final or not). If you happen to have two implementations being used and loaded, the JVM will still be able to inline, but will create a branch; the second-loaded implementation will be slower when invoked, due to the branch. If you have more than two implementations loaded, the JVM will, on loading them, perform on-stack replacement of the previously inlined code and move to using virtual tables. You can see this occur if you turn on -XX:+PrintCompilation.

A classic implementation/test and write-up showing this is: http://mechanical-sympathy.blogspot.co.uk/2012/04/invoke-interface-optimisations.html You'll note that taking the code in the blog and running it with or without final implementations makes no difference.

Also, I've adapted the test from the above blog for your final and non-final cases (attached to this JIRA); note that I've uploaded two versions, one with the final implementation being declared and loaded by the JVM first, and vice versa. As you'll note, in both versions the implementation loaded first is more performant, due to the inlined branch.
On checking your original test case we noted that the FinalByteArraySerializer version runs first (due to the alphabetical order in which tests are run), so it is always the first implementation in the inlined branch and benefits from that; this would explain why final consistently appeared negligibly faster when running your benchmark test case.
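The monomorphic/bimorphic/megamorphic behaviour described above can be sketched as a small, self-contained program in the spirit of the Mechanical Sympathy test linked in the comment. The interface and class names below are illustrative only, not taken from the attached FinalTest.java:

```java
// Sketch of interface-call-site morphism, in the spirit of the
// Mechanical Sympathy invoke-interface test. Names are illustrative.
interface Operation {
    long apply(long value);
}

// A final implementation -- 'final' here does not change how the call
// site in run() below is dispatched (it is still invokeinterface).
final class Increment implements Operation {
    public long apply(long value) { return value + 1; }
}

class Decrement implements Operation {
    public long apply(long value) { return value - 1; }
}

class Negate implements Operation {
    public long apply(long value) { return -value; }
}

public class DispatchSketch {
    // A single call site, invoked through the interface. How the JIT
    // treats it depends on how many receiver types it has observed here,
    // not on whether any of them is final:
    //   one receiver type   -> monomorphic, fully inlined
    //   two receiver types  -> bimorphic, inlined with a guard branch
    //   three or more       -> megamorphic, virtual (itable) dispatch
    static long run(Operation op, long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += op.apply(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Exercise implementations one at a time; run with
        // -XX:+PrintCompilation to watch the method get recompiled as the
        // call site goes mono- -> bi- -> megamorphic.
        System.out.println(run(new Increment(), 1_000_000));
        System.out.println(run(new Decrement(), 1_000_000));
        System.out.println(run(new Negate(), 1_000_000));
    }
}
```

Running this with -XX:+PrintCompilation should show recompilation as further Operation implementations are observed; marking Increment final (or not) changes nothing, since callers only ever see the interface type.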
[jira] [Comment Edited] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691817#comment-15691817 ] Michael Andre Pearce (IG) edited comment on KAFKA-4424 at 11/24/16 12:57 AM: -
[jira] [Comment Edited] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691817#comment-15691817 ] Michael Andre Pearce (IG) edited comment on KAFKA-4424 at 11/24/16 12:55 AM: -
[jira] [Comment Edited] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691817#comment-15691817 ] Michael Andre Pearce (IG) edited comment on KAFKA-4424 at 11/24/16 12:54 AM: -
[jira] [Commented] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691817#comment-15691817 ] Michael Andre Pearce (IG) commented on KAFKA-4424: --

The link you reference re virtual calls is much more about monomorphic versus polymorphic calls. Making a class that implements an interface final, where the method invocation is via the interface's methods, does not change this. It is instead about the number of classes loaded that implement the interface. So in the case of a single implementation being used and loaded in your JVM, you have a monomorphic call site for the interface, and the JVM will inline it (final or not). If you happen to have two implementations being used and loaded, the JVM will still be able to inline, but will create a branch; the second-loaded implementation will be slower when invoked, due to the branch. If you have more than two implementations loaded, the JVM will, on loading them, perform on-stack replacement of the previously inlined code and move to using virtual tables. You can see this occur if you turn on -XX:+PrintCompilation.

A classic implementation and write-up showing this is: http://mechanical-sympathy.blogspot.co.uk/2012/04/invoke-interface-optimisations.html You'll note that taking the code and running it with or without final implementations makes no difference.

Also, I've adapted this classic test for your final and non-final cases (attached to this JIRA); note that I've uploaded two versions, one with the final implementation being declared and loaded by the JVM first, and vice versa. As you'll note, in both versions the implementation loaded first is more performant, due to the inlined branch.
On checking your original test case we noted that the FinalByteArraySerializer version runs first (due to the alphabetical order in which tests are run), so it is always the first implementation in the inlined branch and benefits from that; this would explain why final consistently appeared negligibly faster when running your benchmark test case.
[jira] [Updated] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Andre Pearce (IG) updated KAFKA-4424: - Attachment: FinalTestReversed.java FinalTest.java
[jira] [Issue Comment Deleted] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Andre Pearce (IG) updated KAFKA-4424: - Comment: was deleted
[jira] [Comment Edited] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691736#comment-15691736 ] Michael Andre Pearce (IG) edited comment on KAFKA-4424 at 11/24/16 12:12 AM: -

Just reading the link you reference: virtual calls and jump tables come down to monomorphic versus polymorphic calls. If only one implementation of an interface is loaded (whether or not that implementation is final), you get a monomorphic call site and it can be fully inlined. If you load another, your method is still inlined, but with a branch; once you load further implementations, HotSpot will do on-stack replacement, and this is when jump tables are needed. You can see this on-stack replacement occurring with -XX:+PrintCompilation as classes are loaded.

A classic test case showing this: http://mechanical-sympathy.blogspot.co.uk/2012/04/invoke-interface-optimisations.html You'll note that making the classes final or not in the test makes no difference to the outcome. In the case of serialisers/deserialisers this is exactly our situation: we have an interface that is implemented, so making the implementations final doesn't help, as all calling classes invoke methods on the interface; performance is about how many implementations are loaded.
[jira] [Comment Edited] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691736#comment-15691736 ] Michael Andre Pearce (IG) edited comment on KAFKA-4424 at 11/24/16 12:12 AM: -
[jira] [Commented] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691736#comment-15691736 ] Michael Andre Pearce (IG) commented on KAFKA-4424: --

Just reading the link you reference: virtual calls and jump tables come down to monomorphic versus polymorphic calls. If only one implementation of an interface is loaded (whether or not that implementation is final), you get a monomorphic call site and it can be fully inlined. If you load another, your method is still inlined, but with a branch; once you load further implementations, HotSpot will do on-stack replacement, and this is when jump tables are needed. You can see this on-stack replacement occurring with -XX:+PrintCompilation as classes are loaded.

A classic test case showing this: http://mechanical-sympathy.blogspot.co.uk/2012/04/invoke-interface-optimisations.html You'll note that making the classes final or not in the test makes no difference to the outcome. In the case of serialisers/deserialisers this is exactly our situation: we have an interface that is implemented, so making the implementations final doesn't do much.
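A minimal sketch of why this applies to the serializer classes in question. The Serializer interface below is a hypothetical stand-in for Kafka's real org.apache.kafka.common.serialization.Serializer, reduced to a single method; the implementation class names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for Kafka's Serializer interface (the real one
// has additional methods such as configure() and close()).
interface Serializer<T> {
    byte[] serialize(String topic, T data);
}

// Final and non-final implementations of the same interface.
final class FinalStringSerializer implements Serializer<String> {
    public byte[] serialize(String topic, String data) {
        return data.getBytes(StandardCharsets.UTF_8);
    }
}

class PlainStringSerializer implements Serializer<String> {
    public byte[] serialize(String topic, String data) {
        return data.getBytes(StandardCharsets.UTF_8);
    }
}

public class SerializerDispatchSketch {
    // Callers (like the Kafka producer) hold the interface type, so the
    // bytecode at this call site is invokeinterface whether or not the
    // concrete class is final; 'final' does not make the call non-virtual.
    static byte[] send(Serializer<String> serializer, String topic, String value) {
        return serializer.serialize(topic, value);
    }

    public static void main(String[] args) {
        byte[] a = send(new FinalStringSerializer(), "demo", "hello");
        byte[] b = send(new PlainStringSerializer(), "demo", "hello");
        System.out.println(Arrays.equals(a, b));  // same bytes either way
    }
}
```

Disassembling with javap -c SerializerDispatchSketch shows the call in send() compiles to invokeinterface regardless of whether the receiver class is final, which is why the final keyword on the implementation cannot remove the dispatch cost here; only the number of loaded implementations matters.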
[jira] [Comment Edited] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691571#comment-15691571 ] Michael Andre Pearce (IG) edited comment on KAFKA-4424 at 11/23/16 11:30 PM:
--
Hi Matthias,

Thanks for putting together some initial tests. When we took your test and ran it this evening (with the same parameters) we get:

# Run complete. Total time: 01:40:11

Benchmark                                   Mode   Cnt        Score      Error  Units
KafkaBenchmark.testFinalSerializer         thrpt  1000     9378.179 ±   26.059  ops/s
KafkaBenchmark.testFinalSerializerNoFlush  thrpt  1000  1283796.450 ± 4976.711  ops/s
KafkaBenchmark.testSerializer              thrpt  1000     9325.273 ±   26.581  ops/s
KafkaBenchmark.testSerializerNoFlush       thrpt  1000  1289296.549 ± 5127.774  ops/s

The performance difference we are seeing is negligible at best. We have run this across a few machines (1 MacBook, 1 Cisco blade server, 1 Nutanix VM) within our company and get similar results between final and non-final. (We actually had one run on the Linux VM on our Nutanix cluster where the non-final version was negligibly faster; we repeated it and the result reversed, which itself shows the difference is negligible.) All machines used the latest Kafka 0.10.1.0, on JDK 1.8.0_112, VM 25.112-b16, with Kafka set up locally (as we understand you have done); the MacBook was on El Capitan, and the Cisco blade and Nutanix VM run RHEL7. The stats above are from the laptop (but are in line with what we've seen in our server environment); it's just easier to copy from, as our server environments are protected.

MacBook Pro (Retina, 15-inch, Mid 2015), 2.2 GHz Intel Core i7, 16 GB 1600 MHz DDR3

This is what we have come to expect: making a class final doesn't give as much of a performance boost as people have come to think, because modern JVM compilers like HotSpot (there is another commercial one which I'm sure we all know of, but as it's proprietary/commercial I shall avoid it for this discussion) really do a lot of magic under the hood for us. Please read:
http://www.oracle.com/technetwork/java/whitepaper-135217.html#impact
http://www.oracle.com/technetwork/java/whitepaper-135217.html#optimizations

Whilst this article (by Brian Goetz, whose credentials speak for themselves) from way back when is very dated and no longer accurate on precise JVM internals, as a lot has moved on, there is one core takeaway I think is important when deciding whether to use final:
http://www.ibm.com/developerworks/java/library/j-jtp1029/index.html

"final classes and methods can be a significant inconvenience when programming -- they limit your options for reusing existing code and extending the functionality of existing classes. While sometimes a class is made final for a good reason, such as to enforce immutability, the benefits of using final should outweigh the inconvenience. Performance enhancement is almost always a bad reason to compromise good object-oriented design principles, and when the performance enhancement is small or nonexistent, this is a bad trade-off indeed."

Since this is an API-level change that becomes more restrictive, I think it really needs a KIP, and no doubt some further tests and arguments on the KIP discussion for the pros and cons. It's also worth noting that very recently ProducerRecord and ConsumerRecord were made non-final for extensibility reasons, if you take the current master.
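For anyone wanting to reproduce the shape of this result without pulling in a benchmark framework, here is a crude self-contained harness (hypothetical class names, not the benchmark referenced above; JMH, as used for the numbers above, is the right tool for real measurements, since a raw nanoTime loop like this is subject to the usual microbenchmark pitfalls):

```java
// Crude illustration of why final vs non-final serializers benchmark
// identically: both are invoked through the same interface call site.
// Not a substitute for JMH; timings here are only indicative.
interface Ser {
    byte[] serialize(String s);
}

final class FinalSer implements Ser {
    public byte[] serialize(String s) { return s.getBytes(); }
}

class PlainSer implements Ser {
    public byte[] serialize(String s) { return s.getBytes(); }
}

public class CrudeBench {
    // Runs the serializer in a loop, prints a rough timing, and returns
    // the total byte count so correctness can be compared across variants.
    static long run(Ser ser, int iterations) {
        long bytes = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            bytes += ser.serialize("payload-" + (i & 0xFF)).length;
        }
        long elapsed = System.nanoTime() - start;
        System.out.println(ser.getClass().getSimpleName()
                + ": " + bytes + " bytes in " + elapsed + " ns");
        return bytes;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Warm-up passes so the JIT compiles the loop before the timed runs.
        run(new FinalSer(), n);
        run(new PlainSer(), n);
        // Timed runs: expect near-identical throughput, final or not.
        long a = run(new FinalSer(), n);
        long b = run(new PlainSer(), n);
        // Same work is performed either way.
        assert a == b;
    }
}
```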
[jira] [Commented] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691571#comment-15691571 ] Michael Andre Pearce (IG) commented on KAFKA-4424: -- Hi Matthias, When we took your test and ran it this evening (with the same parameters) we got:

# Run complete. Total time: 01:40:11

Benchmark                                   Mode   Cnt        Score      Error  Units
KafkaBenchmark.testFinalSerializer          thrpt  1000     9378.179 ±   26.059  ops/s
KafkaBenchmark.testFinalSerializerNoFlush   thrpt  1000  1283796.450 ± 4976.711  ops/s
KafkaBenchmark.testSerializer               thrpt  1000     9325.273 ±   26.581  ops/s
KafkaBenchmark.testSerializerNoFlush        thrpt  1000  1289296.549 ± 5127.774  ops/s

The performance difference we are seeing is negligible at best. We have run this across a few machines (one MacBook, one Cisco blade server, one Nutanix VM) within our company and get similar results between final and non-final. (We actually had one result from the Linux VM running on our Nutanix cluster where the non-final variant was negligibly faster; when we repeated the run the result reversed, which only reinforces that the difference is negligible.) All machines ran the latest Kafka 0.10.1.0 on JDK 1.8.0_112 (VM 25.112-b16) with Kafka set up locally (as we understand you did); the MacBook was on El Capitan, and the Cisco blade and Nutanix VM run RHEL7. The stats above come from the laptop run in particular (but are in line with what we've also seen in our server environment); it is simply easier to copy from, as our server environments are protected.
MacBook Pro (Retina, 15-inch, Mid 2015), 2.2 GHz Intel Core i7, 16 GB 1600 MHz DDR3. This is what we have come to expect: making a class final doesn't give as much of a speed boost as people have come to think, because modern JVM compilers like HotSpot (there is another commercial one which I'm sure we all know of, but as it's proprietary/commercial I shall avoid it for this discussion) really do a lot of magic under the hood for us. Whilst this article from way back when is very dated and no longer relevant regarding precise JVM internals, as a lot has moved on since (http://www.ibm.com/developerworks/java/library/j-jtp1029/index.html), it has one core takeaway I think is important, and more important still when deciding whether or not to use final:

"final classes and methods can be a significant inconvenience when programming -- they limit your options for reusing existing code and extending the functionality of existing classes. While sometimes a class is made final for a good reason, such as to enforce immutability, the benefits of using final should outweigh the inconvenience. Performance enhancement is almost always a bad reason to compromise good object-oriented design principles, and when the performance enhancement is small or nonexistent, this is a bad trade-off indeed."

As stated, since this is a change at the API level, I think it really should need a KIP, and no doubt some further tests and arguments on a KIP discussion around the pros and cons. It's also worth noting that very recently, for extensibility reasons, ProducerRecord and ConsumerRecord were made non-final, if you take the current master. > Make serializer classes final > - > > Key: KAFKA-4424 > URL: https://issues.apache.org/jira/browse/KAFKA-4424 > Project: Kafka > Issue Type: Improvement > Components: clients >Reporter: Matthias Bechtold >Priority: Minor > > Implementations of simple serializers / deserializers should be final to > prevent JVM method call overhead. 
> See also: > https://wiki.openjdk.java.net/display/HotSpot/VirtualCalls > This breaks the API slightly, inheritors must change to generic interfaces > Serializer / Deserializer. But architecture-wise final serialization classes > make the most sense to me. > So what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4424) Make serializer classes final
[ https://issues.apache.org/jira/browse/KAFKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15681447#comment-15681447 ] Michael Andre Pearce (IG) commented on KAFKA-4424: -- Is there a supporting test to prove a significant performance improvement in an end-to-end setup after JIT/JVM warm-up? The obvious reason to ask is that this would break any code in organisations that has extended these classes, as they are part of the client API; to make this change one needs to prove a significant enough performance improvement to warrant breaking them. > Make serializer classes final > - > > Key: KAFKA-4424 > URL: https://issues.apache.org/jira/browse/KAFKA-4424 > Project: Kafka > Issue Type: Improvement > Components: clients >Reporter: Matthias Bechtold >Priority: Minor > > Implementations of simple serializers / deserializers should be final to > prevent JVM method call overhead. > See also: > https://wiki.openjdk.java.net/display/HotSpot/VirtualCalls > This breaks the API slightly, inheritors must change to generic interfaces > Serializer / Deserializer. But architecture-wise final serialization classes > make the most sense to me. > So what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4341) KIP-82 - Add Compaction Tombstone Flag
Michael Andre Pearce (IG) created KAFKA-4341: Summary: KIP-82 - Add Compaction Tombstone Flag Key: KAFKA-4341 URL: https://issues.apache.org/jira/browse/KAFKA-4341 Project: Kafka Issue Type: Improvement Components: clients, core Affects Versions: 0.10.1.0 Reporter: Michael Andre Pearce (IG) Currently compaction works based on a null value. There are some use cases where this is ineffective or forces unclean workarounds (as discussed in the KIP-82 discussion); this is to add a tombstone flag so that compaction can also work where the value is not null but a producer needs to signify deletion to the broker. This JIRA relates to the KIP found here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-87+-+Add+Compaction+Tombstone+Flag -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4341) KIP-87 - Add Compaction Tombstone Flag
[ https://issues.apache.org/jira/browse/KAFKA-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Andre Pearce (IG) updated KAFKA-4341: - Summary: KIP-87 - Add Compaction Tombstone Flag (was: KIP-82 - Add Compaction Tombstone Flag) > KIP-87 - Add Compaction Tombstone Flag > -- > > Key: KAFKA-4341 > URL: https://issues.apache.org/jira/browse/KAFKA-4341 > Project: Kafka > Issue Type: Improvement > Components: clients, core >Affects Versions: 0.10.1.0 >Reporter: Michael Andre Pearce (IG) > > Currently compaction works based on a null value. There are some use cases > where this is ineffective or forces unclean workarounds (as discussed in the > KIP-82 discussion); this is to add a tombstone flag so that compaction can > also work where the value is not null but a producer needs to signify > deletion to the broker. > This JIRA relates to the KIP found here: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-87+-+Add+Compaction+Tombstone+Flag -- This message was sent by Atlassian JIRA (v6.3.4#6332)
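The proposed semantics can be illustrated with a toy compaction pass. This is a hypothetical sketch, not Kafka code: it shows how an explicit tombstone flag would let compaction delete a key even when the final record carries a non-null value. (Real compaction also retains tombstones for a configurable period, delete.retention.ms, before removing them; this sketch omits that.)

```python
def compact(records):
    """Toy log-compaction pass for the KIP-87 proposal (hypothetical sketch).

    records: list of (key, value, tombstone_flag) tuples, oldest first.
    Keeps only the latest record per key; a set tombstone flag removes the
    key even when its value is non-null (e.g. a value carrying deletion
    metadata), which null-value tombstones cannot express.
    """
    latest = {}
    for key, value, tombstone in records:
        if tombstone:
            latest.pop(key, None)  # delete the key regardless of its value
        else:
            latest[key] = value    # retain only the most recent value
    return latest

log = [
    ("k1", "v1", False),
    ("k2", "v2", False),
    ("k1", "audit-info", True),  # tombstone with a non-null payload
]
print(compact(log))  # → {'k2': 'v2'}
```

With null-value tombstones, the final "k1" record could not both delete the key and carry the "audit-info" payload; the flag decouples deletion from the value.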
[jira] [Created] (KAFKA-4208) Add Record Headers
Michael Andre Pearce (IG) created KAFKA-4208: Summary: Add Record Headers Key: KAFKA-4208 URL: https://issues.apache.org/jira/browse/KAFKA-4208 Project: Kafka Issue Type: New Feature Components: clients, core Reporter: Michael Andre Pearce (IG) Priority: Critical Currently headers are not natively supported, unlike in many other transport and messaging platforms and standards; this is to add support for headers to Kafka. This JIRA is related to the KIP found here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
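To make the feature concrete, here is a hypothetical sketch (not the actual Kafka API) modelling headers the way the KIP-82 discussion describes them: an ordered collection of key/value pairs attached to each record, with duplicate keys permitted. The Record class and last_header helper are illustrative names only.

```python
# Toy model of per-record headers (hypothetical; not the Kafka client API).
class Record:
    def __init__(self, key, value, headers=None):
        self.key = key
        self.value = value
        # Ordered list of (string key, byte value) pairs; duplicates allowed.
        self.headers = list(headers or [])

    def last_header(self, name):
        """Return the value of the most recently added header with this name."""
        for hkey, hval in reversed(self.headers):
            if hkey == name:
                return hval
        return None

r = Record("order-1", b"payload", headers=[
    ("trace-id", b"abc123"),
    ("origin-cluster", b"dc-east"),
    ("trace-id", b"def456"),   # duplicates allowed; last one wins on lookup
])
print(r.last_header("trace-id"))  # → b'def456'
```

A connector replicating between clusters, as in the KAFKA-5142 use case above, would simply copy this headers collection from the source record to the destination record unchanged.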
[jira] [Commented] (KAFKA-3846) Connect record types should include timestamps
[ https://issues.apache.org/jira/browse/KAFKA-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435842#comment-15435842 ] Michael Andre Pearce (IG) commented on KAFKA-3846: -- This looks great. Have any documents been updated to show how to configure pieces like the TimebasedPartitioner mentioned here http://docs.confluent.io/3.0.0/connect/connect-hdfs/docs/configuration_options.html to use this? Also, is it possible to store this timestamp (and other metadata) in the HDFS/Hive record? From the config options in Connect it seems we can only store the payload. > Connect record types should include timestamps > -- > > Key: KAFKA-3846 > URL: https://issues.apache.org/jira/browse/KAFKA-3846 > Project: Kafka > Issue Type: Improvement > Components: KafkaConnect >Affects Versions: 0.10.0.0 >Reporter: Ewen Cheslack-Postava >Assignee: Shikhar Bhushan > Labels: needs-kip > Fix For: 0.10.1.0 > > > Timestamps were added to records in the previous release, however this does > not get propagated automatically to Connect because it uses custom wrappers > to add fields and rename some for clarity. > The addition of timestamps should be trivial, but can be really useful (e.g. > in sink connectors that would like to include timestamp info if available but > when it is not stored in the value). > This is public API so it will need a KIP despite being very uncontentious. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
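As a sketch of why propagating timestamps matters for sinks, the following hypothetical snippet (not HDFS-connector code) shows how a time-based partitioner could map a record's epoch-millisecond timestamp to a directory-style partition path; partition_path and its default format string are illustrative assumptions, not actual connector configuration.

```python
from datetime import datetime, timezone

def partition_path(topic, timestamp_ms, fmt="year=%Y/month=%m/day=%d"):
    """Map a record's epoch-millis timestamp to a directory-style partition
    path, in the spirit of time-based partitioning in sink connectors.
    (Hypothetical sketch; real connectors take this from configuration.)"""
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return f"{topic}/{ts.strftime(fmt)}"

# 2016-08-25 00:00:00 UTC in epoch milliseconds
print(partition_path("orders", 1472083200000))  # → orders/year=2016/month=08/day=25
```

Without the record timestamp exposed in Connect, a sink could only partition this way if the timestamp happened to be duplicated inside the payload.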
[jira] [Commented] (KAFKA-2260) Allow specifying expected offset on produce
[ https://issues.apache.org/jira/browse/KAFKA-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311853#comment-15311853 ] Michael Andre Pearce (IG) commented on KAFKA-2260: -- Is this still active/alive? It would be a real shame if this were left to die, and a shame it didn't make it into 0.10; the concept and the neatness of the solution are very useful. We would have many uses for this. > Allow specifying expected offset on produce > --- > > Key: KAFKA-2260 > URL: https://issues.apache.org/jira/browse/KAFKA-2260 > Project: Kafka > Issue Type: Improvement >Reporter: Ben Kirwin >Assignee: Ewen Cheslack-Postava >Priority: Minor > Attachments: KAFKA-2260.patch, expected-offsets.patch > > > I'd like to propose a change that adds a simple CAS-like mechanism to the > Kafka producer. This update has a small footprint, but enables a bunch of > interesting uses in stream processing or as a commit log for process state. > h4. Proposed Change > In short: > - Allow the user to attach a specific offset to each message produced. > - The server assigns offsets to messages in the usual way. However, if the > expected offset doesn't match the actual offset, the server should fail the > produce request instead of completing the write. > This is a form of optimistic concurrency control, like the ubiquitous > check-and-set -- but instead of checking the current value of some state, it > checks the current offset of the log. > h4. Motivation > Much like check-and-set, this feature is only useful when there's very low > contention. Happily, when Kafka is used as a commit log or as a > stream-processing transport, it's common to have just one producer (or a > small number) for a given partition -- and in many of these cases, predicting > offsets turns out to be quite useful. 
> - We get the same benefits as the 'idempotent producer' proposal: a producer > can retry a write indefinitely and be sure that at most one of those attempts > will succeed; and if two producers accidentally write to the end of the > partition at once, we can be certain that at least one of them will fail. > - It's possible to 'bulk load' Kafka this way -- you can write a list of n > messages consecutively to a partition, even if the list is much larger than > the buffer size or the producer has to be restarted. > - If a process is using Kafka as a commit log -- reading from a partition to > bootstrap, then writing any updates to that same partition -- it can be sure > that it's seen all of the messages in that partition at the moment it does > its first (successful) write. > There's a bunch of other similar use-cases here, but they all have roughly > the same flavour. > h4. Implementation > The major advantage of this proposal over other suggested transaction / > idempotency mechanisms is its minimality: it gives the 'obvious' meaning to a > currently-unused field, adds no new APIs, and requires very little new code > or additional work from the server. > - Produced messages already carry an offset field, which is currently ignored > by the server. This field could be used for the 'expected offset', with a > sigil value for the current behaviour. (-1 is a natural choice, since it's > already used to mean 'next available offset'.) > - We'd need a new error and error code for a 'CAS failure'. > - The server assigns offsets to produced messages in > {{ByteBufferMessageSet.validateMessagesAndAssignOffsets}}. After this > change, this method would assign offsets in the same way -- but if they > don't match the offset in the message, we'd return an error instead of > completing the write. > - To avoid breaking existing clients, this behaviour would need to live > behind some config flag. (Possibly global, but probably more useful > per-topic?) 
> I understand all this is unsolicited and possibly strange: happy to answer > questions, and if this seems interesting, I'd be glad to flesh this out into > a full KIP or patch. (And apologies if this is the wrong venue for this sort > of thing!) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
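The check-and-set semantics proposed in KAFKA-2260 can be sketched with a toy in-memory partition log. This is a hypothetical illustration, not broker code: PartitionLog, CasFailure, and ANY_OFFSET are invented names, with -1 serving as the sigil for "no expectation" (the current behaviour), as the proposal suggests.

```python
ANY_OFFSET = -1  # sigil: no expectation, append unconditionally

class CasFailure(Exception):
    """Raised when the expected offset does not match the next offset."""

class PartitionLog:
    """Toy partition log with check-and-set appends (hypothetical sketch)."""

    def __init__(self):
        self.messages = []

    def append(self, message, expected_offset=ANY_OFFSET):
        next_offset = len(self.messages)
        # Optimistic concurrency control: fail the write instead of
        # completing it when the producer's expectation is stale.
        if expected_offset != ANY_OFFSET and expected_offset != next_offset:
            raise CasFailure(f"expected {expected_offset}, next is {next_offset}")
        self.messages.append(message)
        return next_offset

log = PartitionLog()
log.append(b"a")                       # offset 0, no expectation
log.append(b"b", expected_offset=1)    # succeeds: next offset is 1
try:
    log.append(b"c", expected_offset=1)  # fails: offset 1 is already taken
except CasFailure as e:
    print("rejected:", e)
```

This mirrors the retry guarantee described above: a producer that retries an append with the same expected offset can be sure at most one attempt lands, and if two producers race for the log's tail, at least one is rejected.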