[jira] [Commented] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974667#comment-14974667 ]

Jun Rao commented on KAFKA-2235:
--------------------------------

One way to get around your problem now is to set log.cleaner.io.buffer.load.factor to a larger value. Currently, it doesn't seem that we check the range of the value, so you can set it to a value larger than 1.0, which will allow you to build a bigger offset map with the same buffer size.

> LogCleaner offset map overflow
> ------------------------------
>
>                 Key: KAFKA-2235
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2235
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, log
>    Affects Versions: 0.8.1, 0.8.2.0
>            Reporter: Ivan Simoneko
>            Assignee: Ivan Simoneko
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-2235_v1.patch, KAFKA-2235_v2.patch
>
>
> We've seen log cleaning generate an error for a topic with lots of small messages. It seems that an offset map overflow is possible if a log segment contains more unique keys than there are empty slots in the offsetMap. Checking baseOffset and map utilization before processing a segment is not enough, because it doesn't take into account the segment size (the number of unique messages in the segment).
> I suggest estimating the upper bound of keys in a segment as the number of messages in the segment and comparing it with the number of available slots in the map (keeping in mind the desired load factor). This should work whenever an empty map is capable of holding all the keys for a single segment. If even a single segment cannot fit into an empty map, the cleanup process will still fail. Probably there should be a limit on the log segment entry count?
> Here is the stack trace for this error:
> 2015-05-19 16:52:48,758 ERROR [kafka-log-cleaner-thread-0] kafka.log.LogCleaner - [kafka-log-cleaner-thread-0], Error due to
> java.lang.IllegalArgumentException: requirement failed: Attempt to add a new entry to a full offset map.
>	at scala.Predef$.require(Predef.scala:233)
>	at kafka.log.SkimpyOffsetMap.put(OffsetMap.scala:79)
>	at kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:543)
>	at kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:538)
>	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>	at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
>	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>	at kafka.message.MessageSet.foreach(MessageSet.scala:67)
>	at kafka.log.Cleaner.kafka$log$Cleaner$$buildOffsetMapForSegment(LogCleaner.scala:538)
>	at kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:515)
>	at kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:512)
>	at scala.collection.immutable.Stream.foreach(Stream.scala:547)
>	at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:512)
>	at kafka.log.Cleaner.clean(LogCleaner.scala:307)
>	at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:221)
>	at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:199)
>	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
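Jun's workaround works because the cleaner's notion of a "full" map is a capacity derived from the buffer size and the load factor. A rough sketch of that arithmetic, assuming SkimpyOffsetMap-style 24-byte entries (a 16-byte MD5 of the key plus an 8-byte offset) and an even split of the dedupe buffer across cleaner threads; the function name is illustrative, not Kafka's API:

```python
def offset_map_capacity(dedupe_buffer_bytes, num_cleaner_threads=1,
                        load_factor=0.9, entry_bytes=24):
    """Estimate how many unique keys one cleaner thread's offset map holds.

    Assumption: each slot stores a 16-byte MD5 of the key plus an 8-byte
    offset (24 bytes), and only `load_factor` of the slots are usable
    before the map reports itself full.
    """
    per_thread_bytes = dedupe_buffer_bytes // num_cleaner_threads
    slots = per_thread_bytes // entry_bytes
    return int(slots * load_factor)

# A 512 MB buffer at the default 0.9 load factor holds ~20M keys; since the
# load factor's range is apparently unchecked, setting it above 1.0 raises
# the computed capacity without allocating more memory.
default_cap = offset_map_capacity(512 * 1024 * 1024)
overcommit = offset_map_capacity(512 * 1024 * 1024, load_factor=1.1)
```

Under these assumptions a 1 GB buffer yields ~40M slots at the default load factor, which lines up with the ~39M figure Todd reports below.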
[jira] [Commented] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974502#comment-14974502 ]

Joel Koshy commented on KAFKA-2235:
-----------------------------------

Agreed with Todd. Btw, this is very similar to the proposal in https://issues.apache.org/jira/browse/KAFKA-1755?focusedCommentId=14216486
[jira] [Commented] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971728#comment-14971728 ]

Joel Koshy commented on KAFKA-2235:
-----------------------------------

While I think the change is well-motivated, I'm not sure this is the right fix for this issue, as the check is too conservative. I.e., especially with highly compressible messages, the message count in the segment may be extremely high while the unique-key count is low.
[jira] [Commented] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971806#comment-14971806 ]

Todd Palino commented on KAFKA-2235:
------------------------------------

I don't think we can. I have already increased it from 512 MB to 1 GB, and we still hit the same problems. That only provides a 2x increase in the size of the map, and I would need almost a 10x increase to solve the problem.
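Todd's shortfall can be sanity-checked by inverting the capacity arithmetic. A sketch under assumed SkimpyOffsetMap-style sizing (24 bytes per slot, 0.9 load factor; the helper name is made up for illustration):

```python
import math

def required_dedupe_buffer_bytes(unique_keys, load_factor=0.9, entry_bytes=24):
    """Bytes of per-thread dedupe buffer needed to hold `unique_keys`
    entries, assuming 24-byte slots (16-byte MD5 + 8-byte offset) and
    the given load factor."""
    slots_needed = math.ceil(unique_keys / load_factor)
    return slots_needed * entry_bytes

# Worst case from Todd's report: ~300M messages in a segment, every key
# unique. That needs ~7.5 GiB of buffer -- versus the 1 GB he has,
# roughly the "almost 10x" gap he mentions.
needed = required_dedupe_buffer_bytes(300_000_000)
```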
[jira] [Commented] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971740#comment-14971740 ]

Todd Palino commented on KAFKA-2235:
------------------------------------

I'm sure [~jjkoshy] will follow along with more detail on this, but we've run into a serious problem with this check. Basically, it's impossible to perform this kind of check accurately before the offset map is built. We now have partitions that should be compactable, as the total number of unique keys is far below the size of the offset map (currently ~39 million for our configuration), but the messages are very frequent and very small. Even at a segment size of 64 MB, we have over 300 million messages in those segments.

So this check creates a situation where log compaction should succeed, but fails because of a speculative check. While I can play the game of trying to walk back segment sizes, there's no way to size segments by number of messages, so it's a guessing game. In addition, the check is clearly wrong in that case, so I shouldn't have to configure around it. Lastly, the check causes the log cleaner thread to exit, which means log compaction on the broker fails entirely, rather than just skipping that partition.

A better way to handle this would be to cleanly catch the original error you are seeing, generate a clear error message in the logs as to what the failure is, and allow the log cleaner to continue and handle other partitions. You could also maintain a blacklist of partitions in memory in the log cleaner to make sure you don't come back around and try to compact the partition again.
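The recovery Todd describes — catch the overflow, log a clear message, blacklist the partition in memory, and keep cleaning the rest — can be sketched as follows. All names here (OffsetMapFullError, clean_partition) are hypothetical stand-ins, not Kafka's actual classes:

```python
class OffsetMapFullError(Exception):
    """Stand-in for the 'full offset map' failure in the stack trace."""

def clean_all(partitions, clean_partition, blacklist=None, log=print):
    """Clean each partition; on overflow, log it, blacklist the partition,
    and continue, instead of letting the cleaner thread die."""
    blacklist = set() if blacklist is None else blacklist
    cleaned = []
    for p in partitions:
        if p in blacklist:
            continue  # don't retry a partition that already overflowed
        try:
            clean_partition(p)
            cleaned.append(p)
        except OffsetMapFullError as e:
            log(f"Offset map overflow while cleaning {p}; skipping: {e}")
            blacklist.add(p)
    return cleaned, blacklist
```

The key design point is that the failure is scoped to one partition rather than the whole cleaner thread, and the in-memory blacklist prevents the cleaner from repeatedly reattempting a partition it already knows it cannot fit.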
[jira] [Commented] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971804#comment-14971804 ]

Jun Rao commented on KAFKA-2235:
--------------------------------

[~toddpalino], thanks for reporting this issue. I agree that it's better to log an error and then continue. For your use case, it seems that you can just increase log.cleaner.dedupe.buffer.size.
[jira] [Commented] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971848#comment-14971848 ]

Ivan Simoneko commented on KAFKA-2235:
--------------------------------------

The core problem is that the offset map is limited by number of messages while the segment is limited by size in bytes, and this patch doesn't fix that. But the way it handles the problem (just throws an exception and stops compacting) is consistent with other errors in the compactor.
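Ivan's mismatch — segments capped in bytes, the map capped in entries — is why any byte-based bound on keys per segment is shaky. A sketch of the naive bound (the function is illustrative, not from the patch):

```python
def worst_case_keys(segment_bytes, min_message_bytes):
    """Naive upper bound on unique keys a segment can contribute:
    every message minimum-sized, each with a distinct key.

    Caveat: this only holds for uncompressed data. Per Todd's report
    (300M+ messages in a 64 MB segment), compression lets the real
    message count blow far past what the raw byte size suggests --
    which is Joel's objection to any message-count pre-check.
    """
    return segment_bytes // min_message_bytes
```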
[jira] [Commented] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595798#comment-14595798 ]

Ivan Simoneko commented on KAFKA-2235:
--------------------------------------

[~junrao] thank you for the review. Please check the patch v2. I think in most cases mentioning log.cleaner.dedupe.buffer.size should be enough, but as log.cleaner.threads is also used in determining the map size, I've added both of them. If someone increases the thread count and starts getting this message, they can easily understand the cause of the problem.