[jira] [Commented] (KAFKA-12946) __consumer_offsets topic with very big partitions
[ https://issues.apache.org/jira/browse/KAFKA-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611018#comment-17611018 ] zhangzhisheng commented on KAFKA-12946:
---
upgrade 2.4.2

> __consumer_offsets topic with very big partitions
> -------------------------------------------------
>
>                 Key: KAFKA-12946
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12946
>             Project: Kafka
>          Issue Type: Bug
>          Components: log cleaner
>    Affects Versions: 2.0.0
>            Reporter: Emi
>            Priority: Critical
>
> I am using Kafka 2.0.0 with Java 8u191.
> There is a partition of the __consumer_offsets topic that is 600 GB with 6000 segments older than 4 months. Other partitions of that topic are small: 20-30 MB.
> There are 60 consumer groups, 90 topics and 100 partitions per topic.
> There are no errors in the logs. From the log cleaner's log, I can see that this partition is never touched by the log cleaner thread for compaction; it only adds new segments.
> How is this possible?
> There was another partition with the same problem, but after some months it was compacted. Now there is only one partition with this problem, and it is bigger and keeps growing.
> I have used the kafka-dump-log tool to check these old segments and I can see many duplicates, so I would assume it is not being compacted.
> My settings:
> {{offsets.commit.required.acks = -1}}
> {{offsets.commit.timeout.ms = 5000}}
> {{offsets.load.buffer.size = 5242880}}
> {{offsets.retention.check.interval.ms = 60}}
> {{offsets.retention.minutes = 10080}}
> {{offsets.topic.compression.codec = 0}}
> {{offsets.topic.num.partitions = 50}}
> {{offsets.topic.replication.factor = 3}}
> {{offsets.topic.segment.bytes = 104857600}}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
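For reference, the duplicates mentioned in the description can be inspected with the kafka-dump-log tool; a minimal sketch, assuming a broker-local segment path (the log directory and segment file name here are illustrative, not taken from the reporter's cluster):

```shell
# Decode a __consumer_offsets segment to inspect committed-offset records.
# --offsets-decoder deserializes the internal group-metadata key/value formats.
bin/kafka-dump-log.sh \
  --offsets-decoder \
  --files /var/kafka-logs/__consumer_offsets-7/00000000000000000000.log \
  | head -n 50
# Repeated keys (same group/topic/partition) surviving across months-old
# segments indicate records that compaction should already have removed.
```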
[ https://issues.apache.org/jira/browse/KAFKA-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363805#comment-17363805 ] Ron Dagostino commented on KAFKA-12946:
---
The only one I am familiar with and would recommend is the upgrade.
[ https://issues.apache.org/jira/browse/KAFKA-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363622#comment-17363622 ] Emi commented on KAFKA-12946:
---
Ok :) I found other possible solutions, for example:
- increase *log.cleaner.threads*
- upgrade Kafka
- set cleanup.policy=compact,delete on the *__consumer_offsets* topic for a while
- delete the *cleaner-offset-checkpoint* file to force compaction
Could these solutions work, in your opinion? I am on a production environment, so I have to be sure that they are safe. Thanks again ;)
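For the cleanup.policy option above, the topic-level override could be applied and later removed with kafka-configs; a sketch, assuming a ZooKeeper-based cluster as on Kafka 2.0 (the `zk1:2181` host is illustrative):

```shell
# Temporarily allow deletion as well as compaction on __consumer_offsets.
# WARNING: with compact,delete, segments past the retention limits are deleted
# outright, which can discard still-needed committed offsets - use with care.
bin/kafka-configs.sh --zookeeper zk1:2181 --alter \
  --entity-type topics --entity-name __consumer_offsets \
  --add-config cleanup.policy=compact,delete

# Later, remove the override so the topic falls back to compact only.
bin/kafka-configs.sh --zookeeper zk1:2181 --alter \
  --entity-type topics --entity-name __consumer_offsets \
  --delete-config cleanup.policy
```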
[ https://issues.apache.org/jira/browse/KAFKA-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363584#comment-17363584 ] Ron Dagostino commented on KAFKA-12946:
---
Yeah, there are bugs; the KIP I referred to mentions one. There have also been several changes over time to make the log cleaner thread more robust to failure, including since the 2.0 version you are on. Upgrading might not help immediately, but you will want to leverage the KIP-664 tools at some point, so it is best to keep current. You should definitely read that KIP.
[ https://issues.apache.org/jira/browse/KAFKA-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363522#comment-17363522 ] Emi commented on KAFKA-12946:
---
[~rndgstn] Interesting, it could be a solution that I am going to consider. But I am more interested in knowing why this happens. So, why is there this very big partition in the __consumer_offsets topic? Is it really a bug in Kafka?
[ https://issues.apache.org/jira/browse/KAFKA-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363496#comment-17363496 ] Ron Dagostino commented on KAFKA-12946:
---
I mean: if you look at the size on disk, is one replica's log significantly smaller? Broker 0 might be the leader for partition 0 with 600 GB of data, and broker 1 a follower at about the same 600 GB, but broker 2 might be a follower with just 100 MB. Why this would occur is unexplained, but it is possible, and if so you can make broker 2 the leader, move the replica on broker 1 to broker 3 and then back to broker 1, move the replica on broker 0 to broker 3 and then back to broker 0, and finally make broker 0 the leader again. You end up with the same leader and followers as before, but with 100 MB on all 3 replicas, because each rebuilt follower re-replicates the much smaller log from the new leader.
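The replica shuffle described above is driven by the kafka-reassign-partitions tool; a minimal sketch of one step, assuming broker ids 0-3, partition 0 as the oversized partition, and a ZooKeeper host `zk1:2181` (all of these are illustrative, not taken from the reporter's cluster):

```shell
# Step 1 of the shuffle: with broker 2 (the small replica) as leader, move the
# follower from broker 1 to broker 3 so the new follower re-replicates the
# much smaller log. Repeat analogously for each replica to be rebuilt, then
# restore the original assignment and leadership.
cat > reassign.json <<'EOF'
{"version": 1, "partitions": [
  {"topic": "__consumer_offsets", "partition": 0, "replicas": [2, 3, 0]}
]}
EOF
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file reassign.json --execute
# Poll until the reassignment completes before the next step.
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file reassign.json --verify
```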
[ https://issues.apache.org/jira/browse/KAFKA-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363203#comment-17363203 ] Emi commented on KAFKA-12946:
---
[~rndgstn] What do you mean by "has a significantly smaller size than the leader"? Thanks
[ https://issues.apache.org/jira/browse/KAFKA-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363155#comment-17363155 ] Ron Dagostino commented on KAFKA-12946:
---
If the partition isn't being cleaned, you can try setting min.cleanable.dirty.ratio=0 for the __consumer_offsets topic; this might allow it to get cleaned. You can delete that config after a while to let the value fall back to the default.
Another possibility exists if one of the follower replicas has a significantly smaller size than the leader: in that case you can move leadership to the smaller replica, reassign the follower replicas to new brokers so that they copy the (much smaller) data, then migrate the followers back to where they were originally and move leadership back to the original leader. This only works if you have more brokers than the replication factor.
Finally, take a look at https://cwiki.apache.org/confluence/display/KAFKA/KIP-664%3A+Provide+tooling+to+detect+and+abort+hanging+transactions. If the cause is a hanging transaction, you may not have any other options right now, but help is coming.
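The min.cleanable.dirty.ratio override suggested above can be set per topic with kafka-configs; a sketch for a ZooKeeper-based 2.0 cluster (the `zk1:2181` host is illustrative):

```shell
# Make the cleaner consider __consumer_offsets eligible for cleaning
# regardless of how little new ("dirty") data has accumulated.
bin/kafka-configs.sh --zookeeper zk1:2181 --alter \
  --entity-type topics --entity-name __consumer_offsets \
  --add-config min.cleanable.dirty.ratio=0

# Once the oversized partition has been cleaned, drop the override so the
# broker-level default ratio applies again.
bin/kafka-configs.sh --zookeeper zk1:2181 --alter \
  --entity-type topics --entity-name __consumer_offsets \
  --delete-config min.cleanable.dirty.ratio
```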