[jira] [Updated] (KAFKA-4141) 2x increase in cpu usage on new Producer API
[ https://issues.apache.org/jira/browse/KAFKA-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Jorgensen updated KAFKA-4141:
------------------------------------
Description:

We are seeing about a 2x increase in CPU usage for the new Kafka producer compared to the 0.8.1.1 producer. We are currently using gzip compression. We recently upgraded our Kafka server and producer from 0.8.1.1 to 0.10.0.1 and noticed that the CPU usage for the new producer had increased significantly compared to the old producer. This has caused us to need more resources to do the same amount of work we were doing before. I did some quick profiling, and it looks like during sending half of the CPU cycles are spent in org.apache.kafka.common.record.Compressor.putRecord and the other half are in org.apache.kafka.common.record.Record.computeChecksum (both are around 5.8% of the CPU cycles for that method). I know it's not apples to apples, but the old producer did not seem to have this overhead, or at least it was greatly reduced. Is this a known performance degradation compared to the old producer?

h2. old producer:
!http://imgur.com/1xS34Dl.jpg!

h2. new producer:
!http://imgur.com/0w0G5b1.jpg!

> 2x increase in cpu usage on new Producer API
>
> Key: KAFKA-4141
> URL: https://issues.apache.org/jira/browse/KAFKA-4141
> Project: Kafka
> Issue Type: Bug
> Reporter: Andrew Jorgensen

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
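The Compressor.putRecord / Record.computeChecksum split reported above is consistent with the 0.10 message format computing a CRC32 over every record as it is appended, so checksum cost scales with record count rather than batch count. A rough stand-alone sketch of per-record versus per-batch checksumming (plain Python zlib, not Kafka code; payload sizes are made up):

```python
import zlib

def per_record_crc(records):
    # New-producer style: one CRC32 computed for every record appended.
    return [zlib.crc32(r) for r in records]

def per_batch_crc(records):
    # Cheaper alternative for comparison: a single CRC32 over the whole batch.
    return zlib.crc32(b"".join(records))

records = [b"payload-%d" % i for i in range(1000)]
checksums = per_record_crc(records)
assert len(checksums) == 1000  # one checksum per record, 1000 CRC passes
assert isinstance(per_batch_crc(records), int)  # versus a single pass
```

The point of the sketch is only that per-record checksumming does N small CRC computations where a batch-level checksum would do one, which matches the profile showing computeChecksum as a hot method.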
[jira] [Created] (KAFKA-4141) 2x increase in cp usage on new Producer API
Andrew Jorgensen created KAFKA-4141:
------------------------------------

Summary: 2x increase in cp usage on new Producer API
Key: KAFKA-4141
URL: https://issues.apache.org/jira/browse/KAFKA-4141
Project: Kafka
Issue Type: Bug
Reporter: Andrew Jorgensen

We are seeing about a 2x increase in CPU usage for the new Kafka producer compared to the 0.8.1.1 producer. We are currently using gzip compression. We recently upgraded our Kafka server and producer from 0.8.1.1 to 0.10.0.1 and noticed that the CPU usage for the new producer had increased significantly compared to the old producer. This has caused us to need more resources to do the same amount of work we were doing before. I did some quick profiling, and it looks like during sending half of the CPU cycles are spent in org.apache.kafka.common.record.Compressor.putRecord and the other half are in org.apache.kafka.common.record.Record.computeChecksum (both are around 5.8% of the CPU cycles for that method). I know it's not apples to apples, but the old producer did not seem to have this overhead, or at least it was greatly reduced. Is this a known performance degradation compared to the old producer?

Here is the trace from the old producer:
!http://imgur.com/1xS34Dl.jpg!

New producer:
!http://imgur.com/0w0G5b1.jpg!
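Since gzip compression is in play, the producer-side work per batch is roughly "checksum each record, then run the bytes through gzip". A hedged sketch with the Python standard library (I believe the real producer compresses incrementally as records are appended; this simplified version compresses the joined batch, and the payloads are invented):

```python
import gzip
import zlib

def build_batch(records, level=6):
    # Sketch of producer-side CPU work per batch with gzip enabled:
    # one CRC32 per record, then one gzip pass over the batch bytes.
    checksums = [zlib.crc32(r) for r in records]
    compressed = gzip.compress(b"".join(records), compresslevel=level)
    return checksums, compressed

records = [b"some repetitive payload " * 4 for _ in range(100)]
checksums, compressed = build_batch(records)
assert len(checksums) == len(records)
# gzip shrinks repetitive data, at the cost of the CPU seen in the profile.
assert len(compressed) < sum(len(r) for r in records)
```

Both halves of this function correspond to the two hot methods in the trace: the checksum loop to Record.computeChecksum and the gzip pass to the compression work behind Compressor.putRecord.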
[jira] [Updated] (KAFKA-4141) 2x increase in cpu usage on new Producer API
[ https://issues.apache.org/jira/browse/KAFKA-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Jorgensen updated KAFKA-4141:
------------------------------------

Summary: 2x increase in cpu usage on new Producer API (was: 2x increase in cp usage on new Producer API)
[jira] [Commented] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393877#comment-15393877 ]

Andrew Jorgensen commented on KAFKA-3980:
-----------------------------------------

[~omkreddy] do you mean enable debug logging on the producer side or on the Kafka side? It looks like the debug log for the producer itself doesn't include its client id, so I'm assuming you mean on the Kafka side?

> JmxReporter uses excessive memory causing OutOfMemoryException
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.9.0.1
> Reporter: Andrew Jorgensen
>
> I have some nodes in a Kafka cluster that occasionally run out of memory whenever I restart the producers. I was able to take heap dumps from both a recently restarted Kafka node, which weighed in at about 20 MB, and a node that has been running for 2 months, which is using over 700 MB of memory. Looking at the heap dump, it looks like the JmxReporter is holding on to metrics and causing them to build up over time.
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance that when I restart the producers the node will hit a Java heap space exception and OOM. The nodes then fail to start up correctly and write a -1 as the leader number to the partitions they were responsible for, effectively resetting the offset and rendering those partitions unavailable. The Kafka process then needs to be restarted in order to re-assign the node to the partitions it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount of time, or a way to turn certain high-cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.
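The accumulation described in the report can be modeled in a few lines: a metrics registry keyed by client id that never evicts grows by one entry per distinct id ever seen, regardless of whether the client still exists. A toy stand-in (a hypothetical MetricsRegistry class, not the real JmxReporter):

```python
class MetricsRegistry:
    """Toy stand-in for a JMX-style reporter that keys metrics by client id."""

    def __init__(self):
        self.metrics = {}

    def record(self, client_id, name, value):
        # An entry is created on first sight of a client id and never evicted,
        # so the map grows unboundedly as short-lived clients come and go.
        self.metrics.setdefault(client_id, {})[name] = value

registry = MetricsRegistry()
for restart in range(10_000):          # each producer restart...
    client_id = f"producer-{restart}"  # ...arrives under a fresh id
    registry.record(client_id, "bytes-sent", 42)

assert len(registry.metrics) == 10_000  # one entry per id, kept forever
```

This matches the heap-dump symptom: a long-running broker that has seen many producer generations carries one metrics entry per generation, while a freshly restarted broker starts from an empty map.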
[jira] [Comment Edited] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388382#comment-15388382 ]

Andrew Jorgensen edited comment on KAFKA-3980 at 7/21/16 9:37 PM:
------------------------------------------------------------------

I was able to extract a list of all the ids from the map in the JmxReporter and have uploaded them here: https://drive.google.com/file/d/0B_65les2Npo5OHFyMVpXSjd1cXc/view?usp=sharing. The compressed size is 29 MB and uncompressed it is 224 MB. In the list you can see that the type=Fetch keys have clearly defined names, but the type=Produce keys all seem to have completely randomized ids. I think that in general this makes sense (we don't really need an id on the producer side), but the JmxReporter should not grow unbounded.

EDIT: Another data point is that I believe we are using the 0.8.1.1 client to talk to a 0.9.0.1 cluster. Not sure if the version mismatch there is contributing. It looks like the default client-id in the 0.8.1.1 client was an empty string, whereas the new one uses an AtomicInteger.
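The EDIT above points at a plausible mechanism: if the newer client generates a default client id from a counter while the 0.8.1.1 client defaulted to an empty string shared by every producer, then each generated id becomes a fresh metric key on the broker. An illustrative sketch (Python stand-ins for the two default-id policies, not the actual Kafka client code):

```python
import itertools

_counter = itertools.count(1)

def new_client_id():
    # Newer-client style: every producer instance gets a freshly generated
    # id, so N instances produce N distinct metric keys on the broker.
    return f"producer-{next(_counter)}"

def old_client_id():
    # 0.8.1.1 style: the default was an empty string shared by everyone,
    # so per-client-id metrics collapsed into a single key.
    return ""

new_ids = {new_client_id() for _ in range(1000)}
old_ids = {old_client_id() for _ in range(1000)}
assert len(new_ids) == 1000  # every instance distinct
assert len(old_ids) == 1     # all instances collapse to one key
```

This would explain why the 0.8.1.1 cluster mentioned above does not show the same growth despite frequent restarts: its producers all report under the same (empty) id.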
[jira] [Comment Edited] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387185#comment-15387185 ]

Andrew Jorgensen edited comment on KAFKA-3980 at 7/21/16 5:59 AM:
------------------------------------------------------------------

As far as I know we are not restarting with different client ids. We have increased the number of producers recently, but if we are restarting with separate client ids it is not on purpose. When are those client ids generated? Looking at the heap dump they look generated rather than manually set; could restarting the producers frequently cause new ids to be generated on each restart, which add up over time? FWIW we have another cluster that is on 0.8.1.1 (the older producer) which does not seem to exhibit the same symptoms despite being restarted multiple times.

EDIT: I'm not sure if it's related, but we are using the ruby-kafka client for some of the producers; the others are in Java and use the provided Kafka client. https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/client.rb#L44. It appears that the client sets its id to `ruby-client` by default.
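Question 2 in the issue body (releasing metrics after a set amount of time) could in principle look like idle-time eviction: track a last-seen timestamp per client id and drop entries that go quiet. A hypothetical sketch (no such option existed in JmxReporter at the time of this report; the class and knob names are invented):

```python
import time

class ExpiringMetrics:
    """Hypothetical metrics map that evicts client ids idle longer than a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.last_seen = {}  # client id -> timestamp of last update
        self.metrics = {}

    def record(self, client_id, name, value):
        self.last_seen[client_id] = self.clock()
        self.metrics.setdefault(client_id, {})[name] = value

    def evict_idle(self):
        # Drop every client id that has been silent for longer than the TTL,
        # bounding the map by the number of *recently active* clients.
        now = self.clock()
        stale = [c for c, t in self.last_seen.items() if now - t > self.ttl]
        for cid in stale:
            del self.last_seen[cid]
            del self.metrics[cid]

# Fake clock so the example is deterministic.
t = [0.0]
m = ExpiringMetrics(ttl_seconds=60, clock=lambda: t[0])
m.record("producer-1", "bytes-sent", 1)
t[0] = 120.0                       # two minutes pass with no activity
m.record("producer-2", "bytes-sent", 2)
m.evict_idle()
assert "producer-1" not in m.metrics and "producer-2" in m.metrics
```

With eviction like this, metrics for producers that restarted under new ids would age out instead of accumulating until the heap fills.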
[jira] [Comment Edited] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387185#comment-15387185 ] Andrew Jorgensen edited comment on KAFKA-3980 at 7/21/16 5:53 AM: -- As far as I know we are not restarting with different client ids. We have increase the number of producers recently but if we are restarting with separate client ids it is not on purpose. When are those client ids generated? Looking at the heap dump they look pretty generated and not manually set, could restarting the producers frequently cause new ids to be generated each time they restart which add up over time? FWIW we have another cluster that is on 0.8.1.1, so the older producer, which does not seem to exhibit the same symptoms despite being restarted multiple times. EDIT: I'm not sure if its related but we are using this ruby-kafka client: https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/client.rb#L44. It appears that the client sets itself to `ruby-client` by default. was (Author: ajorgensen): As far as I know we are not restarting with different client ids. We have increase the number of producers recently but if we are restarting with separate client ids it is not on purpose. When are those client ids generated? Looking at the heap dump they look pretty generated and not manually set, could restarting the producers frequently cause new ids to be generated each time they restart which add up over time? FWIW we have another cluster that is on 0.8.1.1, so the older producer, which does not seem to exhibit the same symptoms despite being restarted multiple times. 
EDIT: I'm not sure if its related but we are using this ruby-kafka client: https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/client.rb#L44 > JmxReporter uses excessive memory causing OutOfMemoryException > -- > > Key: KAFKA-3980 > URL: https://issues.apache.org/jira/browse/KAFKA-3980 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.1 >Reporter: Andrew Jorgensen > > I have some nodes in a kafka cluster that occasionally will run out of memory > whenever I restart the producers. I was able to take a heap dump from both a > recently restarted Kafka node which weighed in at about 20 MB and a node that > has been running for 2 months is using over 700MB of memory. Looking at the > heap dump it looks like the JmxReporter is holding on to metrics and causing > them to build up over time. > !http://imgur.com/N6Cd0Ku.png! > !http://imgur.com/kQBqA2j.png! > The ultimate problem this causes is that there is a chance when I restart the > producers it will cause the node to experience an Java heap space exception > and OOM. The nodes then fail to startup correctly and write a -1 as the > leader number to the partitions they were responsible for effectively > resetting the offset and rendering that partition unavailable. The kafka > process then needs to go be restarted in order to re-assign the node to the > partition that it owns. > I have a few questions: > 1. I am not quite sure why there are so many client id entries in that > JmxReporter map. > 2. Is there a way to have the JmxReporter release metrics after a set amount > of time or a way to turn certain high cardinality metrics like these off? > I can provide any logs or heap dumps if more information is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387185#comment-15387185 ] Andrew Jorgensen edited comment on KAFKA-3980 at 7/21/16 5:52 AM: -- As far as I know we are not restarting with different client ids. We have increase the number of producers recently but if we are restarting with separate client ids it is not on purpose. When are those client ids generated? Looking at the heap dump they look pretty generated and not manually set, could restarting the producers frequently cause new ids to be generated each time they restart which add up over time? FWIW we have another cluster that is on 0.8.1.1, so the older producer, which does not seem to exhibit the same symptoms despite being restarted multiple times. EDIT: I'm not sure if its related but we are using this ruby-kafka client: https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/client.rb#L44 was (Author: ajorgensen): As far as I know we are not restarting with different client ids. We have increase the number of producers recently but if we are restarting with separate client ids it is not on purpose. When are those client ids generated? Looking at the heap dump they look pretty generated and not manually set, could restarting the producers frequently cause new ids to be generated each time they restart which add up over time? FWIW we have another cluster that is on 0.8.1.1, so the older producer, which does not seem to exhibit the same symptoms despite being restarted multiple times. > JmxReporter uses excessive memory causing OutOfMemoryException > -- > > Key: KAFKA-3980 > URL: https://issues.apache.org/jira/browse/KAFKA-3980 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.1 >Reporter: Andrew Jorgensen > > I have some nodes in a kafka cluster that occasionally will run out of memory > whenever I restart the producers. 
I was able to take a heap dump from both a > recently restarted Kafka node which weighed in at about 20 MB and a node that > has been running for 2 months is using over 700MB of memory. Looking at the > heap dump it looks like the JmxReporter is holding on to metrics and causing > them to build up over time. > !http://imgur.com/N6Cd0Ku.png! > !http://imgur.com/kQBqA2j.png! > The ultimate problem this causes is that there is a chance when I restart the > producers it will cause the node to experience an Java heap space exception > and OOM. The nodes then fail to startup correctly and write a -1 as the > leader number to the partitions they were responsible for effectively > resetting the offset and rendering that partition unavailable. The kafka > process then needs to go be restarted in order to re-assign the node to the > partition that it owns. > I have a few questions: > 1. I am not quite sure why there are so many client id entries in that > JmxReporter map. > 2. Is there a way to have the JmxReporter release metrics after a set amount > of time or a way to turn certain high cardinality metrics like these off? > I can provide any logs or heap dumps if more information is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387185#comment-15387185 ] Andrew Jorgensen edited comment on KAFKA-3980 at 7/21/16 5:22 AM:
--
As far as I know we are not restarting with different client ids. We have increased the number of producers recently, but if we are restarting with separate client ids it is not on purpose. When are those client ids generated? Looking at the heap dump they look auto-generated rather than manually set; could restarting the producers frequently cause new ids to be generated on each restart, which add up over time? FWIW we have another cluster that is on 0.8.1.1, so the older producer, which does not seem to exhibit the same symptoms despite being restarted multiple times.

was (Author: ajorgensen):
As far as I know we are not restarting with different client ids. We have increase the number of producers recently but if we are restarting with separate client ids it is not on purpose. When are those client ids generated? Looking at the heap dump they look pretty random, could restarting the producers frequently cause new ids to be generated each time they restart which add up over time? FWIW we have another cluster that is on 0.8.1.1, so the older producer, which does not seem to exhibit the same symptoms despite being restarted multiple times.
[jira] [Commented] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387185#comment-15387185 ] Andrew Jorgensen commented on KAFKA-3980:
-
As far as I know we are not restarting with different client ids. We have increased the number of producers recently, but if we are restarting with separate client ids it is not on purpose. When are those client ids generated? Looking at the heap dump they look pretty random; could restarting the producers frequently cause new ids to be generated on each restart, which add up over time? FWIW we have another cluster that is on 0.8.1.1, so the older producer, which does not seem to exhibit the same symptoms despite being restarted multiple times.
[jira] [Updated] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Jorgensen updated KAFKA-3980: Description: I have some nodes in a kafka cluster that occasionally will run out of memory whenever I restart the producers. I was able to take a heap dump from both a recently restarted Kafka node which weighed in at about 20 MB and a node that has been running for 2 months is using over 700MB of memory. Looking at the heap dump it looks like the JmxReporter is holding on to metrics and causing them to build up over time. !http://imgur.com/N6Cd0Ku.png! !http://imgur.com/kQBqA2j.png! The ultimate problem this causes is that there is a chance when I restart the producers it will cause the node to experience an Java heap space exception and OOM. The nodes then fail to startup correctly and write a -1 as the leader number to the partitions they were responsible for effectively resetting the offset and rendering that partition unavailable. The kafka process then needs to go be restarted in order to re-assign the node to the partition that it owns. I have a few questions: 1. I am not quite sure why there are so many client id entries in that JmxReporter map. 2. Is there a way to have the JmxReporter release metrics after a set amount of time or a way to turn certain high cardinality metrics like these off? I can provide any logs or heap dumps if more information is needed. was: I have some nodes in a kafka cluster that occasionally will run out of memory whenever I restart the producers. I was able to take a heap dump from both a recently restarted Kafka node which weighed in at about 20 MB and a node that has been running for 2 months is using over 700MB of memory. Looking at the heap dump it looks like the JmxReporter is holding on to metrics and causing them to build up over time. !http://imgur.com/N6Cd0Ku.png! !http://imgur.com/kQBqA2j.png! 
The ultimate problem this causes is that there is a chance when I restart the producers it will cause the node to experience an Java heap space exception and OOM. The nodes then fail to startup correctly and write a -1 as the leader number to the partitions they were responsible for effectively reseting the offset and rendering that partition unavailable. The kafka process then needs to go be restarted in order to re-assign the node to the partition that it owns. I have a few questions: 1. I am not quite sure why there are so many client id entries in that JmxReporter map. 2. Is there a way to have the JmxReporter release metrics after a set amount of time or a way to turn certain high cardinality metrics like these off? I can provide any logs or heap dumps if more information is needed.
[jira] [Updated] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Jorgensen updated KAFKA-3980: Summary: JmxReporter uses excessive memory causing OutOfMemoryException (was: JmxReporter uses an excessive memory causing OutOfMemoryException)
[jira] [Updated] (KAFKA-3980) JmxReporter uses an excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Jorgensen updated KAFKA-3980: Description: I have some nodes in a kafka cluster that occasionally will run out of memory whenever I restart the producers. I was able to take a heap dump from both a recently restarted Kafka node which weighed in at about 20 MB and a node that has been running for 2 months is using over 700MB of memory. Looking at the heap dump it looks like the JmxReporter is holding on to metrics and causing them to build up over time. !http://imgur.com/N6Cd0Ku.png! !http://imgur.com/kQBqA2j.png! The ultimate problem this causes is that there is a chance when I restart the producers it will cause the node to experience an Java heap space exception and OOM. The nodes then fail to startup correctly and write a -1 as the leader number to the partitions they were responsible for effectively reseting the offset and rendering that partition unavailable. The kafka process then needs to go be restarted in order to re-assign the node to the partition that it owns. I have a few questions: 1. I am not quite sure why there are so many client id entries in that JmxReporter map. 2. Is there a way to have the JmxReporter release metrics after a set amount of time or a way to turn certain high cardinality metrics like these off? I can provide any logs or heap dumps if more information is needed. was: I have some nodes in a kafka cluster that occasionally will run out of memory whenever I restart the producers. I was able to take a heap dump from both a recently restarted Kafka node which weighed in at about 20 MB and a node that has been running for 2 months is using over 700MB of memory. Looking at the heap dump it looks like the JmxReporter is holding on to metrics and causing them to build up over time. !http://imgur.com/N6Cd0Ku.png! !http://imgur.com/kQBqA2j.png! 
The ultimate problem this causes is that there is a change when I restart the producers it will cause the node to experience an Java heap space exception and OOM. The nodes then fail to startup correctly and write a -1 as the leader number to the partitions they were responsible for effectively reseting the offset and rendering that partition unavailable. The kafka process then needs to go be restarted in order to re-assign the node to the partition that it owns. I have a few questions: 1. I am not quite sure why there are so many client id entries in that JmxReporter map. 2. Is there a way to have the JmxReporter release metrics after a set amount of time or a way to turn certain high cardinality metrics like these off? I can provide any logs or heap dumps if more information is needed.
[jira] [Updated] (KAFKA-3980) JmxReporter uses an excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Jorgensen updated KAFKA-3980: Description: I have some nodes in a kafka cluster that occasionally will run out of memory whenever I restart the producers. I was able to take a heap dump from both a recently restarted Kafka node which weighed in at about 20 MB and a node that has been running for 2 months is using over 700MB of memory. Looking at the heap dump it looks like the JmxReporter is holding on to metrics and causing them to build up over time. !http://imgur.com/N6Cd0Ku.png! !http://imgur.com/kQBqA2j.png! The ultimate problem this causes is that there is a change when I restart the producers it will cause the node to experience an Java heap space exception and OOM. The nodes then fail to startup correctly and write a -1 as the leader number to the partitions they were responsible for effectively reseting the offset and rendering that partition unavailable. The kafka process then needs to go be restarted in order to re-assign the node to the partition that it owns. I have a few questions: 1. I am not quite sure why there are so many client id entries in that JmxReporter map. 2. Is there a way to have the JmxReporter release metrics after a set amount of time or a way to turn certain high cardinality metrics like these off? I can provide any logs or heap dumps if more information is needed. was: I have some nodes in a kafka cluster that occasionally will run out of memory whenever I restart the producers. I was able to take a heap dump from both a recently restarted Kafka node which weighed in at about 20 MB while a node that has been running for 2 months is using over 700MB of memory. Looking at the heap dump it looks like the JmxReporter is holding on to metrics and causing them to build up over time. !http://imgur.com/N6Cd0Ku.png! !http://imgur.com/kQBqA2j.png! 
The ultimate problem this causes is that there is a change when I restart the producers it will cause the node to experience an Java heap space exception and OOM. The nodes then fail to startup correctly and write a -1 as the leader number to the partitions they were responsible for effectively reseting the offset and rendering that partition unavailable. The kafka process then needs to go be restarted in order to re-assign the node to the partition that it owns. I have a few questions: 1. I am not quite sure why there are so many client id entries in that JmxReporter map. 2. Is there a way to have the JmxReporter release metrics after a set amount of time or a way to turn certain high cardinality metrics like these off? I can provide any logs or heap dumps if more information is needed.
[jira] [Updated] (KAFKA-3980) JmxReporter uses an excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Jorgensen updated KAFKA-3980: Summary: JmxReporter uses an excessive memory causing OutOfMemoryException (was: JmxReporter using excessive memory causing OutOfMemoryException)
[jira] [Updated] (KAFKA-3980) JmxReporter using excessive memory causing OutOfMemoryException
[ https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Jorgensen updated KAFKA-3980: Summary: JmxReporter using excessive memory causing OutOfMemoryException (was: JMXReport using excessive memory)
[jira] [Created] (KAFKA-3980) JMXReport using excessive memory
Andrew Jorgensen created KAFKA-3980:
---
Summary: JMXReport using excessive memory
Key: KAFKA-3980
URL: https://issues.apache.org/jira/browse/KAFKA-3980
Project: Kafka
Issue Type: Bug
Affects Versions: 0.9.0.1
Reporter: Andrew Jorgensen
[jira] [Created] (KAFKA-1816) Topic configs reset after partition increase
Andrew Jorgensen created KAFKA-1816:
---
Summary: Topic configs reset after partition increase
Key: KAFKA-1816
URL: https://issues.apache.org/jira/browse/KAFKA-1816
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Andrew Jorgensen
Priority: Minor

If you alter a topic to increase the number of partitions then the existing configs for that topic are erased. This can be reproduced by doing the following:
{code:none}
$ bin/kafka-topics.sh --create --zookeeper localhost --topic test_topic --partitions 5 --config retention.ms=3600
$ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
Topic:test_topic  PartitionCount:5  ReplicationFactor:1  Configs:retention.ms=3600
$ bin/kafka-topics.sh --alter --zookeeper localhost --topic test_topic --partitions 10
Topic:test_topic  PartitionCount:10  ReplicationFactor:1  Configs:
{code}
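Until the bug is fixed, one workaround is to capture the topic's Configs field before altering partitions and re-apply it afterwards. A minimal sketch, assuming the tab-separated `--describe` output format shown above (`extract_configs` is a hypothetical helper, not a Kafka tool):

```python
import re


def extract_configs(describe_line):
    """Pull the Configs: field out of a kafka-topics.sh --describe line
    and return it as a dict of config name -> value."""
    match = re.search(r"Configs:(\S*)", describe_line)
    if not match or not match.group(1):
        return {}  # no configs set on the topic
    return dict(pair.split("=", 1) for pair in match.group(1).split(","))


line = "Topic:test_topic\tPartitionCount:5\tReplicationFactor:1\tConfigs:retention.ms=3600"
print(extract_configs(line))  # {'retention.ms': '3600'}

# The captured configs could then be re-applied after the partition change,
# e.g. (0.8.x-era CLI):
#   bin/kafka-topics.sh --alter --zookeeper localhost --topic test_topic \
#       --config retention.ms=3600
```

This only guards against the loss; the describe/alter pair is still not atomic, so configs are absent for the window between the two commands.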
[jira] [Updated] (KAFKA-1816) Topic configs reset after partition increase
[ https://issues.apache.org/jira/browse/KAFKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Jorgensen updated KAFKA-1816:
Description:
If you alter a topic to increase the number of partitions then the existing configs for that topic are erased. This can be reproduced by doing the following:
{code:none}
$ bin/kafka-topics.sh --create --zookeeper localhost --topic test_topic --partitions 5 --config retention.ms=3600
$ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
Topic:test_topic  PartitionCount:5  ReplicationFactor:1  Configs:retention.ms=3600
$ bin/kafka-topics.sh --alter --zookeeper localhost --topic test_topic --partitions 10
$ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
Topic:test_topic  PartitionCount:10  ReplicationFactor:1  Configs:
{code}

was:
If you alter a topic to increase the number of partitions then the configuration erases the existing configs for that topic. This can be reproduces by doing the following:
{code:none}
$ bin/kafka-topics.sh --create --zookeeper localhost --topic test_topic --partitions 5 --config retention.ms=3600
$ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
Topic:test_topic  PartitionCount:5  ReplicationFactor:1  Configs:retention.ms=3600
$ bin/kafka-topics.sh --alter --zookeeper localhost --topic test_topic --partitions 10
Topic:test_topic  PartitionCount:10  ReplicationFactor:1  Configs:
{code}
[jira] [Commented] (KAFKA-1816) Topic configs reset after partition increase
[ https://issues.apache.org/jira/browse/KAFKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240245#comment-14240245 ] Andrew Jorgensen commented on KAFKA-1816:
-
I can try this out on 0.8.2 when I get home to verify.

Topic configs reset after partition increase
Key: KAFKA-1816
URL: https://issues.apache.org/jira/browse/KAFKA-1816
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Andrew Jorgensen
Priority: Minor
Labels: newbie
Fix For: 0.8.3