[jira] [Updated] (KAFKA-4141) 2x increase in cpu usage on new Producer API

2016-09-07 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-4141:

Description: 
We are seeing about a 2x increase in CPU usage for the new kafka producer 
compared to the 0.8.1.1 producer. We are currently using gzip compression.

We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 and 
noticed that the cpu usage for the new producer had increased pretty 
significantly compared to the old producer. This has caused us to need more 
resources to do the same amount of work we were doing before. I did some quick 
profiling and it looks like during sending half of the cpu cycles are spent in 
org.apache.kafka.common.record.Compressor.putRecord and the other half are in 
org.apache.kafka.common.record.Record.computeChecksum (both are around 5.8% of 
the cpu cycles for that method). I know it's not apples to apples, but the old 
producer did not seem to have this overhead or at least it was greatly reduced.

Is this a known performance degradation compared to the old producer? 

h2. old producer:
!http://imgur.com/1xS34Dl.jpg!

h2. new producer:
!http://imgur.com/0w0G5b1.jpg!
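
For context, the setup described above is the new Java producer API with gzip compression enabled. A minimal sketch of such a producer follows; the broker address, topic name, and record contents are placeholders and are not taken from the report.

{code:java}
// Minimal sketch of a new-API (0.9+/0.10) Java producer with gzip compression,
// roughly matching the setup described above. Broker address, topic, and
// record contents are hypothetical placeholders.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class GzipProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");            // compression used in the report
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Sending many small records exercises the compression and per-record
            // checksum paths named in the profile above.
            for (int i = 0; i < 100_000; i++) {
                producer.send(new ProducerRecord<>("test-topic", Integer.toString(i), "payload-" + i));
            }
            producer.flush();
        }
    }
}
{code}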

  was:
We are seeing about a 2x increase in CPU usage for the new kafka producer 
compared to the 0.8.1.1 producer. We are currently using gzip compression.

We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 and 
noticed that the cpu usage for the new producer had increased pretty 
significantly compared to the old producer. This has caused us to need more 
resources to do the same amount of work we were doing before. I did some quick 
profiling and it looks like during sending half of the cpu cycles are sped in 
org.apache.kafka.common.record.Compressor.putRecord and the other half is in 
org.apache.kafka.common.record.Record.computeChecksum (both are around 5.8% of 
the cpu cycles for that method). I know its not apples to apples but the old 
producer did not seem to have this overhead or at least it was greatly reduced.

Is this a known performance degradation compared to the old producer? 

h2. old producer:
!http://imgur.com/1xS34Dl.jpg!

h2. new producer:
!http://imgur.com/0w0G5b1.jpg!


> 2x increase in cpu usage on new Producer API
> 
>
> Key: KAFKA-4141
> URL: https://issues.apache.org/jira/browse/KAFKA-4141
> Project: Kafka
>  Issue Type: Bug
>Reporter: Andrew Jorgensen
>
> We are seeing about a 2x increase in CPU usage for the new kafka producer 
> compared to the 0.8.1.1 producer. We are currently using gzip compression.
> We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 
> and noticed that the cpu usage for the new producer had increased pretty 
> significantly compared to the old producer. This has caused us to need more 
> resources to do the same amount of work we were doing before. I did some 
> quick profiling and it looks like during sending half of the cpu cycles are 
> spent in org.apache.kafka.common.record.Compressor.putRecord and the other 
> half are in org.apache.kafka.common.record.Record.computeChecksum (both are 
> around 5.8% of the cpu cycles for that method). I know its not apples to 
> apples but the old producer did not seem to have this overhead or at least it 
> was greatly reduced.
> Is this a known performance degradation compared to the old producer? 
> h2. old producer:
> !http://imgur.com/1xS34Dl.jpg!
> h2. new producer:
> !http://imgur.com/0w0G5b1.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4141) 2x increase in cpu usage on new Producer API

2016-09-07 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-4141:

Description: 
We are seeing about a 2x increase in CPU usage for the new kafka producer 
compared to the 0.8.1.1 producer. We are currently using gzip compression.

We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 and 
noticed that the cpu usage for the new producer had increased pretty 
significantly compared to the old producer. This has caused us to need more 
resources to do the same amount of work we were doing before. I did some quick 
profiling and it looks like during sending half of the cpu cycles are spent in 
org.apache.kafka.common.record.Compressor.putRecord and the other half are in 
org.apache.kafka.common.record.Record.computeChecksum (both are around 5.8% of 
the cpu cycles for that method). I know it's not apples to apples, but the old 
producer did not seem to have this overhead or at least it was greatly reduced.

Is this a known performance degradation compared to the old producer? 

h2. old producer:
!http://imgur.com/1xS34Dl.jpg!

h2. new producer:
!http://imgur.com/0w0G5b1.jpg!

  was:
We are seeing about a 2x increase in CPU usage for the new kafka producer 
compared to the 0.8.0.1 producer. We are currently using gzip compression.

We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 and 
noticed that the cpu usage for the new producer had increased pretty 
significantly compared to the old producer. This has caused us to need more 
resources to do the same amount of work we were doing before. I did some quick 
profiling and it looks like during sending half of the cpu cycles are sped in 
org.apache.kafka.common.record.Compressor.putRecord and the other half is in 
org.apache.kafka.common.record.Record.computeChecksum (both are around 5.8% of 
the cpu cycles for that method). I know its not apples to apples but the old 
producer did not seem to have this overhead or at least it was greatly reduced.

Is this a known performance degradation compared to the old producer? 

h2. old producer:
!http://imgur.com/1xS34Dl.jpg!

h2. new producer:
!http://imgur.com/0w0G5b1.jpg!


> 2x increase in cpu usage on new Producer API
> 
>
> Key: KAFKA-4141
> URL: https://issues.apache.org/jira/browse/KAFKA-4141
> Project: Kafka
>  Issue Type: Bug
>Reporter: Andrew Jorgensen
>
> We are seeing about a 2x increase in CPU usage for the new kafka producer 
> compared to the 0.8.1.1 producer. We are currently using gzip compression.
> We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 
> and noticed that the cpu usage for the new producer had increased pretty 
> significantly compared to the old producer. This has caused us to need more 
> resources to do the same amount of work we were doing before. I did some 
> quick profiling and it looks like during sending half of the cpu cycles are 
> sped in org.apache.kafka.common.record.Compressor.putRecord and the other 
> half is in org.apache.kafka.common.record.Record.computeChecksum (both are 
> around 5.8% of the cpu cycles for that method). I know its not apples to 
> apples but the old producer did not seem to have this overhead or at least it 
> was greatly reduced.
> Is this a known performance degradation compared to the old producer? 
> h2. old producer:
> !http://imgur.com/1xS34Dl.jpg!
> h2. new producer:
> !http://imgur.com/0w0G5b1.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4141) 2x increase in cpu usage on new Producer API

2016-09-07 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-4141:

Description: 
We are seeing about a 2x increase in CPU usage for the new kafka producer 
compared to the 0.8.0.1 producer. We are currently using gzip compression.

We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 and 
noticed that the cpu usage for the new producer had increased pretty 
significantly compared to the old producer. This has caused us to need more 
resources to do the same amount of work we were doing before. I did some quick 
profiling and it looks like during sending half of the cpu cycles are spent in 
org.apache.kafka.common.record.Compressor.putRecord and the other half are in 
org.apache.kafka.common.record.Record.computeChecksum (both are around 5.8% of 
the cpu cycles for that method). I know it's not apples to apples, but the old 
producer did not seem to have this overhead or at least it was greatly reduced.

Is this a known performance degradation compared to the old producer? 

h2. old producer:
!http://imgur.com/1xS34Dl.jpg!

h2. new producer:
!http://imgur.com/0w0G5b1.jpg!

  was:
We are seeing about a 2x increase in CPU usage for the new kafka producer 
compared to the 0.8.0.1 producer. We are currently using gzip compression.

We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 and 
noticed that the cpu usage for the new producer had increased pretty 
significantly compared to the old producer. This has caused us to need more 
resources to do the same amount of work we were doing before. I did some quick 
profiling and it looks like during sending half of the cpu cycles are sped in 
org.apache.kafka.common.record.Compressor.putRecord and the other half is in 
org.apache.kafka.common.record.Record.computeChecksum (both are around 5.8% of 
the cpu cycles for that method). I know its not apples to apples but the old 
producer did not seem to have this overhead or at least it was greatly reduced.

Is this a known performance degradation compared to the old producer? 

Here is the trace from the old producer:
!http://imgur.com/1xS34Dl.jpg!

New producer:
!http://imgur.com/0w0G5b1.jpg!


> 2x increase in cpu usage on new Producer API
> 
>
> Key: KAFKA-4141
> URL: https://issues.apache.org/jira/browse/KAFKA-4141
> Project: Kafka
>  Issue Type: Bug
>Reporter: Andrew Jorgensen
>
> We are seeing about a 2x increase in CPU usage for the new kafka producer 
> compared to the 0.8.0.1 producer. We are currently using gzip compression.
> We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 
> and noticed that the cpu usage for the new producer had increased pretty 
> significantly compared to the old producer. This has caused us to need more 
> resources to do the same amount of work we were doing before. I did some 
> quick profiling and it looks like during sending half of the cpu cycles are 
> sped in org.apache.kafka.common.record.Compressor.putRecord and the other 
> half is in org.apache.kafka.common.record.Record.computeChecksum (both are 
> around 5.8% of the cpu cycles for that method). I know its not apples to 
> apples but the old producer did not seem to have this overhead or at least it 
> was greatly reduced.
> Is this a known performance degradation compared to the old producer? 
> h2. old producer:
> !http://imgur.com/1xS34Dl.jpg!
> h2. new producer:
> !http://imgur.com/0w0G5b1.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4141) 2x increase in cp usage on new Producer API

2016-09-07 Thread Andrew Jorgensen (JIRA)
Andrew Jorgensen created KAFKA-4141:
---

 Summary: 2x increase in cp usage on new Producer API
 Key: KAFKA-4141
 URL: https://issues.apache.org/jira/browse/KAFKA-4141
 Project: Kafka
  Issue Type: Bug
Reporter: Andrew Jorgensen


We are seeing about a 2x increase in CPU usage for the new kafka producer 
compared to the 0.8.0.1 producer. We are currently using gzip compression.

We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 and 
noticed that the cpu usage for the new producer had increased pretty 
significantly compared to the old producer. This has caused us to need more 
resources to do the same amount of work we were doing before. I did some quick 
profiling and it looks like during sending half of the cpu cycles are spent in 
org.apache.kafka.common.record.Compressor.putRecord and the other half are in 
org.apache.kafka.common.record.Record.computeChecksum (both are around 5.8% of 
the cpu cycles for that method). I know it's not apples to apples, but the old 
producer did not seem to have this overhead or at least it was greatly reduced.

Is this a known performance degradation compared to the old producer? 

Here is the trace from the old producer:
!http://imgur.com/1xS34Dl.jpg!

New producer:
!http://imgur.com/0w0G5b1.jpg!
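
As a rough, standalone illustration of why per-record checksumming can show up prominently in a send-path profile, the sketch below times CRC computation over many small records. It uses the standard java.util.zip.CRC32 rather than Kafka's internal checksum implementation, and the record size and count are arbitrary, so the numbers are only indicative.

{code:java}
// Rough, standalone illustration of per-record CRC cost. Uses the standard
// java.util.zip.CRC32, not Kafka's internal implementation, so treat the
// timing as indicative only. Record size and count are placeholders.
import java.util.zip.CRC32;

public class ChecksumCostSketch {
    public static void main(String[] args) {
        byte[] record = new byte[200];          // hypothetical small record
        java.util.Arrays.fill(record, (byte) 'x');

        long start = System.nanoTime();
        long dummy = 0;
        for (int i = 0; i < 1_000_000; i++) {   // one checksum per record sent
            CRC32 crc = new CRC32();
            crc.update(record, 0, record.length);
            dummy += crc.getValue();            // keep the result live
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("1M record checksums took ~" + elapsedMs + " ms (" + dummy + ")");
    }
}
{code}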



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4141) 2x increase in cpu usage on new Producer API

2016-09-07 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-4141:

Summary: 2x increase in cpu usage on new Producer API  (was: 2x increase in 
cp usage on new Producer API)

> 2x increase in cpu usage on new Producer API
> 
>
> Key: KAFKA-4141
> URL: https://issues.apache.org/jira/browse/KAFKA-4141
> Project: Kafka
>  Issue Type: Bug
>Reporter: Andrew Jorgensen
>
> We are seeing about a 2x increase in CPU usage for the new kafka producer 
> compared to the 0.8.0.1 producer. We are currently using gzip compression.
> We recently upgraded our kafka server and producer to 0.10.0.1 from 0.8.1.1 
> and noticed that the cpu usage for the new producer had increased pretty 
> significantly compared to the old producer. This has caused us to need more 
> resources to do the same amount of work we were doing before. I did some 
> quick profiling and it looks like during sending half of the cpu cycles are 
> sped in org.apache.kafka.common.record.Compressor.putRecord and the other 
> half is in org.apache.kafka.common.record.Record.computeChecksum (both are 
> around 5.8% of the cpu cycles for that method). I know its not apples to 
> apples but the old producer did not seem to have this overhead or at least it 
> was greatly reduced.
> Is this a known performance degradation compared to the old producer? 
> Here is the trace from the old producer:
> !http://imgur.com/1xS34Dl.jpg!
> New producer:
> !http://imgur.com/0w0G5b1.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-26 Thread Andrew Jorgensen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393877#comment-15393877
 ] 

Andrew Jorgensen commented on KAFKA-3980:
-

[~omkreddy] do you mean enabling the debug log on the producer side or the kafka 
side? It looks like the debug log for the producer itself doesn't include its 
client id, so I'm assuming you mean on the kafka side?

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> resetting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-21 Thread Andrew Jorgensen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388382#comment-15388382
 ] 

Andrew Jorgensen edited comment on KAFKA-3980 at 7/21/16 9:37 PM:
--

I was able to extract a list of all the ids from the map in the JmxReporter and 
have uploaded them here 
(https://drive.google.com/file/d/0B_65les2Npo5OHFyMVpXSjd1cXc/view?usp=sharing).
 The compressed size is 29M and uncompressed it is 224M. In the list you can 
see that the type=Fetch keys have clearly defined names but all the 
type=Produce keys seem to have completely randomized ids. I think that in general 
this makes sense; we don't really need an id on the producer side, but the 
JmxReporter should not grow unbounded.

EDIT: Another data point is that I believe we are using the 0.8.1.1 client to 
talk to a 0.9.0.1 cluster. Not sure if the version mismatch there is 
contributing. It looks like the default client-id in the 0.8.1.1 client was an 
empty string, whereas the new one uses an AtomicInteger.
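
Given that the 0.8.1.1 client defaulted the client id to an empty string while the new client auto-generates one per producer instance, one possible mitigation (not confirmed in this ticket) is to pin a stable client.id so that broker-side per-client-id metrics reuse the same name across restarts. A minimal sketch, with placeholder broker address and id value:

{code:java}
// Sketch: pinning a stable client.id on the new Java producer so that
// broker-side per-client-id metrics reuse the same name across restarts.
// The client.id value and broker address below are placeholders.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class StableClientIdSketch {
    public static KafkaProducer<byte[], byte[]> buildProducer() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "my-service-producer-1");    // stable, explicit id
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        return new KafkaProducer<>(props);
    }
}
{code}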




was (Author: ajorgensen):
I was able to extract a list of all the ids from the map in the JmxReporter and 
have uploaded them here 
(https://drive.google.com/file/d/0B_65les2Npo5OHFyMVpXSjd1cXc/view?usp=sharing).
 The compressed size is 29M and uncompressed it is 224M. In the list you can 
see that the type=Fetch keys have clearly defined names but all the 
type=Produce seem to have completely randomized ids. I think that in general 
this makes sense, we dont really need a id on the producer side but the 
JmxReporter should not grow unbounded.

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> resetting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-21 Thread Andrew Jorgensen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388382#comment-15388382
 ] 

Andrew Jorgensen commented on KAFKA-3980:
-

I was able to extract a list of all the ids from the map in the JmxReporter and 
have uploaded them here 
(https://drive.google.com/file/d/0B_65les2Npo5OHFyMVpXSjd1cXc/view?usp=sharing).
 The compressed size is 29M and uncompressed it is 224M. In the list you can 
see that the type=Fetch keys have clearly defined names but all the 
type=Produce keys seem to have completely randomized ids. I think that in general 
this makes sense; we don't really need an id on the producer side, but the 
JmxReporter should not grow unbounded.

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> resetting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-21 Thread Andrew Jorgensen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387185#comment-15387185
 ] 

Andrew Jorgensen edited comment on KAFKA-3980 at 7/21/16 5:59 AM:
--

As far as I know we are not restarting with different client ids. We have 
increased the number of producers recently, but if we are restarting with 
separate client ids it is not on purpose. When are those client ids generated? 
Looking at the heap dump they look auto-generated rather than manually set; could 
restarting the producers frequently cause new ids to be generated each time 
they restart, adding up over time? FWIW we have another cluster that is on 
0.8.1.1 (the older producer), which does not seem to exhibit the same 
symptoms despite being restarted multiple times.

EDIT: I'm not sure if it's related, but we are using this ruby-kafka client for 
some of the producers. The others are in Java and use the provided kafka 
client. 
https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/client.rb#L44. It 
appears that the client sets itself to `ruby-client` by default.


was (Author: ajorgensen):
As far as I know we are not restarting with different client ids. We have 
increase the number of producers recently but if we are restarting with 
separate client ids it is not on purpose. When are those client ids generated? 
Looking at the heap dump they look pretty generated and not manually set, could 
restarting the producers frequently cause new ids to be generated each time 
they restart which add up over time? FWIW we have another cluster that is on 
0.8.1.1, so the older producer, which does not seem to exhibit the same 
symptoms despite being restarted multiple times.

EDIT: I'm not sure if its related but we are using this ruby-kafka client: 
https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/client.rb#L44. It 
appears that the client sets itself to `ruby-client` by default.

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> resetting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-20 Thread Andrew Jorgensen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387185#comment-15387185
 ] 

Andrew Jorgensen edited comment on KAFKA-3980 at 7/21/16 5:53 AM:
--

As far as I know we are not restarting with different client ids. We have 
increased the number of producers recently, but if we are restarting with 
separate client ids it is not on purpose. When are those client ids generated? 
Looking at the heap dump they look auto-generated rather than manually set; could 
restarting the producers frequently cause new ids to be generated each time 
they restart, adding up over time? FWIW we have another cluster that is on 
0.8.1.1 (the older producer), which does not seem to exhibit the same 
symptoms despite being restarted multiple times.

EDIT: I'm not sure if it's related, but we are using this ruby-kafka client: 
https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/client.rb#L44. It 
appears that the client sets itself to `ruby-client` by default.


was (Author: ajorgensen):
As far as I know we are not restarting with different client ids. We have 
increase the number of producers recently but if we are restarting with 
separate client ids it is not on purpose. When are those client ids generated? 
Looking at the heap dump they look pretty generated and not manually set, could 
restarting the producers frequently cause new ids to be generated each time 
they restart which add up over time? FWIW we have another cluster that is on 
0.8.1.1, so the older producer, which does not seem to exhibit the same 
symptoms despite being restarted multiple times.

EDIT: I'm not sure if its related but we are using this ruby-kafka client: 
https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/client.rb#L44

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> resetting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-20 Thread Andrew Jorgensen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387185#comment-15387185
 ] 

Andrew Jorgensen edited comment on KAFKA-3980 at 7/21/16 5:52 AM:
--

As far as I know we are not restarting with different client ids. We have 
increased the number of producers recently, but if we are restarting with 
separate client ids it is not on purpose. When are those client ids generated? 
Looking at the heap dump they look auto-generated rather than manually set; could 
restarting the producers frequently cause new ids to be generated each time 
they restart, adding up over time? FWIW we have another cluster that is on 
0.8.1.1 (the older producer), which does not seem to exhibit the same 
symptoms despite being restarted multiple times.

EDIT: I'm not sure if it's related, but we are using this ruby-kafka client: 
https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/client.rb#L44


was (Author: ajorgensen):
As far as I know we are not restarting with different client ids. We have 
increase the number of producers recently but if we are restarting with 
separate client ids it is not on purpose. When are those client ids generated? 
Looking at the heap dump they look pretty generated and not manually set, could 
restarting the producers frequently cause new ids to be generated each time 
they restart which add up over time? FWIW we have another cluster that is on 
0.8.1.1, so the older producer, which does not seem to exhibit the same 
symptoms despite being restarted multiple times.

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> resetting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-20 Thread Andrew Jorgensen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387185#comment-15387185
 ] 

Andrew Jorgensen edited comment on KAFKA-3980 at 7/21/16 5:22 AM:
--

As far as I know we are not restarting with different client ids. We have 
increased the number of producers recently, but if we are restarting with 
separate client ids it is not on purpose. When are those client ids generated? 
Looking at the heap dump they look auto-generated rather than manually set; could 
restarting the producers frequently cause new ids to be generated each time 
they restart, adding up over time? FWIW we have another cluster that is on 
0.8.1.1 (the older producer), which does not seem to exhibit the same 
symptoms despite being restarted multiple times.


was (Author: ajorgensen):
As far as I know we are not restarting with different client ids. We have 
increase the number of producers recently but if we are restarting with 
separate client ids it is not on purpose. When are those client ids generated? 
Looking at the heap dump they look pretty random, could restarting the 
producers frequently cause new ids to be generated each time they restart which 
add up over time? FWIW we have another cluster that is on 0.8.1.1, so the older 
producer, which does not seem to exhibit the same symptoms despite being 
restarted multiple times.

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> resetting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-20 Thread Andrew Jorgensen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387185#comment-15387185
 ] 

Andrew Jorgensen commented on KAFKA-3980:
-

As far as I know we are not restarting with different client ids. We have 
increased the number of producers recently, but if we are restarting with 
separate client ids it is not on purpose. When are those client ids generated? 
Looking at the heap dump they look pretty random; could restarting the 
producers frequently cause new ids to be generated each time they restart, 
adding up over time? FWIW we have another cluster that is on 0.8.1.1 (the older 
producer), which does not seem to exhibit the same symptoms despite being 
restarted multiple times.

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> resetting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-20 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-3980:

Description: 
I have some nodes in a kafka cluster that occasionally will run out of memory 
whenever I restart the producers. I was able to take a heap dump from both a 
recently restarted Kafka node, which weighed in at about 20 MB, and a node that 
has been running for 2 months, which is using over 700MB of memory. Looking at the 
heap dump it looks like the JmxReporter is holding on to metrics and causing 
them to build up over time. 

!http://imgur.com/N6Cd0Ku.png!

!http://imgur.com/kQBqA2j.png!

The ultimate problem this causes is that there is a chance when I restart the 
producers it will cause the node to experience a Java heap space exception and 
OOM. The nodes then fail to start up correctly and write a -1 as the leader 
number to the partitions they were responsible for, effectively resetting the 
offset and rendering that partition unavailable. The kafka process then needs 
to be restarted in order to re-assign the node to the partition that it owns.

I have a few questions:
1. I am not quite sure why there are so many client id entries in that 
JmxReporter map.
2. Is there a way to have the JmxReporter release metrics after a set amount of 
time or a way to turn certain high cardinality metrics like these off?

I can provide any logs or heap dumps if more information is needed.
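
One way to watch this buildup without taking a full heap dump is to count the registered Kafka MBeans over time via standard JMX. A minimal in-process sketch follows; the domain names listed are an assumption and may need adjusting for the broker version in use.

{code:java}
// Diagnostic sketch: count registered Kafka MBeans via standard JMX to watch
// for unbounded growth of per-client-id metrics. Run in-process on the broker
// (or adapt to a remote JMXConnector); the domain list is an assumption.
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanCountSketch {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        String[] domains = {"kafka.server", "kafka.network", "kafka.producer"};
        for (String domain : domains) {
            int count = server.queryNames(new ObjectName(domain + ":*"), null).size();
            System.out.println(domain + " MBeans: " + count);
        }
    }
}
{code}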

  was:
I have some nodes in a kafka cluster that occasionally will run out of memory 
whenever I restart the producers. I was able to take a heap dump from both a 
recently restarted Kafka node which weighed in at about 20 MB and a node that 
has been running for 2 months is using over 700MB of memory. Looking at the 
heap dump it looks like the JmxReporter is holding on to metrics and causing 
them to build up over time. 

!http://imgur.com/N6Cd0Ku.png!

!http://imgur.com/kQBqA2j.png!

The ultimate problem this causes is that there is a chance when I restart the 
producers it will cause the node to experience an Java heap space exception and 
OOM. The nodes  then fail to startup correctly and write a -1 as the leader 
number to the partitions they were responsible for effectively reseting the 
offset and rendering that partition unavailable. The kafka process then needs 
to go be restarted in order to re-assign the node to the partition that it owns.

I have a few questions:
1. I am not quite sure why there are so many client id entries in that 
JmxReporter map.
2. Is there a way to have the JmxReporter release metrics after a set amount of 
time or a way to turn certain high cardinality metrics like these off?

I can provide any logs or heap dumps if more information is needed.


> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> resetting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-20 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-3980:

Summary: JmxReporter uses excessive memory causing OutOfMemoryException  
(was: JmxReporter uses an excessive memory causing OutOfMemoryException)

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> reseting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3980) JmxReporter uses an excessive memory causing OutOfMemoryException

2016-07-20 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-3980:

Description: 
I have some nodes in a kafka cluster that occasionally will run out of memory 
whenever I restart the producers. I was able to take a heap dump from both a 
recently restarted Kafka node, which weighed in at about 20 MB, and a node that 
has been running for 2 months, which is using over 700MB of memory. Looking at the 
heap dump it looks like the JmxReporter is holding on to metrics and causing 
them to build up over time. 

!http://imgur.com/N6Cd0Ku.png!

!http://imgur.com/kQBqA2j.png!

The ultimate problem this causes is that there is a chance when I restart the 
producers it will cause the node to experience a Java heap space exception and 
OOM. The nodes then fail to start up correctly and write a -1 as the leader 
number to the partitions they were responsible for, effectively resetting the 
offset and rendering that partition unavailable. The kafka process then needs 
to be restarted in order to re-assign the node to the partition that it owns.

I have a few questions:
1. I am not quite sure why there are so many client id entries in that 
JmxReporter map.
2. Is there a way to have the JmxReporter release metrics after a set amount of 
time or a way to turn certain high cardinality metrics like these off?

I can provide any logs or heap dumps if more information is needed.

  was:
I have some nodes in a kafka cluster that occasionally will run out of memory 
whenever I restart the producers. I was able to take a heap dump from both a 
recently restarted Kafka node which weighed in at about 20 MB and a node that 
has been running for 2 months is using over 700MB of memory. Looking at the 
heap dump it looks like the JmxReporter is holding on to metrics and causing 
them to build up over time. 

!http://imgur.com/N6Cd0Ku.png!

!http://imgur.com/kQBqA2j.png!

The ultimate problem this causes is that there is a change when I restart the 
producers it will cause the node to experience an Java heap space exception and 
OOM. The nodes  then fail to startup correctly and write a -1 as the leader 
number to the partitions they were responsible for effectively reseting the 
offset and rendering that partition unavailable. The kafka process then needs 
to go be restarted in order to re-assign the node to the partition that it owns.

I have a few questions:
1. I am not quite sure why there are so many client id entries in that 
JmxReporter map.
2. Is there a way to have the JmxReporter release metrics after a set amount of 
time or a way to turn certain high cardinality metrics like these off?

I can provide any logs or heap dumps if more information is needed.


> JmxReporter uses an excessive memory causing OutOfMemoryException
> -
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> reseting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3980) JmxReporter uses an excessive memory causing OutOfMemoryException

2016-07-20 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-3980:

Description: 
I have some nodes in a kafka cluster that occasionally will run out of memory 
whenever I restart the producers. I was able to take a heap dump from both a 
recently restarted Kafka node which weighed in at about 20 MB and a node that 
has been running for 2 months is using over 700MB of memory. Looking at the 
heap dump it looks like the JmxReporter is holding on to metrics and causing 
them to build up over time. 

!http://imgur.com/N6Cd0Ku.png!

!http://imgur.com/kQBqA2j.png!

The ultimate problem this causes is that there is a change when I restart the 
producers it will cause the node to experience an Java heap space exception and 
OOM. The nodes  then fail to startup correctly and write a -1 as the leader 
number to the partitions they were responsible for effectively reseting the 
offset and rendering that partition unavailable. The kafka process then needs 
to go be restarted in order to re-assign the node to the partition that it owns.

I have a few questions:
1. I am not quite sure why there are so many client id entries in that 
JmxReporter map.
2. Is there a way to have the JmxReporter release metrics after a set amount of 
time or a way to turn certain high cardinality metrics like these off?

I can provide any logs or heap dumps if more information is needed.

  was:
I have some nodes in a kafka cluster that occasionally will run out of memory 
whenever I restart the producers. I was able to take a heap dump from both a 
recently restarted Kafka node which weighed in at about 20 MB while a node that 
has been running for 2 months is using over 700MB of memory. Looking at the 
heap dump it looks like the JmxReporter is holding on to metrics and causing 
them to build up over time. 

!http://imgur.com/N6Cd0Ku.png!

!http://imgur.com/kQBqA2j.png!

The ultimate problem this causes is that there is a change when I restart the 
producers it will cause the node to experience an Java heap space exception and 
OOM. The nodes  then fail to startup correctly and write a -1 as the leader 
number to the partitions they were responsible for effectively reseting the 
offset and rendering that partition unavailable. The kafka process then needs 
to go be restarted in order to re-assign the node to the partition that it owns.

I have a few questions:
1. I am not quite sure why there are so many client id entries in that 
JmxReporter map.
2. Is there a way to have the JmxReporter release metrics after a set amount of 
time or a way to turn certain high cardinality metrics like these off?

I can provide any logs or heap dumps if more information is needed.


> JmxReporter uses an excessive memory causing OutOfMemoryException
> -
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB and a node that 
> has been running for 2 months is using over 700MB of memory. Looking at the 
> heap dump it looks like the JmxReporter is holding on to metrics and causing 
> them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a change when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> reseting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3980) JmxReporter uses an excessive memory causing OutOfMemoryException

2016-07-20 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-3980:

Summary: JmxReporter uses an excessive memory causing OutOfMemoryException  
(was: JmxReporter using excessive memory causing OutOfMemoryException)

> JmxReporter uses an excessive memory causing OutOfMemoryException
> -
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node which weighed in at about 20 MB while a node 
> that has been running for 2 months is using over 700MB of memory. Looking at 
> the heap dump it looks like the JmxReporter is holding on to metrics and 
> causing them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a change when I restart the 
> producers it will cause the node to experience an Java heap space exception 
> and OOM. The nodes  then fail to startup correctly and write a -1 as the 
> leader number to the partitions they were responsible for effectively 
> reseting the offset and rendering that partition unavailable. The kafka 
> process then needs to go be restarted in order to re-assign the node to the 
> partition that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3980) JmxReporter using excessive memory causing OutOfMemoryException

2016-07-20 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-3980:

Summary: JmxReporter using excessive memory causing OutOfMemoryException  
(was: JMXReport using excessive memory)

> JmxReporter using excessive memory causing OutOfMemoryException
> ---
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a Kafka cluster that occasionally run out of memory 
> whenever I restart the producers. I took heap dumps from both a recently 
> restarted Kafka node, which weighed in at about 20 MB, and a node that has 
> been running for 2 months, which is using over 700 MB of memory. Looking at 
> the heap dump, it appears that the JmxReporter is holding on to metrics and 
> causing them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance that when I 
> restart the producers the node will experience a Java heap space exception 
> and OOM. The nodes then fail to start up correctly and write a -1 as the 
> leader number to the partitions they were responsible for, effectively 
> resetting the offset and rendering those partitions unavailable. The Kafka 
> process then needs to be restarted in order to re-assign the node to the 
> partitions it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3980) JMXReport using excessive memory

2016-07-20 Thread Andrew Jorgensen (JIRA)
Andrew Jorgensen created KAFKA-3980:
---

 Summary: JMXReport using excessive memory
 Key: KAFKA-3980
 URL: https://issues.apache.org/jira/browse/KAFKA-3980
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.9.0.1
Reporter: Andrew Jorgensen


I have some nodes in a Kafka cluster that occasionally run out of memory 
whenever I restart the producers. I took heap dumps from both a recently 
restarted Kafka node, which weighed in at about 20 MB, and a node that has been 
running for 2 months, which is using over 700 MB of memory. Looking at the heap 
dump, it appears that the JmxReporter is holding on to metrics and causing them 
to build up over time. 

!http://imgur.com/N6Cd0Ku.png!

!http://imgur.com/kQBqA2j.png!

The ultimate problem this causes is that there is a chance that when I restart 
the producers the node will experience a Java heap space exception and OOM. The 
nodes then fail to start up correctly and write a -1 as the leader number to 
the partitions they were responsible for, effectively resetting the offset and 
rendering those partitions unavailable. The Kafka process then needs to be 
restarted in order to re-assign the node to the partitions it owns.

I have a few questions:
1. I am not quite sure why there are so many client id entries in that 
JmxReporter map.
2. Is there a way to have the JmxReporter release metrics after a set amount of 
time or a way to turn certain high cardinality metrics like these off?

I can provide any logs or heap dumps if more information is needed.
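
As a further editorial aside, the clients use the same metrics framework and 
expose it programmatically, which makes it easy to see why distinct client ids 
multiply the number of entries: every producer metric carries a client-id tag. 
The sketch below (illustrative configuration, not from the report) prints a 
producer's metric names and tags.

{code:java}
// Editorial sketch: list a producer's metrics and their tags. Each metric name
// carries a client-id tag, so many distinct client ids translate into many
// distinct entries in JmxReporter's map.
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerMetricLister {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "metric-lister");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (Map.Entry<MetricName, ? extends Metric> entry : producer.metrics().entrySet()) {
                MetricName name = entry.getKey();
                // group() and tags() show how each metric is keyed, including client-id
                System.out.println(name.group() + " / " + name.name() + " " + name.tags());
            }
        }
    }
}
{code}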



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1816) Topic configs reset after partition increase

2014-12-09 Thread Andrew Jorgensen (JIRA)
Andrew Jorgensen created KAFKA-1816:
---

 Summary: Topic configs reset after partition increase
 Key: KAFKA-1816
 URL: https://issues.apache.org/jira/browse/KAFKA-1816
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Andrew Jorgensen
Priority: Minor


If you alter a topic to increase the number of partitions, the existing configs 
for that topic are erased. This can be reproduced by doing the following:

{code:none}
$ bin/kafka-topics.sh --create --zookeeper localhost --topic test_topic 
--partitions 5 --config retention.ms=3600

$ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
 Topic:test_topic  PartitionCount:5  ReplicationFactor:1  Configs:retention.ms=3600

$ bin/kafka-topics.sh --alter --zookeeper localhost --topic test_topic 
--partitions 10
 Topic:test_topic  PartitionCount:10  ReplicationFactor:1  Configs:
{code}
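
For readers hitting the same symptom on a newer cluster, one hedged workaround 
sketch (editorial, not from this report) is to snapshot the topic's non-default 
configs before adding partitions and re-apply them afterwards. It assumes a 
Java client at 0.11 or later where AdminClient, createPartitions and 
alterConfigs exist, so it does not apply to the 0.8.1.1 setup described here; 
the broker address and topic name are illustrative.

{code:java}
// Editorial sketch (assumes a 0.11+ Java client; AdminClient did not exist in 0.8.x):
// snapshot non-default topic configs, grow the partition count, then re-apply the
// snapshot so any overrides lost along the way are restored.
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.clients.admin.NewPartitions;
import org.apache.kafka.common.config.ConfigResource;

public class PartitionIncreasePreservingConfigs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "test_topic");

            // 1. Snapshot only the explicitly-set (non-default) topic configs.
            Config current = admin.describeConfigs(Collections.singleton(topic))
                                  .all().get().get(topic);
            List<ConfigEntry> overrides = current.entries().stream()
                    .filter(entry -> !entry.isDefault())
                    .collect(Collectors.toList());

            // 2. Increase the partition count from 5 to 10.
            admin.createPartitions(
                    Collections.singletonMap("test_topic", NewPartitions.increaseTo(10)))
                 .all().get();

            // 3. Re-apply the snapshot in case any overrides were dropped.
            admin.alterConfigs(Collections.singletonMap(topic, new Config(overrides)))
                 .all().get();
        }
    }
}
{code}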



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1816) Topic configs reset after partition increase

2014-12-09 Thread Andrew Jorgensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated KAFKA-1816:

Description: 
If you alter a topic to increase the number of partitions, the existing configs 
for that topic are erased. This can be reproduced by doing the following:

{code:none}
$ bin/kafka-topics.sh --create --zookeeper localhost --topic test_topic 
--partitions 5 --config retention.ms=3600

$ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
 Topic:test_topic  PartitionCount:5  ReplicationFactor:1  Configs:retention.ms=3600

$ bin/kafka-topics.sh --alter --zookeeper localhost --topic test_topic 
--partitions 10

$ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
 Topic:test_topic  PartitionCount:10  ReplicationFactor:1  Configs:
{code}

  was:
If you alter a topic to increase the number of partitions, the existing configs 
for that topic are erased. This can be reproduced by doing the following:

{code:none}
$ bin/kafka-topics.sh --create --zookeeper localhost --topic test_topic 
--partitions 5 --config retention.ms=3600

$ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
 Topic:test_topic  PartitionCount:5  ReplicationFactor:1  Configs:retention.ms=3600

$ bin/kafka-topics.sh --alter --zookeeper localhost --topic test_topic 
--partitions 10
 Topic:test_topic  PartitionCount:10  ReplicationFactor:1  Configs:
{code}


 Topic configs reset after partition increase
 

 Key: KAFKA-1816
 URL: https://issues.apache.org/jira/browse/KAFKA-1816
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Andrew Jorgensen
Priority: Minor

 If you alter a topic to increase the number of partitions, the existing 
 configs for that topic are erased. This can be reproduced by doing the 
 following:
 {code:none}
 $ bin/kafka-topics.sh --create --zookeeper localhost --topic test_topic 
 --partitions 5 --config retention.ms=3600
 $ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
  Topic:test_topic  PartitionCount:5  ReplicationFactor:1  Configs:retention.ms=3600
 $ bin/kafka-topics.sh --alter --zookeeper localhost --topic test_topic 
 --partitions 10
 $ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
  Topic:test_topic  PartitionCount:10  ReplicationFactor:1  Configs:
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1816) Topic configs reset after partition increase

2014-12-09 Thread Andrew Jorgensen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240245#comment-14240245
 ] 

Andrew Jorgensen commented on KAFKA-1816:
-

I can try this out on 0.8.2 when I get home to verify. 

 Topic configs reset after partition increase
 

 Key: KAFKA-1816
 URL: https://issues.apache.org/jira/browse/KAFKA-1816
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Andrew Jorgensen
Priority: Minor
  Labels: newbie
 Fix For: 0.8.3


 If you alter a topic to increase the number of partitions, the existing 
 configs for that topic are erased. This can be reproduced by doing the 
 following:
 {code:none}
 $ bin/kafka-topics.sh --create --zookeeper localhost --topic test_topic 
 --partitions 5 --config retention.ms=3600
 $ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
  Topic:test_topic  PartitionCount:5  ReplicationFactor:1  Configs:retention.ms=3600
 $ bin/kafka-topics.sh --alter --zookeeper localhost --topic test_topic 
 --partitions 10
 $ bin/kafka-topics.sh --describe --zookeeper localhost --topic test_topic
  Topic:test_topic  PartitionCount:10  ReplicationFactor:1  Configs:
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)