Re: CPU Spike with Jmx_exporter

2018-07-10 Thread Alain RODRIGUEZ
Hello,

I have not worked with 'jmx_exporter' in a production cluster, but for the
Datadog agent and other collectors I have worked with, the number of metrics
being collected was a key factor.

Cassandra exposes a lot of metrics. I have seen Datadog agents taking too
much CPU, and I have even seen Graphite servers go down under the load from
some Cassandra nodes sending metrics. I would recommend making sure that you
filter in only the metrics that are actually used to display charts or for
alerting purposes. Restrict the patterns in the rules as much as possible.
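
As a rough illustration of such a restriction (a sketch only, not a tested
configuration): assuming the jmx_exporter YAML config supports the
'whitelistObjectNames' and 'rules' keys as described in its README (newer
releases may name these keys differently), limiting collection to client
request latency could look like the following. The MBeans, attributes and
metric names chosen here are examples only.

```yaml
# Hypothetical example: expose only read/write client request latency.
lowercaseOutputName: true
# Only these MBeans are queried over JMX (the default is all MBeans).
whitelistObjectNames:
  - "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency"
  - "org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency"
rules:
  # Keep the pattern narrow; attributes that match no rule are not exported.
  - pattern: 'org.apache.cassandra.metrics<type=ClientRequest, scope=(Read|Write), name=Latency><>(Count|OneMinuteRate)'
    name: cassandra_clientrequest_latency_$2
    labels:
      scope: "$1"
```

The idea is that the exporter then queries only the listed MBeans on each
scrape instead of walking every bean that Cassandra registers.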

Also, for Datadog agents, some work was done in the latest version so that
metric collection requires less CPU. Maybe a similar update has been
released for jmx_exporter, or you could ask for one. The CPU usage might
also be related to GC, as I see the agent is running inside a JVM; some
tuning might help.

If you really cannot do much to improve it on your side, I would open an
issue or a discussion on the Prometheus side
(https://github.com/prometheus/jmx_exporter/issues maybe?).

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2018-07-05 20:39 GMT+01:00 rajpal reddy :

> We are seeing the CPU spike only when JMX metrics are exposed using
> jmx_exporter. We tried setting up JMX authentication but still see the CPU
> spike. If I stop using jmx_exporter, we don’t see any CPU spike. Is there
> anything we have to tune to make it work with jmx_exporter?
>
>
> On Jun 14, 2018, at 2:18 PM, rajpal reddy 
> wrote:
>
> Hey Chris,
>
> Sorry to bother you. Did you get a chance to look at the GC log file I sent
> last night?
>
> On Wed, Jun 13, 2018, 8:44 PM rajpal reddy 
> wrote:
>
>> Chris,
>>
>> Sorry, I attached the wrong log file. Attaching GC collection seconds and
>> CPU; they were going high at the same time. I also attached the gc.log.
>> The Grafana dashboard and gc.log timestamps are 4 hours apart; the GC can
>> be seen on 06/12 around 22:50.
>>
>> rate(jvm_gc_collection_seconds_sum{"}[5m])
>>
>> > On Jun 13, 2018, at 5:26 PM, Chris Lohfink  wrote:
>> >
>> > There is not even a 100 ms GC pause in that. Are you certain there's a
>> > problem?
>> >
>> >> On Jun 13, 2018, at 3:00 PM, rajpal reddy 
>> wrote:
>> >>
>> >> Thanks Chris, I did attach the GC logs already. Reattaching them now.
>> >>
>> >> it started yesterday around 11:54PM
>> >>> On Jun 13, 2018, at 3:56 PM, Chris Lohfink 
>> wrote:
>> >>>
>> >>>> What is the criteria for picking up the value for G1ReservePercent?
>> >>>
>> >>>
>> >>> It depends on the object allocation rate versus the size of the heap.
>> >>> Cassandra would ideally stay below 500-600 MB/s of allocations, but it
>> >>> can spike pretty high with something like reading a wide partition or
>> >>> repair streaming, which might exceed what the G1 young GCs' tenuring
>> >>> and timing are prepared for based on the previous steady rate. Giving
>> >>> it a bigger buffer is a nice safety net for allocation spikes.
>> >>>
>> >>>> Is HEAP_NEWSIZE required only for CMS?
>> >>>
>> >>>
>> >>> It should only set Xmn with that if using CMS; with G1 it should be
>> >>> ignored, or else yes, it would be bad to set Xmn. Sharing the GC logs
>> >>> will give the results of all the bash scripts along with details of
>> >>> what's happening, so that is your best option if you want help.
>> >>>
>> >>> Chris
>> >>>
>> >>>> On Jun 13, 2018, at 12:17 PM, Subroto Barua <sbarua...@yahoo.com.INVALID> wrote:
>> >>>>
>> >>>> Chris,
>> >>>> What is the criteria for picking up the value for G1ReservePercent?
>> >>>>
>> >>>> Subroto
>> >>>>
>> >>>>> On Jun 13, 2018, at 6:52 AM, Chris Lohfink wrote:
>> >>>>>
>> >>>>> G1ReservePercent
>> >>>>
>> >>>> -
>> >>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> >>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>> >>>>
>


CPU Spike with Jmx_exporter

2018-07-05 Thread rajpal reddy
We are seeing the CPU spike only when JMX metrics are exposed using
jmx_exporter. We tried setting up JMX authentication but still see the CPU
spike. If I stop using jmx_exporter, we don’t see any CPU spike. Is there
anything we have to tune to make it work with jmx_exporter?


> On Jun 14, 2018, at 2:18 PM, rajpal reddy  wrote:
> 
> Hey Chris,
> 
> Sorry to bother you. Did you get a chance to look at the GC log file I sent
> last night?
> 
> On Wed, Jun 13, 2018, 8:44 PM rajpal reddy wrote:
> Chris,
>
> Sorry, I attached the wrong log file. Attaching GC collection seconds and
> CPU; they were going high at the same time. I also attached the gc.log.
> The Grafana dashboard and gc.log timestamps are 4 hours apart; the GC can
> be seen on 06/12 around 22:50.
> 
> rate(jvm_gc_collection_seconds_sum{"}[5m])
> 
> > On Jun 13, 2018, at 5:26 PM, Chris Lohfink wrote:
> >
> > There is not even a 100 ms GC pause in that. Are you certain there's a
> > problem?
> > 
> >> On Jun 13, 2018, at 3:00 PM, rajpal reddy wrote:
> >>
> >> Thanks Chris, I did attach the GC logs already. Reattaching them now.
> >> 
> >> it started yesterday around 11:54PM 
> >>> On Jun 13, 2018, at 3:56 PM, Chris Lohfink wrote:
> >>>
> >>>> What is the criteria for picking up the value for G1ReservePercent?
> >>>
> >>>
> >>> It depends on the object allocation rate versus the size of the heap.
> >>> Cassandra would ideally stay below 500-600 MB/s of allocations, but it
> >>> can spike pretty high with something like reading a wide partition or
> >>> repair streaming, which might exceed what the G1 young GCs' tenuring
> >>> and timing are prepared for based on the previous steady rate. Giving
> >>> it a bigger buffer is a nice safety net for allocation spikes.
> >>>
> >>>> Is HEAP_NEWSIZE required only for CMS?
> >>>
> >>>
> >>> It should only set Xmn with that if using CMS; with G1 it should be
> >>> ignored, or else yes, it would be bad to set Xmn. Sharing the GC logs
> >>> will give the results of all the bash scripts along with details of
> >>> what's happening, so that is your best option if you want help.
> >>> 
> >>> Chris
> >>> 
> >>>> On Jun 13, 2018, at 12:17 PM, Subroto Barua wrote:
> >>>>
> >>>> Chris,
> >>>> What is the criteria for picking up the value for G1ReservePercent?
> >>>>
> >>>> Subroto
> >>>>
> >>>>> On Jun 13, 2018, at 6:52 AM, Chris Lohfink wrote:
> >>>>>
> >>>>> G1ReservePercent