Fabian,

It does look like it may be related. I'll add a comment. After digging a bit more, I found that the crash and lack of metrics were precipitated by the JobManager instance crashing and cycling, which caused the job to restart.
Chesnay,

I didn't see anything interesting in our logs. Our reporter config is fairly straightforward (I think):

metrics.reporter.nr.class: com.newrelic.flink.NewRelicReporter
metrics.reporter.nr.interval: 60 SECONDS
metrics.reporters: nr

Nik Davis
Software Engineer
New Relic

On Mon, Jun 4, 2018 at 1:56 AM, Chesnay Schepler <ches...@apache.org> wrote:

> Can you show us the metrics-related configuration parameters in
> flink-conf.yaml?
>
> Please also check the logs for any warnings from the MetricGroup and
> MetricRegistry classes.
>
> On 04.06.2018 10:44, Fabian Hueske wrote:
>
> Hi Nik,
>
> Can you have a look at this JIRA ticket [1] and check whether it is related to
> the problems you are facing?
> If so, would you mind leaving a comment there?
>
> Thank you,
> Fabian
>
> [1] https://issues.apache.org/jira/browse/FLINK-8946
>
> 2018-05-31 4:41 GMT+02:00 Nikolas Davis <nda...@newrelic.com>:
>
>> We keep track of metrics by using the value of
>> MetricGroup::getMetricIdentifier, which returns the fully qualified
>> metric name. The query that we use to monitor metrics filters for metric
>> IDs that match '%Status.JVM.Memory%'. As long as the new metrics come
>> online via the MetricReporter interface, I think the chart would be
>> continuous; we would just see the old JVM memory metrics cycle into new
>> metrics.
>>
>> Nik Davis
>> Software Engineer
>> New Relic
>>
>> On Wed, May 30, 2018 at 5:30 PM, Ajay Tripathy <aj...@yelp.com> wrote:
>>
>>> How are your metrics dimensionalized/named? Task managers often have
>>> UIDs generated for them. The task ID dimension will change on restart. If
>>> you name your metric based on this 'task_id', there will be a discontinuity
>>> with the old metric.
>>>
>>> On Wed, May 30, 2018 at 4:49 PM, Nikolas Davis <nda...@newrelic.com> wrote:
>>>
>>>> Howdy,
>>>>
>>>> We are seeing our task manager JVM metrics disappear over time. This
>>>> last time we correlated it with our job crashing and restarting. I wasn't
>>>> able to grab the failing exception to share. Any thoughts?
>>>>
>>>> We track metrics through the MetricReporter interface. As far as I can
>>>> tell, this more or less only affects the JVM metrics; i.e., most or all other
>>>> metrics continue reporting fine as the job is automatically restarted.
>>>>
>>>> Nik Davis
>>>> Software Engineer
>>>> New Relic
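
---

For reference, below is a minimal sketch of how a reporter built on Flink's MetricReporter interface might track metrics by their fully qualified identifier (via MetricGroup::getMetricIdentifier) and filter for the '%Status.JVM.Memory%' pattern discussed in the thread. The class name and the println sink are illustrative only; the actual com.newrelic.flink.NewRelicReporter is not shown here.

import org.apache.flink.metrics.Metric;
import org.apache.flink.metrics.MetricConfig;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.reporter.MetricReporter;
import org.apache.flink.metrics.reporter.Scheduled;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical reporter, not the actual NewRelicReporter.
public class IdentifierFilteringReporter implements MetricReporter, Scheduled {

    // Metrics keyed by fully qualified identifier, e.g.
    // "<host>.taskmanager.<tm-id>.Status.JVM.Memory.Heap.Used".
    // Note that the task manager ID embedded in the identifier changes on
    // restart, which is the discontinuity Ajay described.
    private final Map<String, Metric> metrics = new ConcurrentHashMap<>();

    @Override
    public void open(MetricConfig config) {
        // Reporter-specific settings (e.g. credentials) would be read here.
    }

    @Override
    public void close() {}

    @Override
    public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
        // getMetricIdentifier returns the fully qualified metric name.
        metrics.put(group.getMetricIdentifier(metricName), metric);
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
        metrics.remove(group.getMetricIdentifier(metricName));
    }

    @Override
    public void report() {
        // Called on the configured interval (60 SECONDS in the config above).
        // Mirrors the '%Status.JVM.Memory%' query filter from the thread.
        metrics.keySet().stream()
                .filter(id -> id.contains("Status.JVM.Memory"))
                .forEach(id -> System.out.println("reporting: " + id));
    }
}

If JVM memory metrics stop arriving after a JobManager restart, logging inside notifyOfAddedMetric/notifyOfRemovedMetric in a sketch like this would show whether the registry ever re-registers them.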