[jira] [Closed] (FLINK-10907) Job recovery on the same JobManager causes JobManager metrics to report stale values

2018-12-13 Thread Till Rohrmann (JIRA)


 [ https://issues.apache.org/jira/browse/FLINK-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-10907.
Resolution: Not A Problem

Please reopen if the problem also affects Flip-6 mode.

> Job recovery on the same JobManager causes JobManager metrics to report stale values
>
> Key: FLINK-10907
> URL: https://issues.apache.org/jira/browse/FLINK-10907
> Project: Flink
> Issue Type: Bug
> Components: Core, Metrics
> Affects Versions: 1.4.2
> Environment: Verified the bug and the fix running on Flink 1.4.
> Based on the JobManagerMetricGroup.java code in master, this issue should
> still occur on Flink versions after 1.4.
> Reporter: Mark Cho
> Priority: Minor
> Labels: pull-request-available
>
> https://github.com/apache/flink/pull/7119
>  * The JobManager loses and then regains leadership when it loses its
> connection to ZooKeeper and reconnects.
>  * When it regains leadership, it tries to recover the job graph.
>  * During the recovery, it reuses the existing {{JobManagerMetricGroup}} to
> register new counters and gauges under the same metric names. The names
> collide, so the new counters and gauges are not registered.
>  * The old counters and gauges continue to report stale values, and the new
> counters and gauges never report the latest values.
> Relevant lines from the logs:
> {code:java}
> com.---.JobManager - Submitting recovered job e9e49fd9b8c61cf54b435f39aa49923f.
> com.---.JobManager - Submitting job e9e49fd9b8c61cf54b435f39aa49923f (flink-job) (Recovery).
> com.---.JobManager - Running initialization on master for job flink-job (e9e49fd9b8c61cf54b435f39aa49923f).
> com.---.JobManager - Successfully ran initialization on master in 0 ms.
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'totalNumberOfCheckpoints'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'numberOfInProgressCheckpoints'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'numberOfCompletedCheckpoints'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'numberOfFailedCheckpoints'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'lastCheckpointRestoreTimestamp'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'lastCheckpointSize'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'lastCheckpointDuration'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'lastCheckpointAlignmentBuffered'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'lastCheckpointExternalPath'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'restartingTime'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'downtime'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'uptime'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'fullRestarts'. Metric will not be reported.[]
> org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'task_failures'. Metric will not be reported.[]
> {code}
>  
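
For readers skimming the thread, the collision described in the report can be illustrated with a minimal sketch. It assumes a simplified group that keeps the first metric registered under a name and rejects any later one, which is what the "Name collision" warnings in the log suggest. `SimpleMetricGroup`, `gauge`, and `report` are hypothetical stand-ins for illustration only, not Flink's actual classes or API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical illustration of the reported behavior; not Flink code.
class NameCollisionSketch {

    /** Simplified group: the first gauge registered under a name wins. */
    static class SimpleMetricGroup {
        private final Map<String, LongSupplier> gauges = new HashMap<>();

        /** Returns true if the gauge was registered, false on a name collision. */
        boolean gauge(String name, LongSupplier gauge) {
            if (gauges.containsKey(name)) {
                // Assumed to correspond to "Name collision ... Metric will not be reported."
                return false;
            }
            gauges.put(name, gauge);
            return true;
        }

        /** Reads the value of whichever gauge is registered under the name. */
        long report(String name) {
            return gauges.get(name).getAsLong();
        }
    }

    public static void main(String[] args) {
        SimpleMetricGroup group = new SimpleMetricGroup();

        // State tracked before the leadership loss.
        AtomicLong oldCheckpoints = new AtomicLong(7);
        group.gauge("totalNumberOfCheckpoints", oldCheckpoints::get);

        // After recovery, new state is registered under the same name on the reused group.
        AtomicLong newCheckpoints = new AtomicLong(0);
        boolean registered = group.gauge("totalNumberOfCheckpoints", newCheckpoints::get);
        newCheckpoints.set(3);

        System.out.println("second registration accepted: " + registered); // false
        System.out.println("reported value: " + group.report("totalNumberOfCheckpoints")); // 7
    }
}
```

In this sketch the reporter keeps reading the old gauge, so the value stays at 7 even though the recovered job has moved on, which matches the stale values described in the report.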



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-10907) Job recovery on the same JobManager causes JobManager metrics to report stale values

2018-11-16 Thread Chesnay Schepler (JIRA)


 [ https://issues.apache.org/jira/browse/FLINK-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chesnay Schepler closed FLINK-10907.

Resolution: Won't Fix

This is not a problem in 1.5 and above. The {{JobManagerJobMetricGroup}} is 
closed when a {{JobMaster}} exits, and it is properly overwritten on recovery 
in {{JobManagerMetricGroup#addJob}}.
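
A rough sketch of the close-and-overwrite behavior described above, under stated assumptions: a parent group keeps one job group per job ID and replaces it when the existing one has been closed. `ParentGroup`, `JobGroup`, and their methods are hypothetical names chosen for illustration; this is not a copy of Flink's `JobManagerMetricGroup` code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of close-then-overwrite on recovery; not Flink code.
class AddJobSketch {

    /** Per-job group that rejects duplicate names and can be closed. */
    static class JobGroup {
        private final Set<String> names = new HashSet<>();
        private boolean closed;

        /** Returns true if the counter name was registered. */
        boolean counter(String name) {
            return !closed && names.add(name);
        }

        void close() {
            closed = true;
            names.clear();
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Parent group that hands out one job group per job ID. */
    static class ParentGroup {
        private final Map<String, JobGroup> jobs = new HashMap<>();

        /** Reuses an open group, but replaces a missing or closed one. */
        synchronized JobGroup addJob(String jobId) {
            JobGroup current = jobs.get(jobId);
            if (current == null || current.isClosed()) {
                current = new JobGroup();
                jobs.put(jobId, current);
            }
            return current;
        }
    }

    public static void main(String[] args) {
        ParentGroup parent = new ParentGroup();

        JobGroup first = parent.addJob("e9e49fd9b8c61cf54b435f39aa49923f");
        first.counter("fullRestarts");

        // The job's owner exits and closes its group before recovery.
        first.close();

        // Recovery asks for the group again and gets a fresh one.
        JobGroup second = parent.addJob("e9e49fd9b8c61cf54b435f39aa49923f");
        System.out.println("fresh group: " + (second != first));
        System.out.println("registration accepted: " + second.counter("fullRestarts"));
    }
}
```

Because the closed group is replaced rather than reused, the recovered job registers its metrics on an empty group and no name collision occurs in this sketch.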



