[jira] [Commented] (FLINK-8506) fullRestarts Gauge not incremented when jobmanager got killed

2018-01-25 Thread Steven Zhen Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339560#comment-16339560
 ] 

Steven Zhen Wu commented on FLINK-8506:
---

Till, thanks for the explanation. Looks like we should clarify the doc, which 
says "since job submitted".

[https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html]

fullRestarts The total number of full restarts since this job was submitted (in 
milliseconds). Gauge

So it seems that we don't have any metric to capture jobmanager failover.

> fullRestarts Gauge not incremented when jobmanager got killed
> -
>
> Key: FLINK-8506
> URL: https://issues.apache.org/jira/browse/FLINK-8506
> Project: Flink
> Issue Type: Bug
> Reporter: Steven Zhen Wu
> Priority: Major
>
> [~till.rohrmann] When the jobmanager node got killed, it caused a job restart.
> But in this case, we didn't see the _fullRestarts_ gauge get incremented. Is this
> expected or a bug?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8506) fullRestarts Gauge not incremented when jobmanager got killed

2018-01-25 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339177#comment-16339177
 ] 

Till Rohrmann commented on FLINK-8506:
--

In case of a JobManager failover, the {{fullRestarts}} gauge is reset because 
its value is not persisted. This applies to more or less all metrics in Flink.
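
To illustrate the reset, here is a minimal sketch in plain Java. It does not use Flink's actual classes: {{Coordinator}}, {{restartJob}} and {{fullRestartsGauge}} are hypothetical names made up for this example, and the "failover" is simulated by simply creating a new object. The only point is that a counter held in process memory starts from zero again when a new process takes over.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class RestartGaugeSketch {

    /** Hypothetical stand-in for the coordinating process; NOT Flink's JobManager. */
    static final class Coordinator {
        // The count lives only in this object's memory; nothing is written
        // to durable storage, so it cannot survive the process.
        private final AtomicLong fullRestarts = new AtomicLong();

        /** Called whenever the job is fully restarted by this coordinator. */
        void restartJob() {
            fullRestarts.incrementAndGet();
        }

        /** A gauge is just a value that is read at reporting time. */
        Supplier<Long> fullRestartsGauge() {
            return fullRestarts::get;
        }
    }

    public static void main(String[] args) {
        // First coordinator handles two full restarts of the job.
        Coordinator first = new Coordinator();
        first.restartJob();
        first.restartJob();
        System.out.println("before failover: " + first.fullRestartsGauge().get());

        // Simulated failover: the old process is gone and a fresh one takes
        // over. Its in-memory counter starts from zero again.
        Coordinator second = new Coordinator();
        System.out.println("after failover: " + second.fullRestartsGauge().get());
    }
}
```

Under these assumptions the gauge reads 2 before the simulated failover and 0 after it, which matches the behavior described in this issue: the restart caused by the failover itself is never counted by the new instance.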

> fullRestarts Gauge not incremented when jobmanager got killed
> -
>
> Key: FLINK-8506
> URL: https://issues.apache.org/jira/browse/FLINK-8506
> Project: Flink
> Issue Type: Bug
> Reporter: Steven Zhen Wu
> Priority: Major
>
> [~till.rohrmann] When the jobmanager node got killed, it caused a job restart.
> But in this case, we didn't see the _fullRestarts_ gauge get incremented. Is this
> expected or a bug?


