Re: Flink Prometheus metric doubt

2020-01-02 Thread Chesnay Schepler
In practice, the documentation is incorrect. While the metric technically 
_would_ emit -1 if the job were in a failed/finished state, in reality 
the metric has already been unregistered by that point and is no longer 
updated, since the owning component (the jobmanager) is shutting down.
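
To illustrate what a scraper would observe under this behavior, here is a 
minimal Python sketch. It is not part of Flink or Prometheus: the helper 
names (read_metric, job_state) and the job_name label are hypothetical, and 
the parsing only covers simple lines of the Prometheus text exposition 
format. The point is that a consumer has to distinguish "metric absent" 
from "metric equals -1", because only the former appears to occur once the 
job has failed.

```python
from typing import Optional


def read_metric(exposition: str, name: str) -> Optional[float]:
    """Return the first sample value for `name`, or None if it is not exposed.

    Simplified parser (assumption): handles `name value` and
    `name{labels} value [timestamp]` lines and skips comments.
    """
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            metric, _, rest = line.partition("{")
            # Drop the label set; keep what follows the closing brace.
            _, _, rest = rest.rpartition("}")
        else:
            metric, _, rest = line.partition(" ")
        if metric.strip() != name:
            continue
        fields = rest.split()
        if fields:
            return float(fields[0])
    return None


def job_state(exposition: str, name: str = "flink_jobmanager_job_uptime") -> str:
    """Classify one scrape as 'absent', 'completed' (-1) or 'running'."""
    value = read_metric(exposition, name)
    if value is None:
        return "absent"      # metric was unregistered
    if value == -1:
        return "completed"   # the documented sentinel value
    return "running"
```

With such a check, a scrape taken after the job has failed would be 
classified as "absent" rather than "completed", which is why an alert 
condition on -1 never fires.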


I can't think of a workaround for this problem at the moment.

On 19/12/2019 11:56, Jesús Vásquez wrote:

Hi all, I'm monitoring Flink jobs using Prometheus.
I have been trying to use the metrics 
flink_jobmanager_job_uptime/downtime to create an alert that 
fires when one of these values emits -1, since the docs say this is the 
behavior of the metric when the job reaches a completed state.
However, when I tested this by letting one of my jobs fail, 
the metrics never emitted anything other than zero. 
Finally, the metric disappeared after the job had failed.

Am I missing something, or is this the expected behavior?





Re: Flink Prometheus metric doubt

2019-12-19 Thread vino yang
Hi Jesus,

IMHO, maybe @Chesnay Schepler can provide more information.

Best,
Vino

Jesús Vásquez wrote on Thu, Dec 19, 2019 at 6:57 PM:

> Hi all, I'm monitoring Flink jobs using Prometheus.
> I have been trying to use the metrics flink_jobmanager_job_uptime/downtime
> to create an alert that fires when one of these values emits -1, since
> the docs say this is the behavior of the metric when the job reaches
> a completed state.
> However, when I tested this by letting one of my jobs fail, the metrics
> never emitted anything other than zero. Finally, the metric disappeared
> after the job had failed.
> Am I missing something, or is this the expected behavior?
>


Flink Prometheus metric doubt

2019-12-19 Thread Jesús Vásquez
Hi all, I'm monitoring Flink jobs using Prometheus.
I have been trying to use the metrics flink_jobmanager_job_uptime/downtime
to create an alert that fires when one of these values emits -1, since
the docs say this is the behavior of the metric when the job reaches
a completed state.
However, when I tested this by letting one of my jobs fail, the metrics
never emitted anything other than zero. Finally, the metric disappeared
after the job had failed.
Am I missing something, or is this the expected behavior?