Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-28 Thread Averell
Hello Kostas,

Thank you very much for the details. Also thanks for that "feel free to
disagree" (however, I don't have any desire to disagree here :) thanks a
lot)

Regarding that mainStream.windowAll, did you mean that checkpointing of the
two branches (the main one and the monitoring one) will be separated? Is
that possible to disable checkpointing for that 2nd branch?

Thanks and best regards,
Averell 



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-27 Thread Kostas Kloudas
Hi Averell,

> On Sep 27, 2018, at 3:09 PM, Averell  wrote:
> 
> Hi Kostas,
> 
> Yes, I want them as metrics, as they are purely for monitoring purpose.
> There's no need of fault tolerance.
> 
> If I use side-output, for example for that metric no.1, I would need a
> tumbling AllWindowFunction, which, as I understand, would introduce some
> delay to both the normal processing flow, and to the checkpoint process. 
> 

Side-output may introduce all that but you can always do something like:

mymainStream = …

myMainStream.myMainComputation….

muMainStream.windowAll().myMonitoringComputation…

This does not affect the main path of your computation (if this is your only 
concern).

> I already tried to follow the referencing web page that you sent. However, I
> could not know how to have what I want.
> For example, with metrics no.1 - meter: org.apache.flink.metrics.Meter only
> provides markEvent(), which marks an event to that Meter. There is no option
> to provide the event_time, and processing_time is always used. So my graph
> is spread over time like the one below.
> 
>  
> 

Event time is a notion of Flink and a property of your data (timestamp).

Metric systems like Prometheus take whatever you expose as metric and attach a 
timestamp based on 
the current wall-clock time, as for them, the time an event occurred is the 
time that they got that metric.

So, if you want event-time computations, then those should be done in Flink.

> For metrics no.2 - histogram: What I can see at Prometheus is the calculated
> percentile values (0.5, 0.75, 0.9, 0.99, 0.999), which tells me, for
> example: 99% the total number of records had ts1-ts2 <= 350s (which looks
> more like a rolling average). But it doesn't tell me roughly how many % of
> record have diff of 250ms, how many of 260ms, etc...
> 
>  
> 

For this case, the two notions of histograms (yours and Prometheus’) are not 
aligned. So
what you could do, is keep each “bucket” of your histogram as a separate metric 
and 
expose it as such to prometheus. So essentially, you are the one creating the 
histogram.

Metric systems do not perform any (complex) computation for you.

Once again, I would say that these exploratory metrics about your stream fall 
more under the category of
analytics about your input data, rather than “metrics”, but of course feel free 
to disagree :)

Cheers,
Kostas

> Thanks and regards,
> Averell
> 
> 
> 
> 
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/



Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-27 Thread Averell
Hi Kostas,

Yes, I want them as metrics, as they are purely for monitoring purpose.
There's no need of fault tolerance.

If I use side-output, for example for that metric no.1, I would need a
tumbling AllWindowFunction, which, as I understand, would introduce some
delay to both the normal processing flow, and to the checkpoint process. 

I already tried to follow the referencing web page that you sent. However, I
could not know how to have what I want.
For example, with metrics no.1 - meter: org.apache.flink.metrics.Meter only
provides markEvent(), which marks an event to that Meter. There is no option
to provide the event_time, and processing_time is always used. So my graph
is spread over time like the one below.

 

For metrics no.2 - histogram: What I can see at Prometheus is the calculated
percentile values (0.5, 0.75, 0.9, 0.99, 0.999), which tells me, for
example: 99% the total number of records had ts1-ts2 <= 350s (which looks
more like a rolling average). But it doesn't tell me roughly how many % of
record have diff of 250ms, how many of 260ms, etc...

 

Thanks and regards,
Averell




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-27 Thread Kostas Kloudas
Hi Averell,

From what I understand for your use case, it is possible to do what you want 
with Flink.

If you are implementing a function, then you have access to the metric system 
through
the runtime context (see [1] for more information).

Some things to take into consideration:

1) Metrics are not fault-tolerant, so if you need fault-tolerance then you have 
to take care of that 
(e.g. keep them in Flink’s state).

2) Are you sure you want them as metrics and not something like side-output? 
Metrics are more 
supposed to monitor the health of you cluster and more “system” 
characteristics, rather than
business logic or data properties.

Cheers,
Kostas

[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html 


> On Sep 27, 2018, at 8:45 AM, Averell  wrote:
> 
> Good day everyone,
> 
> I have a stream with two timestamps (ts1 and ts2) inside each record. My
> event time is ts1. This ts1 has value truncated to a quarter (like 23:30,
> 23:45, 00:00,...) 
> I want to report two metrics:
> 1. A meter which counts number of records per value of ts1. (fig.1)
> 
> 
>  
> 
> 2. A histogram which shows the distribution of the difference between ts1
> and ts2 within each record (fig.2)
> 
>  
> 
> I'm using Prometheus with Grafana. Is that possible to do what I mentioned?
> 
> Thanks and best regards,
> Averell
> 
> 
> 
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/