subject:"Flink Metrics"

hi 
不好意思刚刚图好像又挂了 

不知道这个能否查看。










在 2022-11-28 13:50:37，"m17610775726_1"  写道：

hi


你的图片挂了 可以用图床上传一下图片 在这里贴个链接 另外自定义 reportor 把需要的metric 过滤出来上报就行了
 回复的原邮件 
| 发件人 | 陈佳豪 |
| 发送日期 | 2022年11月28日 00:54 |
| 收件人 | user-zh |
| 主题 | 请问flink metrics如何获取任务状态？ |
自定义了一个kafka  Metric Reporters #请问如何使用上述指标呢？
 我想通过上报获取任务状态。除了上述指标外如果有其他方案也可以，当前flink 版本是15.2 还望大神指教一番。

Re:回复：请问flink metrics如何获取任务状态？

这个metrics 我获取不到。 不知道要怎么配置才可以获取到。










在 2022-11-28 13:50:37，"m17610775726_1"  写道：

hi


你的图片挂了 可以用图床上传一下图片 在这里贴个链接 另外自定义 reportor 把需要的metric 过滤出来上报就行了
 回复的原邮件 
| 发件人 | 陈佳豪 |
| 发送日期 | 2022年11月28日 00:54 |
| 收件人 | user-zh |
| 主题 | 请问flink metrics如何获取任务状态？ |
自定义了一个kafka  Metric Reporters #请问如何使用上述指标呢？
 我想通过上报获取任务状态。除了上述指标外如果有其他方案也可以，当前flink 版本是15.2 还望大神指教一番。

回复：请问flink metrics如何获取任务状态？

有大佬告诉下吗？ 这个指标的值获取不到。

| |
陈佳豪
邮箱：jagec...@yeah.net
|
 回复的原邮件 
| 发件人 | 陈佳豪 |
| 发送日期 | 2022年11月28日 00:54 |
| 收件人 | user-zh |
| 主题 | 请问flink metrics如何获取任务状态？ |
自定义了一个kafka  Metric Reporters #请问如何使用上述指标呢？
 我想通过上报获取任务状态。除了上述指标外如果有其他方案也可以，当前flink 版本是15.2 还望大神指教一番。

请问flink metrics如何获取任务状态？

自定义了一个kafka  Metric Reporters #请问如何使用上述指标呢？
 我想通过上报获取任务状态。除了上述指标外如果有其他方案也可以，当前flink 版本是15.2 还望大神指教一番。

Flink metrics flattened after Job restart

2022-05-25 Thread Sahil Aulakh

Hi Flink Community

We are using Flink version 1.13.5 for our application and every time the
job restarts, Flink Job metrics are flattened following the restart.
For e.g. we are using lastCheckpointDuration and on 05/05 our job restarted
and at the same time the checkpoint duration metric flattened. Is it a
known issue? If there is any workaround, please let me know.

Thanks
Sahil Aulakh

Re: [flink-yarn]&[flink-metrics]&[influxdb] Yarn session模式下提交多个Job，只有首次提交的Job有Metrics数据上报

2022-04-14 Thread huweihua

感觉像是 二次改造的问题。可以关注几个点：
1. 指标 tag 里的 job_id 是怎么带上的，是否可能多个作业相互覆盖或者只有第一个生效的场景
2. 可以在自动修改的代码里增加更多的日志，进一步定位, 例如：在 notifyOfAddedMetric() 时打印注册了哪些 metric

> 2022年4月14日 下午6:41，QiZhu Chan  写道：
> 
> 
> 
> 
>
> 需要说明的是，1、InfluxdbReporter是经过二次改造的，改造后所有指标的tag均会带上job_id，方便以通过job_id查找到所有指标。2、在per-job场景下，没有这个问题，因为per-job作业拥有各自的Jobmanager。
> 
> 
>Flink版本：1.13.3Metrics库 : Influxdb
> 
>希望有懂的大佬能解答一下，谢谢！

[flink-yarn]&[flink-metrics]&[influxdb] Yarn session模式下提交多个Job，只有首次提交的Job有Metrics数据上报

2022-04-14 Thread QiZhu Chan

Hi，
在做Flink Metrics监控的工作过程中，有发现一个问题，Flink on yarn下，使用yarn session模式提交多个Flink 
Job，只有首次提交的Job，才能正常上报Metrics；后续提交的Job，Metrics均不上报，请问是什么原因？



需要说明的是，1、InfluxdbReporter是经过二次改造的，改造后所有指标的tag均会带上job_id，方便以通过job_id查找到所有指标。2、在per-job场景下，没有这个问题，因为per-job作业拥有各自的Jobmanager。


Flink版本：1.13.3Metrics库 : Influxdb

希望有懂的大佬能解答一下，谢谢！

Re: Flink metrics via permethous or opentelemerty

2022-02-24 Thread Nicolaus Weidner

Hi Sigalit,

first of all, have you read the docs page on metrics [1], and in particular
the Prometheus section on metrics reporters [2]?
Apart from that, there is also a (somewhat older) blog post about
integrating Flink with Prometheus, including a link to a repo with example
code [3].

Hope that helps to get you started!
Best,
Nico

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/metrics/#metrics
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/metric_reporters/#prometheus
[3] https://flink.apache.org/features/2019/03/11/prometheus-monitoring.html

On Wed, Feb 23, 2022 at 8:42 AM Sigalit Eliazov  wrote:

> Hello. I am looking for a way to expose flink metrics via opentelemerty to
> the gcp could monitoring dashboard.
> Does anyone has experience with that?
>
> If it is not directly possible we thought about using permethous as a
> middlewere.  If you have experience with that i would appreciate any
> guidance.
>
> Thanks
>

Flink metrics via permethous or opentelemerty

2022-02-22 Thread Sigalit Eliazov

Hello. I am looking for a way to expose flink metrics via opentelemerty to
the gcp could monitoring dashboard.
Does anyone has experience with that?

If it is not directly possible we thought about using permethous as a
middlewere.  If you have experience with that i would appreciate any
guidance.

Thanks

Re: regarding flink metrics

2022-02-01 Thread Chesnay Schepler

Your best bet is to create a custom reporter that does this calculation. 
You could either wrap the reporter, subclass is, or fork it.
In any case, 
https://github.com/apache/flink/tree/master/flink-metrics/flink-metrics-datadog 
should be a good starting point.


On 01/02/2022 13:26, Jessy Ping wrote:

Hi Team,

We are using datadog and its http reporter( packaged in flink image) 
for sending metrics from flink application. We do have a requirement 
for setting tags with values calculated at runtime for the custom 
metrics emitted from Flink. Currently, it is impossible to assign tags 
at runtime. Is there a work arround for the same ?


Thanks
Jessy

regarding flink metrics

2022-02-01 Thread Jessy Ping

Hi Team,

We are using datadog and its http reporter( packaged in flink image) for
sending metrics from flink application. We do have a requirement for
setting tags with values calculated at runtime for the custom metrics
emitted from Flink. Currently, it is impossible to assign tags at runtime.
Is there a work arround for the same ?

Thanks
Jessy

Re: Flink Metrics Naming

2021-06-01 Thread Chesnay Schepler


Some more background on MetricGroups:
Internally there (mostly) 3 types of metric groups:
On the one hand we have the ComponentMetricGroups (like 
TaskManagerMetricGroup) that describe a high-level Flink entity, which 
just add a constant expression to the logical scope(like taskmanager, 
task etc.). These exist to support scope formats (although this 
should've been implemented differently, but that's a another story).


On the other hand we have groups created via addGroup(String), which are 
added to the logical scope as is; this is sometimes good(e.g., 
addGroup("KafkaConsumer"), and sometimes isn't (e.g., 
addGroup().
Finally, there is a addGroup(String, String) variant, which behaves like 
a key-value pair (and similarly to the ComponentMetricGroup). The key 
part is added to the logical scope, and a label is usually added as well.


Due to historical reasons some parts in Flink use addGroup(String) 
despite the key-value pair variant being more appropriate; the latter 
was only added later, as was the logical scope as a whole for that matter.


With that said, the logical scope and labels suffer a bit due to being 
retrofitted on an existing design and some early mistakes in the metric 
structuring.
Ideally (imo), things would work like this (*bold *parts signify changes 
to the current behavior):
- addGroup(String) is *sparsely used* and only for high-level 
hierarchies (job, operator, source, kafka). It is added as is to the 
logical scope, creates no label, and is *excluded from the metric 
identifier*.
- addGroup(String, String) has *no effect on the logical scope*, creates 
a label, and is added as . to the metric identifier.


The core issue with these kind of changes however is backwards 
compatibility. We would have to do a sweep over the code-base to migrate 
inappropriate usages of addGroup(String) to the key-pair variant, 
probably remove some unnecessary groups (e.g., "Status" that is used for 
CPU metrics and whatnot) and finally make changes to the metric system 
internals, all of which need a codepath that retain the current behavior.


Simply put, for immediate needs I would probably encourage you do create 
a modified PrometheusReporter which determines the logical scope as you 
see fit; it could just ignore the logical scope entirely (although I'm 
not sure how well prometheus handles 1 metric having multiple instances 
with different label sets (e.g., numRecordsIn for operators/tasks), or 
exclude user-defined groups with something hacky like only using the 
first 4 parts of the logical scope.


On 6/1/2021 4:56 PM, Mason Chen wrote:
Upon further inspection, it seems like the user scope is not universal 
(i.e. comes through the connectors and not UDFs (like rich map 
function)), but the question still stands if the process makes sense.


On Jun 1, 2021, at 10:38 AM, Mason Chen > wrote:


Makes sense. We are primarily concerned with removing the metric 
labels from the names as the user metrics get too long. i.e. the 
groups from `addGroup` are concatenated in the metric name.


Do you think there would be any issues with removing the group 
information in the metric name and putting them into a label instead? 
In seems like most metrics internally, don’t use `addGroup` to create 
group information but rather by creating another subclass of metric 
group.


Perhaps, I should ONLY apply this custom logic to metrics with the 
“user” scope? Other scoped metrics (e.g. operator, task operator, 
etc.) shouldn’t have these group names in the metric names in my 
experience...


An example just for clarity, 
flink__group1_group2_metricName{group1=…, group2=…, 
flink tags}


=>
flink__metricName{group_info=group1_group2, group1=…, 
group2=…, flink tags}


On Jun 1, 2021, at 9:57 AM, Chesnay Schepler > wrote:


The uniqueness of metrics and the naming of the Prometheus reporter 
are somewhat related but also somewhat orthogonal.


Prometheus works similar to JMX in that the metric name (e.g., 
taskmanager.job.task.operator.numRecordsIn) is more or less a 
_class_ of metrics, with tags/labels allowing you to select a 
specific instance of that metric.


Restricting metric names to 1 level of the hierarchy would present a 
few issues:
a) Effectively, all metric names that Flink uses effectively become 
reserved keywords that users must not use, which will lead to 
headaches when adding more metrics or forwarding metrics from 
libraries (e.g., kafka), because we could always break existing 
user-defined metrics.
b) You'd need a cluster-wide lookup that is aware of all hierarchies 
to ensure consistency across all processes.


In the end, there are significantly easier ways to solve the issue 
of the metric name being too long, i.e., give the user more control 
over the logical scope (taskmanager.job.task.operator), be it 
shortening the names (t.j.t.o), limiting the depth (e.g, 
operator.numRecordsIn), removing it outright (but I'd prefer

Re: Flink Metrics Naming

2021-06-01 Thread Mason Chen

Upon further inspection, it seems like the user scope is not universal (i.e. 
comes through the connectors and not UDFs (like rich map function)), but the 
question still stands if the process makes sense.

> On Jun 1, 2021, at 10:38 AM, Mason Chen  wrote:
> 
> Makes sense. We are primarily concerned with removing the metric labels from 
> the names as the user metrics get too long. i.e. the groups from `addGroup` 
> are concatenated in the metric name.
> 
> Do you think there would be any issues with removing the group information in 
> the metric name and putting them into a label instead? In seems like most 
> metrics internally, don’t use `addGroup` to create group information but 
> rather by creating another subclass of metric group.
> 
> Perhaps, I should ONLY apply this custom logic to metrics with the “user” 
> scope? Other scoped metrics (e.g. operator, task operator, etc.) shouldn’t 
> have these group names in the metric names in my experience...
> 
> An example just for clarity, 
> flink__group1_group2_metricName{group1=…, group2=…, flink tags}
> 
> =>
> 
> flink__metricName{group_info=group1_group2, group1=…, group2=…, 
> flink tags}
> 
>> On Jun 1, 2021, at 9:57 AM, Chesnay Schepler > > wrote:
>> 
>> The uniqueness of metrics and the naming of the Prometheus reporter are 
>> somewhat related but also somewhat orthogonal.
>> 
>> Prometheus works similar to JMX in that the metric name (e.g., 
>> taskmanager.job.task.operator.numRecordsIn) is more or less a _class_ of 
>> metrics, with tags/labels allowing you to select a specific instance of that 
>> metric.
>> 
>> Restricting metric names to 1 level of the hierarchy would present a few 
>> issues:
>> a) Effectively, all metric names that Flink uses effectively become reserved 
>> keywords that users must not use, which will lead to headaches when adding 
>> more metrics or forwarding metrics from libraries (e.g., kafka), because we 
>> could always break existing user-defined metrics.
>> b) You'd need a cluster-wide lookup that is aware of all hierarchies to 
>> ensure consistency across all processes.
>> 
>> In the end, there are significantly easier ways to solve the issue of the 
>> metric name being too long, i.e., give the user more control over the 
>> logical scope (taskmanager.job.task.operator), be it shortening the names 
>> (t.j.t.o), limiting the depth (e.g, operator.numRecordsIn), removing it 
>> outright (but I'd prefer some context to be present for clarity) or 
>> supporting something similar to scope formats.
>> I'm reasonably certain there are some tickets already in this direction, we 
>> just don't get around to doing them because for the most part the metric 
>> system works good enough and there are bigger fish to fry.
>> 
>> On 6/1/2021 3:39 PM, Till Rohrmann wrote:
>>> Hi Mason,
>>> 
>>> The idea is that a metric is not uniquely identified by its name alone but 
>>> instead by its path. The groups in which it is defined specify this path 
>>> (similar to directories). That's why it is valid to specify two metrics 
>>> with the same name if they reside in different groups.
>>> 
>>> I think Prometheus does not support such a tree structure and that's why 
>>> the path is exposed via labels if I am not mistaken. So long story short, 
>>> what you are seeing is a combination of how Flink organizes metrics and 
>>> what can be reported to Prometheus. 
>>> 
>>> I am also pulling in Chesnay who is more familiar with this part of the 
>>> code.
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Fri, May 28, 2021 at 7:33 PM Mason Chen >> > wrote:
>>> Can anyone give insight as to why Flink allows 2 metrics with the same 
>>> “name”?
>>> 
>>> For example,
>>> 
>>> getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”);
>>> 
>>> And
>>> 
>>> getRuntimeContext.addGroup(“other_group”, 
>>> “other_group1”).counter(“myMetricName”);
>>> 
>>> Are totally valid.
>>> 
>>> 
>>> It seems that it has lead to some not-so-great implementations—the 
>>> prometheus reporter and attaching the labels to the metric name, making the 
>>> name quite verbose.
>>> 
>>> 
>> 
>

Re: Flink Metrics Naming

2021-06-01 Thread Mason Chen

Makes sense. We are primarily concerned with removing the metric labels from 
the names as the user metrics get too long. i.e. the groups from `addGroup` are 
concatenated in the metric name.

Do you think there would be any issues with removing the group information in 
the metric name and putting them into a label instead? In seems like most 
metrics internally, don’t use `addGroup` to create group information but rather 
by creating another subclass of metric group.

Perhaps, I should ONLY apply this custom logic to metrics with the “user” 
scope? Other scoped metrics (e.g. operator, task operator, etc.) shouldn’t have 
these group names in the metric names in my experience...

An example just for clarity, 
flink__group1_group2_metricName{group1=…, group2=…, flink tags}

=>

flink__metricName{group_info=group1_group2, group1=…, group2=…, 
flink tags}

> On Jun 1, 2021, at 9:57 AM, Chesnay Schepler  wrote:
> 
> The uniqueness of metrics and the naming of the Prometheus reporter are 
> somewhat related but also somewhat orthogonal.
> 
> Prometheus works similar to JMX in that the metric name (e.g., 
> taskmanager.job.task.operator.numRecordsIn) is more or less a _class_ of 
> metrics, with tags/labels allowing you to select a specific instance of that 
> metric.
> 
> Restricting metric names to 1 level of the hierarchy would present a few 
> issues:
> a) Effectively, all metric names that Flink uses effectively become reserved 
> keywords that users must not use, which will lead to headaches when adding 
> more metrics or forwarding metrics from libraries (e.g., kafka), because we 
> could always break existing user-defined metrics.
> b) You'd need a cluster-wide lookup that is aware of all hierarchies to 
> ensure consistency across all processes.
> 
> In the end, there are significantly easier ways to solve the issue of the 
> metric name being too long, i.e., give the user more control over the logical 
> scope (taskmanager.job.task.operator), be it shortening the names (t.j.t.o), 
> limiting the depth (e.g, operator.numRecordsIn), removing it outright (but 
> I'd prefer some context to be present for clarity) or supporting something 
> similar to scope formats.
> I'm reasonably certain there are some tickets already in this direction, we 
> just don't get around to doing them because for the most part the metric 
> system works good enough and there are bigger fish to fry.
> 
> On 6/1/2021 3:39 PM, Till Rohrmann wrote:
>> Hi Mason,
>> 
>> The idea is that a metric is not uniquely identified by its name alone but 
>> instead by its path. The groups in which it is defined specify this path 
>> (similar to directories). That's why it is valid to specify two metrics with 
>> the same name if they reside in different groups.
>> 
>> I think Prometheus does not support such a tree structure and that's why the 
>> path is exposed via labels if I am not mistaken. So long story short, what 
>> you are seeing is a combination of how Flink organizes metrics and what can 
>> be reported to Prometheus. 
>> 
>> I am also pulling in Chesnay who is more familiar with this part of the code.
>> 
>> Cheers,
>> Till
>> 
>> On Fri, May 28, 2021 at 7:33 PM Mason Chen > > wrote:
>> Can anyone give insight as to why Flink allows 2 metrics with the same 
>> “name”?
>> 
>> For example,
>> 
>> getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”);
>> 
>> And
>> 
>> getRuntimeContext.addGroup(“other_group”, 
>> “other_group1”).counter(“myMetricName”);
>> 
>> Are totally valid.
>> 
>> 
>> It seems that it has lead to some not-so-great implementations—the 
>> prometheus reporter and attaching the labels to the metric name, making the 
>> name quite verbose.
>> 
>> 
>

Re: Flink Metrics Naming

2021-06-01 Thread Chesnay Schepler

The uniqueness of metrics and the naming of the Prometheus reporter are 
somewhat related but also somewhat orthogonal.


Prometheus works similar to JMX in that the metric name (e.g., 
taskmanager.job.task.operator.numRecordsIn) is more or less a _class_ of 
metrics, with tags/labels allowing you to select a specific instance of 
that metric.


Restricting metric names to 1 level of the hierarchy would present a few 
issues:
a) Effectively, all metric names that Flink uses effectively become 
reserved keywords that users must not use, which will lead to headaches 
when adding more metrics or forwarding metrics from libraries (e.g., 
kafka), because we could always break existing user-defined metrics.
b) You'd need a cluster-wide lookup that is aware of all hierarchies to 
ensure consistency across all processes.


In the end, there are significantly easier ways to solve the issue of 
the metric name being too long, i.e., give the user more control over 
the logical scope (taskmanager.job.task.operator), be it shortening the 
names (t.j.t.o), limiting the depth (e.g, operator.numRecordsIn), 
removing it outright (but I'd prefer some context to be present for 
clarity) or supporting something similar to scope formats.
I'm reasonably certain there are some tickets already in this direction, 
we just don't get around to doing them because for the most part the 
metric system works good enough and there are bigger fish to fry.


On 6/1/2021 3:39 PM, Till Rohrmann wrote:

Hi Mason,

The idea is that a metric is not uniquely identified by its name alone 
but instead by its path. The groups in which it is defined specify 
this path (similar to directories). That's why it is valid to specify 
two metrics with the same name if they reside in different groups.


I think Prometheus does not support such a tree structure and that's 
why the path is exposed via labels if I am not mistaken. So long story 
short, what you are seeing is a combination of how Flink organizes 
metrics and what can be reported to Prometheus.


I am also pulling in Chesnay who is more familiar with this part of 
the code.


Cheers,
Till

On Fri, May 28, 2021 at 7:33 PM Mason Chen > wrote:


Can anyone give insight as to why Flink allows 2 metrics with the
same “name”?

For example,

getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”);

And

getRuntimeContext.addGroup(“other_group”,
“other_group1”).counter(“myMetricName”);

Are totally valid.


It seems that it has lead to some not-so-great implementations—the
prometheus reporter and attaching the labels to the metric name,
making the name quite verbose.

Re: Flink Metrics Naming

2021-06-01 Thread Till Rohrmann

Hi Mason,

The idea is that a metric is not uniquely identified by its name alone but
instead by its path. The groups in which it is defined specify this path
(similar to directories). That's why it is valid to specify two metrics
with the same name if they reside in different groups.

I think Prometheus does not support such a tree structure and that's why
the path is exposed via labels if I am not mistaken. So long story short,
what you are seeing is a combination of how Flink organizes metrics and
what can be reported to Prometheus.

I am also pulling in Chesnay who is more familiar with this part of the
code.

Cheers,
Till

On Fri, May 28, 2021 at 7:33 PM Mason Chen  wrote:

> Can anyone give insight as to why Flink allows 2 metrics with the same
> “name”?
>
> For example,
>
> getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”);
>
> And
>
> getRuntimeContext.addGroup(“other_group”,
> “other_group1”).counter(“myMetricName”);
>
> Are totally valid.
>
>
> It seems that it has lead to some not-so-great implementations—the
> prometheus reporter and attaching the labels to the metric name, making the
> name quite verbose.
>
>
>

Flink Metrics Naming

2021-05-28 Thread Mason Chen

Can anyone give insight as to why Flink allows 2 metrics with the same “name”?

For example,

getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”);

And

getRuntimeContext.addGroup(“other_group”, 
“other_group1”).counter(“myMetricName”);

Are totally valid.


It seems that it has lead to some not-so-great implementations—the prometheus 
reporter and attaching the labels to the metric name, making the name quite 
verbose.

Re: Flink Metrics emitted from a Kubernetes Application Cluster

2021-04-09 Thread Chesnay Schepler


This is currently not possible. See also FLINK-8358

On 4/9/2021 4:47 AM, Claude M wrote:

Hello,

I've setup Flink as an Application Cluster in Kubernetes. Now I'm 
looking into monitoring the Flink cluster in Datadog. This is what is 
configured in the flink-conf.yaml to emit metrics:


metrics.scope.jm : flink.jobmanager
metrics.scope.jm.job: flink.jobmanager.job
metrics.scope.tm : flink.taskmanager
metrics.scope.tm.job: flink.taskmanager.job
metrics.scope.task: flink.task
metrics.scope.operator: flink.operator
metrics.reporter.dghttp.class: 
org.apache.flink.metrics.datadog.DatadogHttpReporter

metrics.reporter.dghttp.apikey: {{ datadog_api_key }}
metrics.reporter.dghttp.tags: environment: {{ environment }}

When it gets to Datadog though, the metrics for the flink.jobmanager 
and flink.taskmanager is filtered by the host which is the Pod IP.  
However, I would like it to use the pod name.  How can this be 
accomplished?



Thanks

Flink Metrics emitted from a Kubernetes Application Cluster

2021-04-08 Thread Claude M

Hello,

I've setup Flink as an Application Cluster in Kubernetes.  Now I'm looking
into monitoring the Flink cluster in Datadog.  This is what is configured
in the flink-conf.yaml to emit metrics:

metrics.scope.jm: flink.jobmanager
metrics.scope.jm.job: flink.jobmanager.job
metrics.scope.tm: flink.taskmanager
metrics.scope.tm.job: flink.taskmanager.job
metrics.scope.task: flink.task
metrics.scope.operator: flink.operator
metrics.reporter.dghttp.class:
org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: {{ datadog_api_key }}
metrics.reporter.dghttp.tags: environment: {{ environment }}

When it gets to Datadog though, the metrics for the flink.jobmanager and
flink.taskmanager is filtered by the host which is the Pod IP.  However, I
would like it to use the pod name.  How can this be accomplished?


Thanks

Re: Flink Metrics

2021-03-03 Thread Piotr Nowojski

Hi,

1)
Do you want to output those metrics as Flink metrics? Or output those
"metrics"/counters as values to some external system (like Kafka)? The
problem discussed in [1], was that the metrics (Counters) were not fitting
in memory, so David suggested to hold them on Flink's state and treat the
measured values as regular output of the job.

The former option you can think of if you had a single operator, that
consumes your CDCs outputs something (filtered CDCs? processed CDCs?) to
Kafka, while keeping some metrics that you can access via Flink metrics
system. The latter would be the same operator, but instead of single output
it would have multiple outputs, writing the "counters" also for example to
Kafka (or any other system of your choice). Both options are viable, each
has its own pros and cons.

2) You need to persist your metrics somewhere. Why don't you use Flink's
state for that purpose? Upon recovery/initialisation, you can get the
recovered value from state and update/set metric value to that recovered
value.

3) That seems to be a question a bit unrelated to Flink. Try searching
online how to calculate percentiles. I haven't thought about it, but
histograms or sorting all of the values seems to be the options. Probably
best if you would use some existing library to do that for you.

4) Could you rephrase your question?

Best,
Piotrek

niedz., 28 lut 2021 o 14:53 Prasanna kumar 
napisał(a):

> Hi flinksters,
>
> Scenario: We have cdc messages from our rdbms(various tables) flowing to
> Kafka.  Our flink job reads the CDC messages and creates events based on
> certain rules.
>
> I am using Prometheus  and grafana.
>
> Following are there metrics that i need to calculate
>
> A) Number of CDC messages wrt to each table.
> B) Number of events created wrt to each event type.
> C) Average/P99/P95 Latency (event created ts - ccd operation ts)
>
> For A and B, I created counters and able to see the metrices flowing into
> Prometheus . Few questions I have here.
>
> 1) How to create labels for counters in flink ? I did not find any easier
> method to do it . Right now I see that I need to create counters for each
> type of table and events . I referred to one of the community discussions.
> [1] . Is there any way apart from this ?
>
> 2) When the job gets restarted , the counters get back to 0 . How to
> prevent that and to get continuity.
>
> For C , I calculated latency in code for each event and assigned  it to
> histogram.  Few questions I have here.
>
> 3) I read in a few blogs [2] that histogram is the best way to get
> latencies. Is there any better idea?
>
> 4) How to create buckets for various ranges? I also read in a community
> email that flink implements  histogram as summaries.  I also should be able
> to see the latencies across timelines .
>
> [1]
> https://stackoverflow.com/questions/58456830/how-to-use-multiple-counters-in-flink
> [2] https://povilasv.me/prometheus-tracking-request-duration/
>
> Thanks,
> Prasanna.
>

Flink Metrics

2021-02-28 Thread Prasanna kumar

Hi flinksters,

Scenario: We have cdc messages from our rdbms(various tables) flowing to
Kafka.  Our flink job reads the CDC messages and creates events based on
certain rules.

I am using Prometheus  and grafana.

Following are there metrics that i need to calculate

A) Number of CDC messages wrt to each table.
B) Number of events created wrt to each event type.
C) Average/P99/P95 Latency (event created ts - ccd operation ts)

For A and B, I created counters and able to see the metrices flowing into
Prometheus . Few questions I have here.

1) How to create labels for counters in flink ? I did not find any easier
method to do it . Right now I see that I need to create counters for each
type of table and events . I referred to one of the community discussions.
[1] . Is there any way apart from this ?

2) When the job gets restarted , the counters get back to 0 . How to
prevent that and to get continuity.

For C , I calculated latency in code for each event and assigned  it to
histogram.  Few questions I have here.

3) I read in a few blogs [2] that histogram is the best way to get
latencies. Is there any better idea?

4) How to create buckets for various ranges? I also read in a community
email that flink implements  histogram as summaries.  I also should be able
to see the latencies across timelines .

[1]
https://stackoverflow.com/questions/58456830/how-to-use-multiple-counters-in-flink
[2] https://povilasv.me/prometheus-tracking-request-duration/

Thanks,
Prasanna.

Re: Tag flink metrics to job name

2021-02-19 Thread Chesnay Schepler

hmm...in a roundabout way this could be possible I suppose.

For a given job, search through your metrics for some job metric (like 
numRestarts on the JM, or any task metric for TMs), and from that you 
should be able to infer the JM/TM that belongs to that (based on the TM 
ID / host information in the metric).
Conversely, when you see high cpu usage in one of the metrics for a 
JM/TM, search for a job metric for that same process.

On 2/19/2021 9:14 AM, bat man wrote:
Is there a way I can look into say for a specific job what’s the cpu 
usage or memory usage of the yarn containers when multiple jobs are 
running on the same cluster.
Also, the issue am trying to resolve is I’m seeing high memory usage 
for one of the containers I want isolate the issue with one job and 
then investigate further.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:18 PM, Chesnay Schepler > wrote:

No, Job-/TaskManager metrics cannot be tagged with the job name.
The reason is that this only makes sense for application clusters
(opposed to session clusters), but we don't differentiate between
the two when it comes to metrics.

On 2/19/2021 3:59 AM, bat man wrote:

I meant the Flink jobname. I’m using the below reporter -
||
|metrics.reporter.prom.class:
org.apache.flink.metrics.prometheus.PrometheusReporter|
Is there any way to tag job names to the task and job manager
metrics.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler
mailto:ches...@apache.org>> wrote:

When you mean "job_name", are you referring to the Prometheus
concept of
jobs, of the one of Flink?

Which of Flink prometheus reporters are you using?

On 2/17/2021 7:37 PM, bat man wrote:
> Hello there,
>
> I am using prometheus to push metrics to prometheus and
then use
> grafana for visualization. There are metrics like
>
- flink_taskmanager_Status_JVM_CPU_Load, 
flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time

> etc which do not gives job_name. It is tied to an instance.
> When running multiple jobs in the same yarn cluster it is
possible
> that different jobs have yarn containers on the same
instance, in this
> case it is very difficult to find out which instance has
high CPU
> load, Memory usage etc.
>
> Is there a way to tag job_name to these metrics so that the
metrics
> could be visualized per job.
>
> Thanks,
> Hemant

Re: Tag flink metrics to job name

2021-02-19 Thread bat man

Is there a way I can look into say for a specific job what’s the cpu usage
or memory usage of the yarn containers when multiple jobs are running on
the same cluster.
Also, the issue am trying to resolve is I’m seeing high memory usage for
one of the containers I want isolate the issue with one job and then
investigate further.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:18 PM, Chesnay Schepler 
wrote:

> No, Job-/TaskManager metrics cannot be tagged with the job name.
> The reason is that this only makes sense for application clusters (opposed
> to session clusters), but we don't differentiate between the two when it
> comes to metrics.
>
> On 2/19/2021 3:59 AM, bat man wrote:
>
> I meant the Flink jobname. I’m using the below reporter -
>
>  metrics.reporter.prom.class: 
> org.apache.flink.metrics.prometheus.PrometheusReporter
>
> Is there any way to tag job names to the task and job manager metrics.
>
> Thanks,
> Hemant
>
> On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler 
> wrote:
>
>> When you mean "job_name", are you referring to the Prometheus concept of
>> jobs, of the one of Flink?
>>
>> Which of Flink prometheus reporters are you using?
>>
>> On 2/17/2021 7:37 PM, bat man wrote:
>> > Hello there,
>> >
>> > I am using prometheus to push metrics to prometheus and then use
>> > grafana for visualization. There are metrics like
>> >
>> - flink_taskmanager_Status_JVM_CPU_Load, 
>> flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time
>>
>> > etc which do not gives job_name. It is tied to an instance.
>> > When running multiple jobs in the same yarn cluster it is possible
>> > that different jobs have yarn containers on the same instance, in this
>> > case it is very difficult to find out which instance has high CPU
>> > load, Memory usage etc.
>> >
>> > Is there a way to tag job_name to these metrics so that the metrics
>> > could be visualized per job.
>> >
>> > Thanks,
>> > Hemant
>>
>>
>>
>

Re: Tag flink metrics to job name

2021-02-18 Thread Chesnay Schepler

No, Job-/TaskManager metrics cannot be tagged with the job name.
The reason is that this only makes sense for application clusters 
(opposed to session clusters), but we don't differentiate between the 
two when it comes to metrics.

On 2/19/2021 3:59 AM, bat man wrote:

I meant the Flink jobname. I’m using the below reporter -
||
|metrics.reporter.prom.class: 
org.apache.flink.metrics.prometheus.PrometheusReporter|

Is there any way to tag job names to the task and job manager metrics.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler > wrote:

When you mean "job_name", are you referring to the Prometheus
concept of
jobs, of the one of Flink?

Which of Flink prometheus reporters are you using?

On 2/17/2021 7:37 PM, bat man wrote:
> Hello there,
>
> I am using prometheus to push metrics to prometheus and then use
> grafana for visualization. There are metrics like
>
- flink_taskmanager_Status_JVM_CPU_Load, 
flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time

> etc which do not gives job_name. It is tied to an instance.
> When running multiple jobs in the same yarn cluster it is possible
> that different jobs have yarn containers on the same instance,
in this
> case it is very difficult to find out which instance has high CPU
> load, Memory usage etc.
>
> Is there a way to tag job_name to these metrics so that the metrics
> could be visualized per job.
>
> Thanks,
> Hemant

Re: Tag flink metrics to job name

2021-02-18 Thread bat man

I meant the Flink jobname. I’m using the below reporter -


metrics.reporter.prom.class:
org.apache.flink.metrics.prometheus.PrometheusReporter

Is there any way to tag job names to the task and job manager metrics.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler 
wrote:

> When you mean "job_name", are you referring to the Prometheus concept of
> jobs, of the one of Flink?
>
> Which of Flink prometheus reporters are you using?
>
> On 2/17/2021 7:37 PM, bat man wrote:
> > Hello there,
> >
> > I am using prometheus to push metrics to prometheus and then use
> > grafana for visualization. There are metrics like
> >
> - flink_taskmanager_Status_JVM_CPU_Load, 
> flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time
>
> > etc which do not gives job_name. It is tied to an instance.
> > When running multiple jobs in the same yarn cluster it is possible
> > that different jobs have yarn containers on the same instance, in this
> > case it is very difficult to find out which instance has high CPU
> > load, Memory usage etc.
> >
> > Is there a way to tag job_name to these metrics so that the metrics
> > could be visualized per job.
> >
> > Thanks,
> > Hemant
>
>
>

Re: Tag flink metrics to job name

2021-02-18 Thread Chesnay Schepler

When you mean "job_name", are you referring to the Prometheus concept of 
jobs, of the one of Flink?


Which of Flink prometheus reporters are you using?

On 2/17/2021 7:37 PM, bat man wrote:

Hello there,

I am using prometheus to push metrics to prometheus and then use 
grafana for visualization. There are metrics like 
- flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time 
etc which do not gives job_name. It is tied to an instance.
When running multiple jobs in the same yarn cluster it is possible 
that different jobs have yarn containers on the same instance, in this 
case it is very difficult to find out which instance has high CPU 
load, Memory usage etc.


Is there a way to tag job_name to these metrics so that the metrics 
could be visualized per job.


Thanks,
Hemant

Tag flink metrics to job name

2021-02-17 Thread bat man

Hello there,

I am using prometheus to push metrics to prometheus and then use grafana
for visualization. There are metrics like
- flink_taskmanager_Status_JVM_CPU_Load,
flink_taskmanager_Status_JVM_CPU_Load,
flink_taskmanager_Status_JVM_CPU_Time
etc which do not gives job_name. It is tied to an instance.
When running multiple jobs in the same yarn cluster it is possible that
different jobs have yarn containers on the same instance, in this case it
is very difficult to find out which instance has high CPU load, Memory
usage etc.

Is there a way to tag job_name to these metrics so that the metrics could
be visualized per job.

Thanks,
Hemant

Re: Default Flink Metrics Graphite

2020-09-03 Thread Till Rohrmann

ime.metrics.groups.OperatorMetricGroup.(OperatorMetricGroup.java:48)
>>>>> at
>>>>> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.lambda$getOrAddOperator$0(TaskMetricGroup.java:154)
>>>>> at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
>>>>> at
>>>>> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.getOrAddOperator(TaskMetricGroup.java:154)
>>>>> at
>>>>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:180)
>>>>> at
>>>>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
>>>>> at
>>>>> org.apache.flink.streaming.api.operators.SimpleOperatorFactory.createStreamOperator(SimpleOperatorFactory.java:75)
>>>>> at
>>>>> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:48)
>>>>> at
>>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:429)
>>>>> at
>>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:353)
>>>>> at
>>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
>>>>> at
>>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
>>>>> at
>>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
>>>>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>> Regards,
>>>>> Vijay
>>>>>
>>>>>
>>>>> On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler 
>>>>> wrote:
>>>>>
>>>>>> metrics.reporter.grph.class:
>>>>>> org.apache.flink.metrics.graphite.GraphiteReporter
>>>>>>
>>>>>>
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>>>>>>
>>>>>> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>>>>>>
>>>>>> Hi Dawid,
>>>>>>
>>>>>> I have 1.10.0 version of flink. What is alternative for this version ?
>>>>>>
>>>>>> Regards,
>>>>>> Vijay
>>>>>>
>>>>>>
>>>>>> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz
>>>>>>   wrote:
>>>>>>
>>>>>> 
>>>>>>
>>>>>> Hi Vijay,
>>>>>>
>>>>>> I think the problem might be that you are using a wrong version of
>>>>>> the reporter.
>>>>>>
>>>>>> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a
>>>>>> plugin, but it was migrated to plugins in 1.11 only[1].
>>>>>>
>>>>>> I'd recommend trying it out with the same 1.11 version of Flink and
>>>>>> Graphite reporter.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Dawid
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-16965
>>>>>> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>>>>>>
>>>>>> Hi Nikola,
>>>>>>
>>>>>> To rule out any other cluster issues, I have tried it in my local
>>>>>> now. Steps as follows, but don't see any metrics yet.
>>>>>>
>>>>>> 1) Set up local Graphite
>>>>>>
>>>>>> docker run -d\
>>>>>>  --name graphite\
>>>>>>  --restart=always\
>>>>>>  -p 80:80\
>>>>>>  -p 2003-2004:2003-2004\
>>>>>>  -p 2023-2024:2023-2024\
>>>>>>  -p 8125:8125/udp\
>>>>>>  -p 8126:8126\
>>>>>>  graphiteapp/graphite-statsd
>>>>>>
>>>>>> Mapped Ports
>>>>>> Host Container Service
>>>>>> 80 80 nginx <https://www.nginx.com/resources/admin-guide/>
>>>>>> 2003 2003 carbon receiver - plaintext
>>>>>

Re: Default Flink Metrics Graphite

2020-09-02 Thread Vijayendra Yadav

>>>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
>>>> at
>>>> org.apache.flink.streaming.api.operators.SimpleOperatorFactory.createStreamOperator(SimpleOperatorFactory.java:75)
>>>> at
>>>> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:48)
>>>> at
>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:429)
>>>> at
>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:353)
>>>> at
>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
>>>> at
>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
>>>> at
>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
>>>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>>>> at java.lang.Thread.run(Thread.java:748)
>>>> Regards,
>>>> Vijay
>>>>
>>>>
>>>> On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler 
>>>> wrote:
>>>>
>>>>> metrics.reporter.grph.class:
>>>>> org.apache.flink.metrics.graphite.GraphiteReporter
>>>>>
>>>>>
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>>>>>
>>>>> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>>>>>
>>>>> Hi Dawid,
>>>>>
>>>>> I have 1.10.0 version of flink. What is alternative for this version ?
>>>>>
>>>>> Regards,
>>>>> Vijay
>>>>>
>>>>>
>>>>> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz
>>>>>   wrote:
>>>>>
>>>>> 
>>>>>
>>>>> Hi Vijay,
>>>>>
>>>>> I think the problem might be that you are using a wrong version of the
>>>>> reporter.
>>>>>
>>>>> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a
>>>>> plugin, but it was migrated to plugins in 1.11 only[1].
>>>>>
>>>>> I'd recommend trying it out with the same 1.11 version of Flink and
>>>>> Graphite reporter.
>>>>>
>>>>> Best,
>>>>>
>>>>> Dawid
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/FLINK-16965
>>>>> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>>>>>
>>>>> Hi Nikola,
>>>>>
>>>>> To rule out any other cluster issues, I have tried it in my local now.
>>>>> Steps as follows, but don't see any metrics yet.
>>>>>
>>>>> 1) Set up local Graphite
>>>>>
>>>>> docker run -d\
>>>>>  --name graphite\
>>>>>  --restart=always\
>>>>>  -p 80:80\
>>>>>  -p 2003-2004:2003-2004\
>>>>>  -p 2023-2024:2023-2024\
>>>>>  -p 8125:8125/udp\
>>>>>  -p 8126:8126\
>>>>>  graphiteapp/graphite-statsd
>>>>>
>>>>> Mapped Ports
>>>>> Host Container Service
>>>>> 80 80 nginx <https://www.nginx.com/resources/admin-guide/>
>>>>> 2003 2003 carbon receiver - plaintext
>>>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
>>>>> 2004 2004 carbon receiver - pickle
>>>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol>
>>>>> 2023 2023 carbon aggregator - plaintext
>>>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>>> 2024 2024 carbon aggregator - pickle
>>>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>>> 8080 8080 Graphite internal gunicorn port (without Nginx proxying).
>>>>> 8125 8125 statsd
>>>>> <https://github.com/etsy/statsd/blob/master/docs/server.md>
>>>>> 8126 8126 statsd admin
>>>>> <https://github.com/etsy/statsd/blob/master/docs/admin

Re: Default Flink Metrics Graphite

2020-09-02 Thread Till Rohrmann

>>
>>> On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler 
>>> wrote:
>>>
>>>> metrics.reporter.grph.class:
>>>> org.apache.flink.metrics.graphite.GraphiteReporter
>>>>
>>>>
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>>>>
>>>> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>>>>
>>>> Hi Dawid,
>>>>
>>>> I have 1.10.0 version of flink. What is alternative for this version ?
>>>>
>>>> Regards,
>>>> Vijay
>>>>
>>>>
>>>> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz 
>>>>  wrote:
>>>>
>>>> 
>>>>
>>>> Hi Vijay,
>>>>
>>>> I think the problem might be that you are using a wrong version of the
>>>> reporter.
>>>>
>>>> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a
>>>> plugin, but it was migrated to plugins in 1.11 only[1].
>>>>
>>>> I'd recommend trying it out with the same 1.11 version of Flink and
>>>> Graphite reporter.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-16965
>>>> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>>>>
>>>> Hi Nikola,
>>>>
>>>> To rule out any other cluster issues, I have tried it in my local now.
>>>> Steps as follows, but don't see any metrics yet.
>>>>
>>>> 1) Set up local Graphite
>>>>
>>>> docker run -d\
>>>>  --name graphite\
>>>>  --restart=always\
>>>>  -p 80:80\
>>>>  -p 2003-2004:2003-2004\
>>>>  -p 2023-2024:2023-2024\
>>>>  -p 8125:8125/udp\
>>>>  -p 8126:8126\
>>>>  graphiteapp/graphite-statsd
>>>>
>>>> Mapped Ports
>>>> Host Container Service
>>>> 80 80 nginx <https://www.nginx.com/resources/admin-guide/>
>>>> 2003 2003 carbon receiver - plaintext
>>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
>>>> 2004 2004 carbon receiver - pickle
>>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol>
>>>> 2023 2023 carbon aggregator - plaintext
>>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>> 2024 2024 carbon aggregator - pickle
>>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>> 8080 8080 Graphite internal gunicorn port (without Nginx proxying).
>>>> 8125 8125 statsd
>>>> <https://github.com/etsy/statsd/blob/master/docs/server.md>
>>>> 8126 8126 statsd admin
>>>> <https://github.com/etsy/statsd/blob/master/docs/admin_interface.md>
>>>> 2) WebUI:
>>>>
>>>> 
>>>>
>>>>
>>>>
>>>> 3) Run Flink example Job.
>>>> ./bin/flink run
>>>> ./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar
>>>> --port 
>>>>
>>>> with conf/flink-conf.yaml set as:
>>>>
>>>> metrics.reporter.grph.factory.class:
>>>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>>>> metrics.reporter.grph.host: localhost
>>>> metrics.reporter.grph.port: 2003
>>>> metrics.reporter.grph.protocol: TCP
>>>> metrics.reporter.grph.interval: 1 SECONDS
>>>>
>>>> and graphite jar:
>>>>
>>>> plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar
>>>>
>>>>
>>>> 4) Can't see any activity in webui graphite.
>>>>
>>>>
>>>> Could you review and let me know what is wrong here ? any other way you
>>>> suggest to be able to view the raw metrics data ?
>>>> Also, do you have sample metrics raw format, you can share from any
>>>> other project.
>>>>
>>>> Regards,
>>>> Vijay
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov 
>>>> wrote:
>>>>
>>>>> Hi Vijay,
>>>>>
>>>>> Your steps look correct to me.
>>>>> Perhaps you can double check that the graphite port you are sending is
>>>>> correct? THe default carbon port is 2003 and if you use the aggregator it
>>>>> is 2023.
>>>>>
>>>>> You should be able to see in both flink jobmanager and taskmanager
>>>>> that the metrics have been initialized with the config you have pasted.
>>>>>
>>>>> Regards
>>>>> ,
>>>>> Nikola Hrusov
>>>>>
>>>>>
>>>>> On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav <
>>>>> contact@gmail.com> wrote:
>>>>>
>>>>>> Hi Team,
>>>>>>
>>>>>> I am trying  to export Flink stream default metrics using Graphite,
>>>>>> but I can't find it in the Graphite metrics console.  Could you confirm 
>>>>>> the
>>>>>> steps below are correct?
>>>>>>
>>>>>> *1) Updated flink-conf.yaml*
>>>>>>
>>>>>> metrics.reporter.grph.factory.class:
>>>>>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>>>>>> metrics.reporter.grph.host: port
>>>>>> metrics.reporter.grph.port: 9109
>>>>>> metrics.reporter.grph.protocol: TCP
>>>>>> metrics.reporter.grph.interval: 30 SECONDS
>>>>>>
>>>>>> 2) Added Graphite jar in plugin folder :
>>>>>>
>>>>>> ll */usr/lib/flink/plugins/metric/*
>>>>>>  *flink-metrics-graphite-1.10.0.jar*
>>>>>>
>>>>>> 3) Looking metrics in graphite server:
>>>>>>
>>>>>> http://port:8080/metrics <http://10.108.58.63:8080/metrics>
>>>>>>
>>>>>> Note: No code change is done.
>>>>>>
>>>>>> Regards,
>>>>>> Vijay
>>>>>>
>>>>>>
>>>>>>
>>>>

Re: Default Flink Metrics Graphite

2020-09-01 Thread Vijayendra Yadav

Thanks all, I could see the metrics.

On Thu, Aug 27, 2020 at 7:51 AM Robert Metzger  wrote:

> I don't think these error messages give us a hint why you can't see the
> metrics (because they are about registering metrics, not reporting them)
>
> Are you sure you are using the right configuration parameters for Flink
> 1.10? That all required JARs are in the lib/ folder (on all machines) and
> that your graphite setup is working (have you confirmed that you can show
> any metrics in the Graphite UI (maybe from a Graphite demo thingy))?
>
>
> On Thu, Aug 27, 2020 at 2:05 AM Vijayendra Yadav 
> wrote:
>
>> Hi Chesnay and Dawid,
>>
>> I see multiple entries as following in Log:
>>
>> 2020-08-26 23:46:19,105 WARN
>> org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
>> registering metric: numRecordsIn.
>> java.lang.IllegalArgumentException: A metric named
>> ip-99--99-99.taskmanager.container_1596056409708_1570_01_06.vdcs-kafka-flink-test.Map.0.numRecordsIn
>> already exists
>> at
>> com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
>> 2020-08-26 23:46:19,094 WARN
>> org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
>> registering metric: numRecordsOut.
>> java.lang.IllegalArgumentException: A metric named
>> ip-99--99-999.taskmanager.container_1596056409708_1570_01_05.vdcs-kafka-flink-test.Map.2.numRecordsOut
>> already exists
>> at
>> com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
>> at
>> org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
>> at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
>> at
>> org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
>> at
>> org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:343)
>> at
>> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:426)
>> at
>> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:359)
>> at
>> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:349)
>> at
>> org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.(OperatorIOMetricGroup.java:41)
>> at
>> org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.(OperatorMetricGroup.java:48)
>> at
>> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.lambda$getOrAddOperator$0(TaskMetricGroup.java:154)
>> at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
>> at
>> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.getOrAddOperator(TaskMetricGroup.java:154)
>> at
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:180)
>> at
>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
>> at
>> org.apache.flink.streaming.api.operators.SimpleOperatorFactory.createStreamOperator(SimpleOperatorFactory.java:75)
>> at
>> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:48)
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:429)
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:353)
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>> at java.lang.Thread.run(Thread.java:748)
>> Regards,
>> Vijay
>>
>>
>> On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler 
>> wrote:
>>
>>> metrics.reporter.grph.class:
>>> org.apache.flink.metrics.graphite.GraphiteReporter
>>>
>>>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>>>
>>> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>>>
>>> Hi Dawid,
>>>
>>> I have 1.10.0 version of flink. What is alternative for th

Re: Default Flink Metrics Graphite

2020-08-27 Thread Robert Metzger

I don't think these error messages give us a hint why you can't see the
metrics (because they are about registering metrics, not reporting them)

Are you sure you are using the right configuration parameters for Flink
1.10? That all required JARs are in the lib/ folder (on all machines) and
that your graphite setup is working (have you confirmed that you can show
any metrics in the Graphite UI (maybe from a Graphite demo thingy))?


On Thu, Aug 27, 2020 at 2:05 AM Vijayendra Yadav 
wrote:

> Hi Chesnay and Dawid,
>
> I see multiple entries as following in Log:
>
> 2020-08-26 23:46:19,105 WARN
> org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
> registering metric: numRecordsIn.
> java.lang.IllegalArgumentException: A metric named
> ip-99--99-99.taskmanager.container_1596056409708_1570_01_06.vdcs-kafka-flink-test.Map.0.numRecordsIn
> already exists
> at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
> 2020-08-26 23:46:19,094 WARN
> org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
> registering metric: numRecordsOut.
> java.lang.IllegalArgumentException: A metric named
> ip-99--99-999.taskmanager.container_1596056409708_1570_01_05.vdcs-kafka-flink-test.Map.2.numRecordsOut
> already exists
> at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
> at
> org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
> at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
> at
> org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
> at
> org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:343)
> at
> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:426)
> at
> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:359)
> at
> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:349)
> at
> org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.(OperatorIOMetricGroup.java:41)
> at
> org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.(OperatorMetricGroup.java:48)
> at
> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.lambda$getOrAddOperator$0(TaskMetricGroup.java:154)
> at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
> at
> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.getOrAddOperator(TaskMetricGroup.java:154)
> at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:180)
> at
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
> at
> org.apache.flink.streaming.api.operators.SimpleOperatorFactory.createStreamOperator(SimpleOperatorFactory.java:75)
> at
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:48)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:429)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:353)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
> at java.lang.Thread.run(Thread.java:748)
> Regards,
> Vijay
>
>
> On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler 
> wrote:
>
>> metrics.reporter.grph.class:
>> org.apache.flink.metrics.graphite.GraphiteReporter
>>
>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>>
>> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>>
>> Hi Dawid,
>>
>> I have 1.10.0 version of flink. What is alternative for this version ?
>>
>> Regards,
>> Vijay
>>
>>
>> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz 
>>  wrote:
>>
>> 
>>
>> Hi Vijay,
>>
>> I think the problem might be that you are using a wrong version of the
>> reporter.
>>
>> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a plugin,
>> but it was migrated to plugins in 1.11 only[1].
>>
>> I'd recommend

Re: Default Flink Metrics Graphite

2020-08-26 Thread Vijayendra Yadav

Hi Chesnay and Dawid,

I see multiple entries as following in Log:

2020-08-26 23:46:19,105 WARN
org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
registering metric: numRecordsIn.
java.lang.IllegalArgumentException: A metric named
ip-99--99-99.taskmanager.container_1596056409708_1570_01_06.vdcs-kafka-flink-test.Map.0.numRecordsIn
already exists
at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
2020-08-26 23:46:19,094 WARN
org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
registering metric: numRecordsOut.
java.lang.IllegalArgumentException: A metric named
ip-99--99-999.taskmanager.container_1596056409708_1570_01_05.vdcs-kafka-flink-test.Map.2.numRecordsOut
already exists
at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
at
org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
at
org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
at
org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:343)
at
org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:426)
at
org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:359)
at
org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:349)
at
org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.(OperatorIOMetricGroup.java:41)
at
org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.(OperatorMetricGroup.java:48)
at
org.apache.flink.runtime.metrics.groups.TaskMetricGroup.lambda$getOrAddOperator$0(TaskMetricGroup.java:154)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at
org.apache.flink.runtime.metrics.groups.TaskMetricGroup.getOrAddOperator(TaskMetricGroup.java:154)
at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:180)
at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
at
org.apache.flink.streaming.api.operators.SimpleOperatorFactory.createStreamOperator(SimpleOperatorFactory.java:75)
at
org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:48)
at
org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:429)
at
org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:353)
at
org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
at java.lang.Thread.run(Thread.java:748)
Regards,
Vijay

On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler  wrote:

> metrics.reporter.grph.class:
> org.apache.flink.metrics.graphite.GraphiteReporter
>
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>
> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>
> Hi Dawid,
>
> I have 1.10.0 version of flink. What is alternative for this version ?
>
> Regards,
> Vijay
>
>
> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz 
>  wrote:
>
> 
>
> Hi Vijay,
>
> I think the problem might be that you are using a wrong version of the
> reporter.
>
> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a plugin,
> but it was migrated to plugins in 1.11 only[1].
>
> I'd recommend trying it out with the same 1.11 version of Flink and
> Graphite reporter.
>
> Best,
>
> Dawid
>
> [1] https://issues.apache.org/jira/browse/FLINK-16965
> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>
> Hi Nikola,
>
> To rule out any other cluster issues, I have tried it in my local now.
> Steps as follows, but don't see any metrics yet.
>
> 1) Set up local Graphite
>
> docker run -d\
>  --name graphite\
>  --restart=always\
>  -p 80:80\
>  -p 2003-2004:2003-2004\
>  -p 2023-2024:2023-2024\
>  -p 8125:8125/udp\
>  -p 8126:8126\
>  graphiteapp/graphite-statsd
>
> Mapped Ports
> Host Container Service
> 80 80 nginx <https://www.nginx.com/resources/admin-guide/>
> 2003 2003 carbon receiver - plaintext
> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
> 2004 2004 carbon receiver - pickle
> <h

Re: Default Flink Metrics Graphite

2020-08-26 Thread Chesnay Schepler

metrics.reporter.grph.class: 
org.apache.flink.metrics.graphite.GraphiteReporter


https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter

On 26/08/2020 16:40, Vijayendra Yadav wrote:

Hi Dawid,

I have 1.10.0 version of flink. What is alternative for this version ?

Regards,
Vijay



On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz 
 wrote:




Hi Vijay,

I think the problem might be that you are using a wrong version of 
the reporter.


You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a 
plugin, but it was migrated to plugins in 1.11 only[1].


I'd recommend trying it out with the same 1.11 version of Flink and 
Graphite reporter.


Best,

Dawid

[1] https://issues.apache.org/jira/browse/FLINK-16965

On 26/08/2020 08:04, Vijayendra Yadav wrote:

Hi Nikola,

To rule out any other cluster issues, I have tried it in my local 
now. Steps as follows, but don't see any metrics yet.


1) Set up local Graphite

|docker run -d\ --name graphite\ --restart=always\ -p 80:80\ -p 
2003-2004:2003-2004\ -p 2023-2024:2023-2024\ -p 8125:8125/udp\ -p 
8126:8126\ graphiteapp/graphite-statsd|



  Mapped Ports

HostContainer   Service
80  80  nginx <https://www.nginx.com/resources/admin-guide/>
2003 	2003 	carbon receiver - plaintext 
<http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol> 

2004 	2004 	carbon receiver - pickle 
<http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol> 

2023 	2023 	carbon aggregator - plaintext 
<http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py> 

2024 	2024 	carbon aggregator - pickle 
<http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py> 


80808080Graphite internal gunicorn port (without Nginx proxying).
8125 	8125 	statsd 
<https://github.com/etsy/statsd/blob/master/docs/server.md>
8126 	8126 	statsd admin 
<https://github.com/etsy/statsd/blob/master/docs/admin_interface.md>


2) WebUI:





3) Run Flink example Job.
./bin/flink run 
./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar 
--port 


with conf/flink-conf.yaml set as:

metrics.reporter.grph.factory.class: 
org.apache.flink.metrics.graphite.GraphiteReporterFactory

metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.protocol: TCP
metrics.reporter.grph.interval: 1 SECONDS

and graphite jar:

plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar


4) Can't see any activity in webui graphite.


Could you review and let me know what is wrong here ? any other way 
you suggest to be able to view the raw metrics data ?
Also, do you have sample metrics raw format, you can share from any 
other project.


Regards,
Vijay




On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov <mailto:n.hru...@gmail.com>> wrote:


Hi Vijay,

Your steps look correct to me.
Perhaps you can double check that the graphite port you are
sending is correct? THe default carbon port is 2003 and if you
use the aggregator it is 2023.

You should be able to see in both flink jobmanager and
taskmanager that the metrics have been initialized with the
config you have pasted.

Regards
,
Nikola Hrusov


On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav
mailto:contact@gmail.com>> wrote:

Hi Team,

I am trying  to export Flink stream default metrics using
Graphite, but I can't find it in the Graphite metrics
console.  Could you confirm the steps below are correct?

*1) Updated flink-conf.yaml*

metrics.reporter.grph.factory.class:
org.apache.flink.metrics.graphite.GraphiteReporterFactory
metrics.reporter.grph.host: port
metrics.reporter.grph.port: 9109
metrics.reporter.grph.protocol: TCP
metrics.reporter.grph.interval: 30 SECONDS

2) Added Graphite jar in plugin folder :

    ll */usr/lib/flink/plugins/metric/*
*flink-metrics-graphite-1.10.0.jar*

3) Looking metrics in graphite server:

http://port:8080/metrics <http://10.108.58.63:8080/metrics>

Note: No code change is done.

Regards,
Vijay

Re: Default Flink Metrics Graphite

2020-08-26 Thread Dawid Wysakowicz

I'd recommend then following this instruction from older docs[1]

The difference are that you should set:

|metrics.reporter.grph.class:
org.apache.flink.metrics.graphite.GraphiteReporter|

and put the reporter jar to the /lib folder:

In order to use this reporter you must copy
|/opt/flink-metrics-graphite-1.10.0.jar| into the |/lib| folder of your
Flink distribution.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter

Best,

Dawid

On 26/08/2020 16:40, Vijayendra Yadav wrote:
> Hi Dawid,
>
> I have 1.10.0 version of flink. What is alternative for this version ?
>
> Regards,
> Vijay
>
>>
>> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz
>>  wrote:
>>
>> 
>>
>> Hi Vijay,
>>
>> I think the problem might be that you are using a wrong version of
>> the reporter.
>>
>> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a
>> plugin, but it was migrated to plugins in 1.11 only[1].
>>
>> I'd recommend trying it out with the same 1.11 version of Flink and
>> Graphite reporter.
>>
>> Best,
>>
>> Dawid
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-16965
>>
>> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>>> Hi Nikola,
>>>
>>> To rule out any other cluster issues, I have tried it in my local
>>> now. Steps as follows, but don't see any metrics yet.
>>>
>>> 1) Set up local Graphite 
>>>
>>> |docker run -d\ --name graphite\ --restart=always\ -p 80:80\ -p
>>> 2003-2004:2003-2004\ -p 2023-2024:2023-2024\ -p 8125:8125/udp\ -p
>>> 8126:8126\ graphiteapp/graphite-statsd|
>>>
>>>
>>>   Mapped Ports
>>>
>>> HostContainer   Service
>>> 80  80  nginx <https://www.nginx.com/resources/admin-guide/>
>>> 20032003carbon receiver - plaintext
>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
>>>
>>> 20042004carbon receiver - pickle
>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol>
>>>
>>> 20232023carbon aggregator - plaintext
>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>
>>> 20242024carbon aggregator - pickle
>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>
>>> 80808080Graphite internal gunicorn port (without Nginx 
>>> proxying).
>>> 81258125statsd
>>> <https://github.com/etsy/statsd/blob/master/docs/server.md>
>>> 81268126statsd admin
>>> <https://github.com/etsy/statsd/blob/master/docs/admin_interface.md>
>>>
>>> 2) WebUI: 
>>>
>>> 
>>>
>>>
>>>
>>> 3) Run Flink example Job.
>>> ./bin/flink run
>>> ./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar
>>> --port 
>>>
>>> with conf/flink-conf.yaml set as:
>>>
>>> metrics.reporter.grph.factory.class:
>>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>>> metrics.reporter.grph.host: localhost
>>> metrics.reporter.grph.port: 2003
>>> metrics.reporter.grph.protocol: TCP
>>> metrics.reporter.grph.interval: 1 SECONDS
>>>
>>> and graphite jar:
>>>
>>> plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar
>>>
>>>
>>> 4) Can't see any activity in webui graphite. 
>>>
>>>
>>> Could you review and let me know what is wrong here ? any other way
>>> you suggest to be able to view the raw metrics data ?
>>> Also, do you have sample metrics raw format, you can share from any
>>> other project.
>>>
>>> Regards,
>>> Vijay
>>>
>>>
>>>
>>>
>>> On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov >> <mailto:n.hru...@gmail.com>> wrote:
>>>
>>> Hi Vijay,
>>>
>>> Your steps look correct to me. 
>>> Perhaps you can double check that the graphite port you are
>>> sending is correct? THe default carbon port is 2003 and if you
>>> use the aggregator it is 2023.
>>>
>>> You should be able to see in both flink jobmanager and
>>> taskmanager that the metrics have been initialized with

Re: Default Flink Metrics Graphite

2020-08-26 Thread Vijayendra Yadav

Hi Dawid,

I have 1.10.0 version of flink. What is alternative for this version ?

Regards,
Vijay

> 
> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz  wrote:
> 
> 
> Hi Vijay,
> 
> I think the problem might be that you are using a wrong version of the 
> reporter.
> 
> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a plugin, but 
> it was migrated to plugins in 1.11 only[1].
> 
> I'd recommend trying it out with the same 1.11 version of Flink and Graphite 
> reporter.
> 
> Best,
> 
> Dawid
> 
> [1] https://issues.apache.org/jira/browse/FLINK-16965
> 
> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>> Hi Nikola,
>> 
>> To rule out any other cluster issues, I have tried it in my local now. Steps 
>> as follows, but don't see any metrics yet.
>> 
>> 1) Set up local Graphite 
>> 
>> docker run -d\
>>  --name graphite\
>>  --restart=always\
>>  -p 80:80\
>>  -p 2003-2004:2003-2004\
>>  -p 2023-2024:2023-2024\
>>  -p 8125:8125/udp\
>>  -p 8126:8126\
>>  graphiteapp/graphite-statsd
>> Mapped Ports
>> 
>> Host Container   Service
>> 80   80  nginx
>> 2003 2003carbon receiver - plaintext
>> 2004 2004carbon receiver - pickle
>> 2023 2023carbon aggregator - plaintext
>> 2024 2024carbon aggregator - pickle
>> 8080 8080Graphite internal gunicorn port (without Nginx proxying).
>> 8125 8125statsd
>> 8126 8126statsd admin
>> 2) WebUI: 
>> 
>> 
>> 
>> 
>> 
>> 3) Run Flink example Job.
>> ./bin/flink run 
>> ./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar
>>  --port 
>> 
>> with conf/flink-conf.yaml set as:
>> 
>> metrics.reporter.grph.factory.class: 
>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>> metrics.reporter.grph.host: localhost
>> metrics.reporter.grph.port: 2003
>> metrics.reporter.grph.protocol: TCP
>> metrics.reporter.grph.interval: 1 SECONDS
>> 
>> and graphite jar:
>> 
>> plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar
>> 
>> 
>> 4) Can't see any activity in webui graphite. 
>> 
>> 
>> Could you review and let me know what is wrong here ? any other way you 
>> suggest to be able to view the raw metrics data ?
>> Also, do you have sample metrics raw format, you can share from any other 
>> project.
>> 
>> Regards,
>> Vijay
>> 
>> 
>> 
>> 
>> On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov  wrote:
>>> Hi Vijay,
>>> 
>>> Your steps look correct to me. 
>>> Perhaps you can double check that the graphite port you are sending is 
>>> correct? THe default carbon port is 2003 and if you use the aggregator it 
>>> is 2023.
>>> 
>>> You should be able to see in both flink jobmanager and taskmanager that the 
>>> metrics have been initialized with the config you have pasted.
>>> 
>>> Regards
>>> ,
>>> Nikola Hrusov
>>> 
>>> 
>>> On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav  
>>> wrote:
>>>> Hi Team,
>>>> 
>>>> I am trying  to export Flink stream default metrics using Graphite, but I 
>>>> can't find it in the Graphite metrics console.  Could you confirm the 
>>>> steps below are correct?
>>>> 
>>>> 1) Updated flink-conf.yaml
>>>> 
>>>> metrics.reporter.grph.factory.class: 
>>>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>>>> metrics.reporter.grph.host: port
>>>> metrics.reporter.grph.port: 9109
>>>> metrics.reporter.grph.protocol: TCP
>>>> metrics.reporter.grph.interval: 30 SECONDS
>>>> 
>>>> 2) Added Graphite jar in plugin folder :
>>>> 
>>>> ll /usr/lib/flink/plugins/metric/
>>>>  flink-metrics-graphite-1.10.0.jar
>>>> 
>>>> 3) Looking metrics in graphite server:
>>>> 
>>>> http://port:8080/metrics  
>>>> 
>>>> Note: No code change is done.
>>>> 
>>>> Regards,
>>>> Vijay
>>>> 
>>>>

Re: Default Flink Metrics Graphite

2020-08-26 Thread Dawid Wysakowicz

Hi Vijay,

I think the problem might be that you are using a wrong version of the
reporter.

You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a
plugin, but it was migrated to plugins in 1.11 only[1].

I'd recommend trying it out with the same 1.11 version of Flink and
Graphite reporter.

Best,

Dawid

[1] https://issues.apache.org/jira/browse/FLINK-16965

On 26/08/2020 08:04, Vijayendra Yadav wrote:
> Hi Nikola,
>
> To rule out any other cluster issues, I have tried it in my local now.
> Steps as follows, but don't see any metrics yet.
>
> 1) Set up local Graphite 
>
> |docker run -d\ --name graphite\ --restart=always\ -p 80:80\ -p
> 2003-2004:2003-2004\ -p 2023-2024:2023-2024\ -p 8125:8125/udp\ -p
> 8126:8126\ graphiteapp/graphite-statsd|
>
>
>   Mapped Ports
>
> Host  Container   Service
> 8080  nginx <https://www.nginx.com/resources/admin-guide/>
> 2003  2003carbon receiver - plaintext
> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
>
> 2004  2004carbon receiver - pickle
> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol>
>
> 2023  2023carbon aggregator - plaintext
> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>
> 2024  2024carbon aggregator - pickle
> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>
> 8080  8080Graphite internal gunicorn port (without Nginx proxying).
> 8125  8125statsd
> <https://github.com/etsy/statsd/blob/master/docs/server.md>
> 8126  8126statsd admin
> <https://github.com/etsy/statsd/blob/master/docs/admin_interface.md>
>
> 2) WebUI: 
>
> image.png
>
>
> 3) Run Flink example Job.
> ./bin/flink run
> ./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar
> --port 
>
> with conf/flink-conf.yaml set as:
>
> metrics.reporter.grph.factory.class:
> org.apache.flink.metrics.graphite.GraphiteReporterFactory
> metrics.reporter.grph.host: localhost
> metrics.reporter.grph.port: 2003
> metrics.reporter.grph.protocol: TCP
> metrics.reporter.grph.interval: 1 SECONDS
>
> and graphite jar:
>
> plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar
>
>
> 4) Can't see any activity in webui graphite. 
>
>
> Could you review and let me know what is wrong here ? any other way
> you suggest to be able to view the raw metrics data ?
> Also, do you have sample metrics raw format, you can share from any
> other project.
>
> Regards,
> Vijay
>
>
>
>
> On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov  <mailto:n.hru...@gmail.com>> wrote:
>
> Hi Vijay,
>
> Your steps look correct to me. 
> Perhaps you can double check that the graphite port you are
> sending is correct? THe default carbon port is 2003 and if you use
> the aggregator it is 2023.
>
> You should be able to see in both flink jobmanager and taskmanager
> that the metrics have been initialized with the config you have
> pasted.
>
> Regards
> ,
> Nikola Hrusov
>
>
> On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav
> mailto:contact@gmail.com>> wrote:
>
> Hi Team,
>
> I am trying  to export Flink stream default metrics using
> Graphite, but I can't find it in the Graphite metrics
> console.  Could you confirm the steps below are correct?
>
> *1) Updated flink-conf.yaml*
>
> metrics.reporter.grph.factory.class:
> org.apache.flink.metrics.graphite.GraphiteReporterFactory
> metrics.reporter.grph.host: port
> metrics.reporter.grph.port: 9109
> metrics.reporter.grph.protocol: TCP
> metrics.reporter.grph.interval: 30 SECONDS
>
> 2) Added Graphite jar in plugin folder :
>
> ll */usr/lib/flink/plugins/metric/*
>  *flink-metrics-graphite-1.10.0.jar*
>
> 3) Looking metrics in graphite server:
>
> http://port:8080/metrics <http://10.108.58.63:8080/metrics>  
>
> Note: No code change is done.
>
> Regards,
> Vijay
>
>


signature.asc
Description: OpenPGP digital signature

Re: Default Flink Metrics Graphite

2020-08-26 Thread Vijayendra Yadav

Hi Nikola,

To rule out any other cluster issues, I have tried it in my local now.
Steps as follows, but don't see any metrics yet.

1) Set up local Graphite

docker run -d\
 --name graphite\
 --restart=always\
 -p 80:80\
 -p 2003-2004:2003-2004\
 -p 2023-2024:2023-2024\
 -p 8125:8125/udp\
 -p 8126:8126\
 graphiteapp/graphite-statsd

Mapped Ports
HostContainerService
80 80 nginx <https://www.nginx.com/resources/admin-guide/>
2003 2003 carbon receiver - plaintext
<http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
2004 2004 carbon receiver - pickle
<http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol>
2023 2023 carbon aggregator - plaintext
<http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
2024 2024 carbon aggregator - pickle
<http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
8080 8080 Graphite internal gunicorn port (without Nginx proxying).
8125 8125 statsd <https://github.com/etsy/statsd/blob/master/docs/server.md>
8126 8126 statsd admin
<https://github.com/etsy/statsd/blob/master/docs/admin_interface.md>
2) WebUI:

[image: image.png]


3) Run Flink example Job.
./bin/flink run
./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar
--port 

with conf/flink-conf.yaml set as:

metrics.reporter.grph.factory.class:
org.apache.flink.metrics.graphite.GraphiteReporterFactory
metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.protocol: TCP
metrics.reporter.grph.interval: 1 SECONDS

and graphite jar:

plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar


4) Can't see any activity in webui graphite.


Could you review and let me know what is wrong here ? any other way you
suggest to be able to view the raw metrics data ?
Also, do you have sample metrics raw format, you can share from any other
project.

Regards,
Vijay




On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov  wrote:

> Hi Vijay,
>
> Your steps look correct to me.
> Perhaps you can double check that the graphite port you are sending is
> correct? THe default carbon port is 2003 and if you use the aggregator it
> is 2023.
>
> You should be able to see in both flink jobmanager and taskmanager that
> the metrics have been initialized with the config you have pasted.
>
> Regards
> ,
> Nikola Hrusov
>
>
> On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav 
> wrote:
>
>> Hi Team,
>>
>> I am trying  to export Flink stream default metrics using Graphite, but I
>> can't find it in the Graphite metrics console.  Could you confirm the steps
>> below are correct?
>>
>> *1) Updated flink-conf.yaml*
>>
>> metrics.reporter.grph.factory.class:
>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>> metrics.reporter.grph.host: port
>> metrics.reporter.grph.port: 9109
>> metrics.reporter.grph.protocol: TCP
>> metrics.reporter.grph.interval: 30 SECONDS
>>
>> 2) Added Graphite jar in plugin folder :
>>
>> ll */usr/lib/flink/plugins/metric/*
>>  *flink-metrics-graphite-1.10.0.jar*
>>
>> 3) Looking metrics in graphite server:
>>
>> http://port:8080/metrics <http://10.108.58.63:8080/metrics>
>>
>> Note: No code change is done.
>>
>> Regards,
>> Vijay
>>
>>
>>

Re: Default Flink Metrics Graphite

2020-08-25 Thread Vijayendra Yadav

Thanks for inputs Nikola. I will check on graphite side.

Sent from my iPhone

> On Aug 23, 2020, at 9:26 PM, Nikola Hrusov  wrote:
> 
> 
> Hi Vijay,
> 
> Your steps look correct to me. 
> Perhaps you can double check that the graphite port you are sending is 
> correct? THe default carbon port is 2003 and if you use the aggregator it is 
> 2023.
> 
> You should be able to see in both flink jobmanager and taskmanager that the 
> metrics have been initialized with the config you have pasted.
> 
> Regards,
> Nikola Hrusov
> 
> 
>> On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav  
>> wrote:
>> Hi Team,
>> 
>> I am trying  to export Flink stream default metrics using Graphite, but I 
>> can't find it in the Graphite metrics console.  Could you confirm the steps 
>> below are correct?
>> 
>> 1) Updated flink-conf.yaml
>> 
>> metrics.reporter.grph.factory.class: 
>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>> metrics.reporter.grph.host: port
>> metrics.reporter.grph.port: 9109
>> metrics.reporter.grph.protocol: TCP
>> metrics.reporter.grph.interval: 30 SECONDS
>> 
>> 2) Added Graphite jar in plugin folder :
>> 
>> ll /usr/lib/flink/plugins/metric/
>>  flink-metrics-graphite-1.10.0.jar
>> 
>> 3) Looking metrics in graphite server:
>> 
>> http://port:8080/metrics  
>> 
>> Note: No code change is done.
>> 
>> Regards,
>> Vijay
>> 
>>

Re: Default Flink Metrics Graphite

2020-08-23 Thread Nikola Hrusov

Hi Vijay,

Your steps look correct to me.
Perhaps you can double check that the graphite port you are sending is
correct? THe default carbon port is 2003 and if you use the aggregator it
is 2023.

You should be able to see in both flink jobmanager and taskmanager that the
metrics have been initialized with the config you have pasted.

Regards
,
Nikola Hrusov

On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav 
wrote:

> Hi Team,
>
> I am trying  to export Flink stream default metrics using Graphite, but I
> can't find it in the Graphite metrics console.  Could you confirm the steps
> below are correct?
>
> *1) Updated flink-conf.yaml*
>
> metrics.reporter.grph.factory.class:
> org.apache.flink.metrics.graphite.GraphiteReporterFactory
> metrics.reporter.grph.host: port
> metrics.reporter.grph.port: 9109
> metrics.reporter.grph.protocol: TCP
> metrics.reporter.grph.interval: 30 SECONDS
>
> 2) Added Graphite jar in plugin folder :
>
> ll */usr/lib/flink/plugins/metric/*
>  *flink-metrics-graphite-1.10.0.jar*
>
> 3) Looking metrics in graphite server:
>
> http://port:8080/metrics <http://10.108.58.63:8080/metrics>
>
> Note: No code change is done.
>
> Regards,
> Vijay
>
>
>

Default Flink Metrics Graphite

2020-08-23 Thread Vijayendra Yadav

Hi Team,

I am trying  to export Flink stream default metrics using Graphite, but I
can't find it in the Graphite metrics console.  Could you confirm the steps
below are correct?

*1) Updated flink-conf.yaml*

metrics.reporter.grph.factory.class:
org.apache.flink.metrics.graphite.GraphiteReporterFactory
metrics.reporter.grph.host: port
metrics.reporter.grph.port: 9109
metrics.reporter.grph.protocol: TCP
metrics.reporter.grph.interval: 30 SECONDS

2) Added Graphite jar in plugin folder :

ll */usr/lib/flink/plugins/metric/*
 *flink-metrics-graphite-1.10.0.jar*

3) Looking metrics in graphite server:

http://port:8080/metrics <http://10.108.58.63:8080/metrics>

Note: No code change is done.

Regards,
Vijay

Re: A query on Flink metrics in kubernetes

2020-07-09 Thread Chesnay Schepler

From Flink's perspective no metrics are aggregated, nor are metric 
requests forwarded to some other process.


Each TaskExecutor has its own reporter, that each must be scraped to get 
the full set of metrics.


On 09/07/2020 11:39, Manish G wrote:

Hi,

I have a query regarding prometheus scraping Flink metrics data with 
application running in kubernetes cluster.


If taskmanager is running on multiple nodes, and prometheus requests 
for the metrics data, then is that request directed to one of the 
nodes(based on some strategy, like round-robin) or is data aggregated 
from all the nodes?


With regards

A query on Flink metrics in kubernetes

2020-07-09 Thread Manish G

Hi,

I have a query regarding prometheus scraping Flink metrics data with
application running in kubernetes cluster.

If taskmanager is running on multiple nodes, and prometheus requests for
the metrics data, then is that request directed to one of the nodes(based
on some strategy, like round-robin) or is data aggregated from all the
nodes?

With regards

Re: Logging Flink metrics

>
>>>
>>>
>>>
>>> On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler 
>>> wrote:
>>>
>>>> You've said elsewhere that you do see some metrics in prometheus, which
>>>> are those?
>>>>
>>>> Why are you configuring the host for the prometheus reporter? This
>>>> option is only for the PrometheusPushGatewayReporter.
>>>>
>>>> On 06/07/2020 18:01, Manish G wrote:
>>>>
>>>> Hi,
>>>>
>>>> So I have following in flink-conf.yml :
>>>> //
>>>> metrics.reporter.prom.class:
>>>> org.apache.flink.metrics.prometheus.PrometheusReporter
>>>> metrics.reporter.prom.host: 127.0.0.1
>>>> metrics.reporter.prom.port: 
>>>> metrics.reporter.slf4j.class:
>>>> org.apache.flink.metrics.slf4j.Slf4jReporter
>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>> //
>>>>
>>>> And while I can see custom metrics in Taskmanager logs, but prometheus
>>>> dashboard logs doesn't show custom metrics.
>>>>
>>>> With regards
>>>>
>>>> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler 
>>>> wrote:
>>>>
>>>>> You have explicitly configured a reporter list, resulting in the slf4j
>>>>> reporter being ignored:
>>>>>
>>>>> 2020-07-06 13:48:22,191 INFO
>>>>> org.apache.flink.configuration.GlobalConfiguration- Loading
>>>>> configuration property: metrics.reporters, prom
>>>>> 2020-07-06 13:48:23,203 INFO
>>>>> org.apache.flink.runtime.metrics.ReporterSetup- Excluding
>>>>> reporter slf4j, not configured in reporter list (prom).
>>>>>
>>>>> Note that nowadays metrics.reporters is no longer required; the set
>>>>> of reporters is automatically determined based on configured properties;
>>>>> the only use-case is disabling a reporter without having to remove the
>>>>> entire configuration.
>>>>> I'd suggest to just remove the option, try again, and report back.
>>>>>
>>>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>>>
>>>>> Please enable debug logging and search for warnings from the metric
>>>>> groups/registry/reporter.
>>>>>
>>>>> If you cannot find anything suspicious, you can also send the foll log
>>>>> to me directly.
>>>>>
>>>>> On 06/07/2020 16:29, Manish G wrote:
>>>>>
>>>>> Job is an infinite streaming one, so it keeps going. Flink
>>>>> configuration is as:
>>>>>
>>>>> metrics.reporter.slf4j.class:
>>>>> org.apache.flink.metrics.slf4j.Slf4jReporter
>>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler 
>>>>> wrote:
>>>>>
>>>>>> How long did the job run for, and what is the configured interval?
>>>>>>
>>>>>>
>>>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for this.
>>>>>>
>>>>>> I did the configuration as mentioned at the link(changes in
>>>>>> flink-conf.yml, copying the jar in lib directory), and registered the 
>>>>>> Meter
>>>>>> with metrics group and invoked markEvent() method in the target code. 
>>>>>> But I
>>>>>> don't see any related logs.
>>>>>> I am doing this all on my local computer.
>>>>>>
>>>>>> Anything else I need to do?
>>>>>>
>>>>>> With regards
>>>>>> Manish
>>>>>>
>>>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
>>>>>> wrote:
>>>>>>
>>>>>>> Have you looked at the SLF4J reporter?
>>>>>>>
>>>>>>>
>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>>>
>>>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > Is it possible to log Flink metrics in application logs apart from
>>>>>>> > publishing it to Prometheus?
>>>>>>> >
>>>>>>> > With regards
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Logging Flink metrics

rom.host: 127.0.0.1
metrics.reporter.prom.port: 
metrics.reporter.slf4j.class:
org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
//

And while I can see custom metrics in Taskmanager logs,
but prometheus dashboard logs doesn't show custom metrics.

With regards

On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

You have explicitly configured a reporter list,
resulting in the slf4j reporter being ignored:

2020-07-06 13:48:22,191 INFO
org.apache.flink.configuration.GlobalConfiguration
- Loading configuration property:
metrics.reporters, prom
2020-07-06 13:48:23,203 INFO
org.apache.flink.runtime.metrics.ReporterSetup -
Excluding reporter slf4j, not configured in
reporter list (prom).

Note that nowadays metrics.reporters is no longer
required; the set of reporters is automatically
determined based on configured properties; the only
use-case is disabling a reporter without having to
remove the entire configuration.
I'd suggest to just remove the option, try again,
and report back.

On 06/07/2020 16:35, Chesnay Schepler wrote:

Please enable debug logging and search for
warnings from the metric groups/registry/reporter.

If you cannot find anything suspicious, you can
also send the foll log to me directly.

On 06/07/2020 16:29, Manish G wrote:

Job is an infinite streaming one, so it keeps
going. Flink configuration is as:

metrics.reporter.slf4j.class:
org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS

On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler
mailto:ches...@apache.org>>
wrote:

How long did the job run for, and what is the
configured interval?

On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the
link(changes in flink-conf.yml, copying the
jar in lib directory), and registered the
Meter with metrics group and invoked
markEvent() method in the target code. But I
don't see any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay
Schepler mailto:ches...@apache.org>> wrote:

Have you looked at the SLF4J reporter?

https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in
application logs apart from
> publishing it to Prometheus?
>
> With regards

Re: Logging Flink metrics

ics in Taskmanager logs, but prometheus
>>> dashboard logs doesn't show custom metrics.
>>>
>>> With regards
>>>
>>> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler 
>>> wrote:
>>>
>>>> You have explicitly configured a reporter list, resulting in the slf4j
>>>> reporter being ignored:
>>>>
>>>> 2020-07-06 13:48:22,191 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration- Loading
>>>> configuration property: metrics.reporters, prom
>>>> 2020-07-06 13:48:23,203 INFO
>>>> org.apache.flink.runtime.metrics.ReporterSetup- Excluding
>>>> reporter slf4j, not configured in reporter list (prom).
>>>>
>>>> Note that nowadays metrics.reporters is no longer required; the set of
>>>> reporters is automatically determined based on configured properties; the
>>>> only use-case is disabling a reporter without having to remove the entire
>>>> configuration.
>>>> I'd suggest to just remove the option, try again, and report back.
>>>>
>>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>>
>>>> Please enable debug logging and search for warnings from the metric
>>>> groups/registry/reporter.
>>>>
>>>> If you cannot find anything suspicious, you can also send the foll log
>>>> to me directly.
>>>>
>>>> On 06/07/2020 16:29, Manish G wrote:
>>>>
>>>> Job is an infinite streaming one, so it keeps going. Flink
>>>> configuration is as:
>>>>
>>>> metrics.reporter.slf4j.class:
>>>> org.apache.flink.metrics.slf4j.Slf4jReporter
>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>>
>>>>
>>>>
>>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler 
>>>> wrote:
>>>>
>>>>> How long did the job run for, and what is the configured interval?
>>>>>
>>>>>
>>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for this.
>>>>>
>>>>> I did the configuration as mentioned at the link(changes in
>>>>> flink-conf.yml, copying the jar in lib directory), and registered the 
>>>>> Meter
>>>>> with metrics group and invoked markEvent() method in the target code. But 
>>>>> I
>>>>> don't see any related logs.
>>>>> I am doing this all on my local computer.
>>>>>
>>>>> Anything else I need to do?
>>>>>
>>>>> With regards
>>>>> Manish
>>>>>
>>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
>>>>> wrote:
>>>>>
>>>>>> Have you looked at the SLF4J reporter?
>>>>>>
>>>>>>
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>>
>>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > Is it possible to log Flink metrics in application logs apart from
>>>>>> > publishing it to Prometheus?
>>>>>> >
>>>>>> > With regards
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Logging Flink metrics

org.apache.flink.runtime.metrics.ReporterSetup -
Excluding reporter slf4j, not configured in reporter
list (prom).

Note that nowadays metrics.reporters is no longer
required; the set of reporters is automatically
determined based on configured properties; the only
use-case is disabling a reporter without having to
remove the entire configuration.
I'd suggest to just remove the option, try again, and
report back.

On 06/07/2020 16:35, Chesnay Schepler wrote:

Please enable debug logging and search for warnings
from the metric groups/registry/reporter.

If you cannot find anything suspicious, you can also
send the foll log to me directly.

On 06/07/2020 16:29, Manish G wrote:

Job is an infinite streaming one, so it keeps going.
Flink configuration is as:

metrics.reporter.slf4j.class:
org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS

On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

How long did the job run for, and what is the
configured interval?

On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the
link(changes in flink-conf.yml, copying the jar
in lib directory), and registered the Meter with
metrics group and invoked markEvent() method in
the target code. But I don't see any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler
mailto:ches...@apache.org>>
wrote:

Have you looked at the SLF4J reporter?

https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in
application logs apart from
> publishing it to Prometheus?
>
> With regards

Re: Logging Flink metrics

;>> org.apache.flink.runtime.metrics.ReporterSetup- Excluding
>>> reporter slf4j, not configured in reporter list (prom).
>>>
>>> Note that nowadays metrics.reporters is no longer required; the set of
>>> reporters is automatically determined based on configured properties; the
>>> only use-case is disabling a reporter without having to remove the entire
>>> configuration.
>>> I'd suggest to just remove the option, try again, and report back.
>>>
>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>
>>> Please enable debug logging and search for warnings from the metric
>>> groups/registry/reporter.
>>>
>>> If you cannot find anything suspicious, you can also send the foll log
>>> to me directly.
>>>
>>> On 06/07/2020 16:29, Manish G wrote:
>>>
>>> Job is an infinite streaming one, so it keeps going. Flink configuration
>>> is as:
>>>
>>> metrics.reporter.slf4j.class:
>>> org.apache.flink.metrics.slf4j.Slf4jReporter
>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>
>>>
>>>
>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler 
>>> wrote:
>>>
>>>> How long did the job run for, and what is the configured interval?
>>>>
>>>>
>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thanks for this.
>>>>
>>>> I did the configuration as mentioned at the link(changes in
>>>> flink-conf.yml, copying the jar in lib directory), and registered the Meter
>>>> with metrics group and invoked markEvent() method in the target code. But I
>>>> don't see any related logs.
>>>> I am doing this all on my local computer.
>>>>
>>>> Anything else I need to do?
>>>>
>>>> With regards
>>>> Manish
>>>>
>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
>>>> wrote:
>>>>
>>>>> Have you looked at the SLF4J reporter?
>>>>>
>>>>>
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>
>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>> > Hi,
>>>>> >
>>>>> > Is it possible to log Flink metrics in application logs apart from
>>>>> > publishing it to Prometheus?
>>>>> >
>>>>> > With regards
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>

Re: Logging Flink metrics

 link(changes in flink-conf.yml, copying the jar in lib
directory), and registered the Meter with metrics
group and invoked markEvent() method in the target
code. But I don't see any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

Have you looked at the SLF4J reporter?

https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in
application logs apart from
> publishing it to Prometheus?
>
> With regards

Re: Logging Flink metrics

gt;>> flink-conf.yml, copying the jar in lib directory), and registered the Meter
>>> with metrics group and invoked markEvent() method in the target code. But I
>>> don't see any related logs.
>>> I am doing this all on my local computer.
>>>
>>> Anything else I need to do?
>>>
>>> With regards
>>> Manish
>>>
>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
>>> wrote:
>>>
>>>> Have you looked at the SLF4J reporter?
>>>>
>>>>
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>
>>>> On 06/07/2020 13:49, Manish G wrote:
>>>> > Hi,
>>>> >
>>>> > Is it possible to log Flink metrics in application logs apart from
>>>> > publishing it to Prometheus?
>>>> >
>>>> > With regards
>>>>
>>>>
>>>>
>>>
>>
>>
>

Re: Logging Flink metrics

You've said elsewhere that you do see some metrics in prometheus, which 
are those?

Why are you configuring the host for the prometheus reporter? This 
option is only for the PrometheusPushGatewayReporter.

On 06/07/2020 18:01, Manish G wrote:

Hi,

So I have following in flink-conf.yml :
//
metrics.reporter.prom.class: 
org.apache.flink.metrics.prometheus.PrometheusReporter

metrics.reporter.prom.host: 127.0.0.1
metrics.reporter.prom.port: 
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
//

And while I can see custom metrics in Taskmanager logs, but prometheus 
dashboard logs doesn't show custom metrics.

With regards

On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <mailto:ches...@apache.org>> wrote:

You have explicitly configured a reporter list, resulting in the
slf4j reporter being ignored:

2020-07-06 13:48:22,191 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: metrics.reporters, prom
2020-07-06 13:48:23,203 INFO
org.apache.flink.runtime.metrics.ReporterSetup - Excluding
reporter slf4j, not configured in reporter list (prom).

Note that nowadays metrics.reporters is no longer required; the
set of reporters is automatically determined based on configured
properties; the only use-case is disabling a reporter without
having to remove the entire configuration.
I'd suggest to just remove the option, try again, and report back.

On 06/07/2020 16:35, Chesnay Schepler wrote:

Please enable debug logging and search for warnings from the
metric groups/registry/reporter.

If you cannot find anything suspicious, you can also send the
foll log to me directly.

On 06/07/2020 16:29, Manish G wrote:

Job is an infinite streaming one, so it keeps going. Flink
configuration is as:

metrics.reporter.slf4j.class:
org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS

On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

How long did the job run for, and what is the configured
interval?

On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the link(changes in
flink-conf.yml, copying the jar in lib directory), and
registered the Meter with metrics group and invoked
markEvent() method in the target code. But I don't see any
related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

Have you looked at the SLF4J reporter?

https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
    > Is it possible to log Flink metrics in application
logs apart from
> publishing it to Prometheus?
>
> With regards

Re: Logging Flink metrics

Hi,

So I have following in flink-conf.yml :
//
metrics.reporter.prom.class:
org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.host: 127.0.0.1
metrics.reporter.prom.port: 
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
//

And while I can see custom metrics in Taskmanager logs, but prometheus
dashboard logs doesn't show custom metrics.

With regards

On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler  wrote:

> You have explicitly configured a reporter list, resulting in the slf4j
> reporter being ignored:
>
> 2020-07-06 13:48:22,191 INFO
> org.apache.flink.configuration.GlobalConfiguration- Loading
> configuration property: metrics.reporters, prom
> 2020-07-06 13:48:23,203 INFO
> org.apache.flink.runtime.metrics.ReporterSetup- Excluding
> reporter slf4j, not configured in reporter list (prom).
>
> Note that nowadays metrics.reporters is no longer required; the set of
> reporters is automatically determined based on configured properties; the
> only use-case is disabling a reporter without having to remove the entire
> configuration.
> I'd suggest to just remove the option, try again, and report back.
>
> On 06/07/2020 16:35, Chesnay Schepler wrote:
>
> Please enable debug logging and search for warnings from the metric
> groups/registry/reporter.
>
> If you cannot find anything suspicious, you can also send the foll log to
> me directly.
>
> On 06/07/2020 16:29, Manish G wrote:
>
> Job is an infinite streaming one, so it keeps going. Flink configuration
> is as:
>
> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
> metrics.reporter.slf4j.interval: 30 SECONDS
>
>
>
> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler 
> wrote:
>
>> How long did the job run for, and what is the configured interval?
>>
>>
>> On 06/07/2020 15:51, Manish G wrote:
>>
>> Hi,
>>
>> Thanks for this.
>>
>> I did the configuration as mentioned at the link(changes in
>> flink-conf.yml, copying the jar in lib directory), and registered the Meter
>> with metrics group and invoked markEvent() method in the target code. But I
>> don't see any related logs.
>> I am doing this all on my local computer.
>>
>> Anything else I need to do?
>>
>> With regards
>> Manish
>>
>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
>> wrote:
>>
>>> Have you looked at the SLF4J reporter?
>>>
>>>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>
>>> On 06/07/2020 13:49, Manish G wrote:
>>> > Hi,
>>> >
>>> > Is it possible to log Flink metrics in application logs apart from
>>> > publishing it to Prometheus?
>>> >
>>> > With regards
>>>
>>>
>>>
>>
>
>

Re: Logging Flink metrics

You have explicitly configured a reporter list, resulting in the slf4j 
reporter being ignored:

2020-07-06 13:48:22,191 INFO 
org.apache.flink.configuration.GlobalConfiguration    - Loading 
configuration property: metrics.reporters, prom
2020-07-06 13:48:23,203 INFO 
org.apache.flink.runtime.metrics.ReporterSetup    - 
Excluding reporter slf4j, not configured in reporter list (prom).

Note that nowadays metrics.reporters is no longer required; the set of 
reporters is automatically determined based on configured properties; 
the only use-case is disabling a reporter without having to remove the 
entire configuration.

I'd suggest to just remove the option, try again, and report back.

On 06/07/2020 16:35, Chesnay Schepler wrote:
Please enable debug logging and search for warnings from the metric 
groups/registry/reporter.

If you cannot find anything suspicious, you can also send the foll log 
to me directly.

On 06/07/2020 16:29, Manish G wrote:
Job is an infinite streaming one, so it keeps going. Flink 
configuration is as:

metrics.reporter.slf4j.class: 
org.apache.flink.metrics.slf4j.Slf4jReporter

metrics.reporter.slf4j.interval: 30 SECONDS

On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <mailto:ches...@apache.org>> wrote:

How long did the job run for, and what is the configured interval?

On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the link(changes in
flink-conf.yml, copying the jar in lib directory), and
registered the Meter with metrics group and invoked markEvent()
method in the target code. But I don't see any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

Have you looked at the SLF4J reporter?

https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in application logs
apart from
> publishing it to Prometheus?
>
> With regards

Re: Logging Flink metrics

Please enable debug logging and search for warnings from the metric 
groups/registry/reporter.

If you cannot find anything suspicious, you can also send the foll log 
to me directly.

On 06/07/2020 16:29, Manish G wrote:
Job is an infinite streaming one, so it keeps going. Flink 
configuration is as:

metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS

On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <mailto:ches...@apache.org>> wrote:

How long did the job run for, and what is the configured interval?

On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the link(changes in
flink-conf.yml, copying the jar in lib directory), and registered
the Meter with metrics group and invoked markEvent() method in
the target code. But I don't see any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

Have you looked at the SLF4J reporter?

https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in application logs
apart from
> publishing it to Prometheus?
>
> With regards

Re: Logging Flink metrics

Job is an infinite streaming one, so it keeps going. Flink configuration is
as:

metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS



On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler  wrote:

> How long did the job run for, and what is the configured interval?
>
>
> On 06/07/2020 15:51, Manish G wrote:
>
> Hi,
>
> Thanks for this.
>
> I did the configuration as mentioned at the link(changes in
> flink-conf.yml, copying the jar in lib directory), and registered the Meter
> with metrics group and invoked markEvent() method in the target code. But I
> don't see any related logs.
> I am doing this all on my local computer.
>
> Anything else I need to do?
>
> With regards
> Manish
>
> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
> wrote:
>
>> Have you looked at the SLF4J reporter?
>>
>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>
>> On 06/07/2020 13:49, Manish G wrote:
>> > Hi,
>> >
>> > Is it possible to log Flink metrics in application logs apart from
>> > publishing it to Prometheus?
>> >
>> > With regards
>>
>>
>>
>

Re: Logging Flink metrics

How long did the job run for, and what is the configured interval?

On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the link(changes in 
flink-conf.yml, copying the jar in lib directory), and registered the 
Meter with metrics group and invoked markEvent() method in the target 
code. But I don't see any related logs.

I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler <mailto:ches...@apache.org>> wrote:

Have you looked at the SLF4J reporter?

https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in application logs apart from
> publishing it to Prometheus?
>
> With regards

Re: Logging Flink metrics

Hi,

Thanks for this.

I did the configuration as mentioned at the link(changes in flink-conf.yml,
copying the jar in lib directory), and registered the Meter with metrics
group and invoked markEvent() method in the target code. But I don't see
any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler  wrote:

> Have you looked at the SLF4J reporter?
>
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>
> On 06/07/2020 13:49, Manish G wrote:
> > Hi,
> >
> > Is it possible to log Flink metrics in application logs apart from
> > publishing it to Prometheus?
> >
> > With regards
>
>
>

Re: Logging Flink metrics


Have you looked at the SLF4J reporter?

https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:

Hi,

Is it possible to log Flink metrics in application logs apart from 
publishing it to Prometheus?


With regards

Logging Flink metrics