Flink metrics to Prometheus on Kubernetes

2023-11-07 Thread Raihan Sunny via user
Hi,

I have a few Flink jobs running on Kubernetes using the Flink Kubernetes
Operator. By following the documentation [1] I was able to set up
monitoring for the Operator itself. As for the jobs themselves, I'm a bit
confused about how to properly set it up. Here's my FlinkDeployment
configuration:

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: sample-job
  namespace: flink
spec:
  image: flink:1.17
  flinkVersion: v1_17
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "1"
    state.savepoints.dir: file:///flink-data/savepoints
    state.checkpoints.dir: file:///flink-data/checkpoints
    high-availability.type: kubernetes
    high-availability.storageDir: file:///flink-data/ha
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    metrics.reporter.prom.port: 9249-9250
  serviceAccount: flink
  jobManager:
    resource:
      memory: "1024m"
      cpu: 1
  taskManager:
    resource:
      memory: "1024m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /flink-data
              name: flink-volume
      volumes:
        - name: flink-volume
          emptyDir: {}
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 1
    upgradeMode: savepoint
    state: running
    savepointTriggerNonce: 0

When I exec into the pod, I can curl http://localhost:9249 and I can see
the JobManager metrics. But the TaskManager metrics aren't there and
nothing's running on port 9250. Both the JobManager and TaskManager are
running on the same machine.

There isn't any instruction on how to scrape this, so I tried modifying the
PodMonitor config provided for the Operator and running that, which didn't work. I
can see the target being registered in the Prometheus dashboard but it
always stays completely blank. Here's the config I used:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: sample-job
  namespace: monitoring
  labels:
    release: monitoring
spec:
  selector:
    matchLabels:
      app: sample-job
  namespaceSelector:
    matchNames:
      - flink
  podMetricsEndpoints:
    - targetPort: 9249
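One variant I have been considering (untested, just a sketch) is to name the reporter port in the
podTemplate and reference it by name from the PodMonitor. Since the JobManager and TaskManagers run
as separate pods here, each process should bind the first free port of the 9249-9250 range inside
its own pod, so a PodMonitor that scrapes every matching pod on that named port should reach both:

  # in the FlinkDeployment, under spec.podTemplate.spec.containers:
  - name: flink-main-container
    ports:
      - name: metrics
        containerPort: 9249

  # in the PodMonitor, instead of targetPort:
  podMetricsEndpoints:
    - port: metrics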

So, here's what I want to know:
1. What should the appropriate scraping configuration look like?
2. How can I retrieve the TaskManager metrics as well?
3. In the case where I have multiple jobs potentially running on the same
machine, how can I get metrics for all of them?

Any help would be appreciated.

Versions:
Flink: 1.17.1
Flink Kubernetes Operator: 1.5.0

- [1]
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.5/docs/operations/metrics-logging/#how-to-enable-prometheus-example


Thanks,
Sunny


Re: flink-metrics: how to get the application id

2023-09-15 Thread Chen Zhanghao
Hi,

Please send an email with any content to user-zh-unsubscr...@flink.apache.org to unsubscribe from
mail from the user-zh@flink.apache.org mailing list.

Best,
Zhanghao Chen

From: im huzi
Sent: September 15, 2023, 18:14
To: user-zh@flink.apache.org
Subject: Re: flink-metrics: how to get the application id

Unsubscribe
On Wed, Aug 30, 2023 at 19:14 allanqinjy  wrote:

> hi,
> A question for everyone: when reporting metrics to Prometheus, the job name gets a randomly
> generated suffix (in the source code it is a new AbstractID()). Is there a way to obtain the application id of the job being reported here?


Re: flink-metrics: how to get the application id

2023-09-15 Thread im huzi
Unsubscribe
On Wed, Aug 30, 2023 at 19:14 allanqinjy  wrote:

> hi,
> A question for everyone: when reporting metrics to Prometheus, the job name gets a randomly
> generated suffix (in the source code it is a new AbstractID()). Is there a way to obtain the application id of the job being reported here?


Re: flink-metrics: how to get the application id

2023-09-11 Thread 吴先生
Did it work? How did you use it?

吴先生
15951914...@163.com
---- Original message ----
| From | allanqinjy |
| Date | August 30, 2023, 20:02 |
| To | user-zh@flink.apache.org |
| Subject | Re: flink-metrics: how to get the application id |
Thanks a lot, I'll change the code tomorrow and give it a try.
---- Original message ----
| From | Feng Jin |
| Date | August 30, 2023, 19:42 |
| To | user-zh |
| Subject | Re: flink-metrics: how to get the application id |
hi,

You can try reading the _APP_ID JVM environment variable:
System.getenv(YarnConfigKeys.ENV_APP_ID);

https://github.com/apache/flink/blob/6c9bb3716a3a92f3b5326558c6238432c669556d/flink-yarn/src/main/java/org/apache/flink/yarn/YarnConfigKeys.java#L28


Best,
Feng

On Wed, Aug 30, 2023 at 7:14 PM allanqinjy  wrote:

hi,
A question for everyone: when reporting metrics to Prometheus, the job name gets a randomly
generated suffix (in the source code it is a new AbstractID()). Is there a way to obtain the application id of the job being reported here?


Re: flink-metrics: how to get the application id

2023-08-30 Thread allanqinjy
Thanks a lot, I'll change the code tomorrow and give it a try.
---- Original message ----
| From | Feng Jin |
| Date | August 30, 2023, 19:42 |
| To | user-zh |
| Subject | Re: flink-metrics: how to get the application id |
hi,

You can try reading the _APP_ID JVM environment variable:
System.getenv(YarnConfigKeys.ENV_APP_ID);

https://github.com/apache/flink/blob/6c9bb3716a3a92f3b5326558c6238432c669556d/flink-yarn/src/main/java/org/apache/flink/yarn/YarnConfigKeys.java#L28


Best,
Feng

On Wed, Aug 30, 2023 at 7:14 PM allanqinjy  wrote:

> hi,
> A question for everyone: when reporting metrics to Prometheus, the job name gets a randomly
> generated suffix (in the source code it is a new AbstractID()). Is there a way to obtain the application id of the job being reported here?


Re: flink-metrics: how to get the application id

2023-08-30 Thread allanqinjy
Thanks, I'll modify the code tomorrow and give it a try.
---- Original message ----
| From | Feng Jin |
| Date | August 30, 2023, 19:42 |
| To | user-zh |
| Subject | Re: flink-metrics: how to get the application id |
hi,

You can try reading the _APP_ID JVM environment variable:
System.getenv(YarnConfigKeys.ENV_APP_ID);

https://github.com/apache/flink/blob/6c9bb3716a3a92f3b5326558c6238432c669556d/flink-yarn/src/main/java/org/apache/flink/yarn/YarnConfigKeys.java#L28


Best,
Feng

On Wed, Aug 30, 2023 at 7:14 PM allanqinjy  wrote:

> hi,
> A question for everyone: when reporting metrics to Prometheus, the job name gets a randomly
> generated suffix (in the source code it is a new AbstractID()). Is there a way to obtain the application id of the job being reported here?


Re: flink-metrics: how to get the application id

2023-08-30 Thread Feng Jin
hi,

You can try reading the _APP_ID JVM environment variable:
System.getenv(YarnConfigKeys.ENV_APP_ID);

https://github.com/apache/flink/blob/6c9bb3716a3a92f3b5326558c6238432c669556d/flink-yarn/src/main/java/org/apache/flink/yarn/YarnConfigKeys.java#L28
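A minimal sketch of wiring that into a metric (the "application_id" group key and the counter name
below are illustrative only; this assumes the snippet runs inside a rich function's open() on a YARN
deployment):

    String appId = System.getenv(YarnConfigKeys.ENV_APP_ID); // i.e. System.getenv("_APP_ID")
    Counter reported = getRuntimeContext().getMetricGroup()
            .addGroup("application_id", appId)  // should show up as a variable/tag on the metric
            .counter("recordsReported");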


Best,
Feng

On Wed, Aug 30, 2023 at 7:14 PM allanqinjy  wrote:

> hi,
> A question for everyone: when reporting metrics to Prometheus, the job name gets a randomly
> generated suffix (in the source code it is a new AbstractID()). Is there a way to obtain the application id of the job being reported here?


flink-metrics: how to get the application id

2023-08-30 Thread allanqinjy
hi,
A question for everyone: when reporting metrics to Prometheus, the job name gets a randomly
generated suffix (in the source code it is a new AbstractID()). Is there a way to obtain the application id of the job being reported here?

Re: Question about Flink metrics

2023-05-05 Thread Mason Chen
Hi Neha,

For the jobs you care about, you can attach additional labels using
`scope-variables-additional` [1]. The example located in the same page
showcases how you can configure KV pairs in its map configuration. Be sure
to replace the reporter name with the name of your prometheus reporter!

[1]
https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/metric_reporters/#scope-variables-additional
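A rough flink-conf.yaml sketch (assuming a Prometheus reporter named "prom"; the tier:critical pair
is just an example label, and the exact value syntax is described on the linked page):

  metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
  metrics.reporter.prom.scope.variables.additional: tier:critical

Since flink-conf.yaml is per cluster, this distinguishes jobs only when each tagged job runs in its
own application cluster; the label can then be used in queries such as
flink_jobmanager_job_numRestarts{tier="critical"}.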

Best,
Mason

On Thu, May 4, 2023 at 11:35 PM neha goyal  wrote:

> Hello,
> I have a question about the Prometheus metrics. I am able to fetch the
> metrics from the following expression.
> sum(flink_jobmanager_job_numRestarts{job_name="$job_name"}) by (job_name)
> Now I am interested in only a few jobs and I want to give them a label.
> How to achieve this? How to give an additional label to Flink Prometheus
> metrics so that I can fetch the metrics only for those jobs having that
> label? This tag I need to set on the job level. Few jobs will have that tag
> and others won't.
>
>
>


Question about Flink metrics

2023-05-05 Thread neha goyal
Hello,
I have a question about the Prometheus metrics. I am able to fetch the
metrics from the following expression.
sum(flink_jobmanager_job_numRestarts{job_name="$job_name"}) by (job_name)
Now I am interested in only a few jobs and I want to give them a label. How
to achieve this? How to give an additional label to Flink Prometheus
metrics so that I can fetch the metrics only for those jobs having that
label? This tag I need to set on the job level. Few jobs will have that tag
and others won't.


Removing labels from Flink metrics

2023-01-08 Thread Surendra Lalwani via user
Hi Team,

Is it possible to remove a few labels from Flink operator scope metrics as
we are noticing that sometimes those labels are too large and hence causing
unnecessary load at our monitoring platform. One such label is
operator_name.

Thanks and Regards ,
Surendra Lalwani



Re: How to get the job status from flink metrics?

2022-11-28 Thread m17610775726_1
hi


Your image didn't come through; you could upload it to an image host and paste a link here. Also, a custom reporter that filters out the metrics you need and reports them should do the trick.
---- Original message ----
| From | 陈佳豪 |
| Date | November 28, 2022, 00:54 |
| To | user-zh |
| Subject | How to get the job status from flink metrics? |
I set up a custom Kafka metric reporter. How can I use the above metrics?
I want to obtain the job status from the reported metrics; other approaches besides these metrics would also work. The current Flink version is 1.15.2. Any guidance would be much appreciated.

Re: How to get the job status from flink metrics?

2022-11-27 Thread 陈佳豪
hi,
Sorry, the image seems to have broken again just now.

Not sure whether this one can be viewed.










On 2022-11-28 13:50:37, "m17610775726_1" wrote:

hi


Your image didn't come through; you could upload it to an image host and paste a link here. Also, a custom reporter that filters out the metrics you need and reports them should do the trick.
---- Original message ----
| From | 陈佳豪 |
| Date | November 28, 2022, 00:54 |
| To | user-zh |
| Subject | How to get the job status from flink metrics? |
I set up a custom Kafka metric reporter. How can I use the above metrics?
I want to obtain the job status from the reported metrics; other approaches besides these metrics would also work. The current Flink version is 1.15.2. Any guidance would be much appreciated.

Re: How to get the job status from flink metrics?

2022-11-27 Thread 陈佳豪
I can't get this metric. I don't know how it needs to be configured in order to get it.










On 2022-11-28 13:50:37, "m17610775726_1" wrote:

hi


Your image didn't come through; you could upload it to an image host and paste a link here. Also, a custom reporter that filters out the metrics you need and reports them should do the trick.
---- Original message ----
| From | 陈佳豪 |
| Date | November 28, 2022, 00:54 |
| To | user-zh |
| Subject | How to get the job status from flink metrics? |
I set up a custom Kafka metric reporter. How can I use the above metrics?
I want to obtain the job status from the reported metrics; other approaches besides these metrics would also work. The current Flink version is 1.15.2. Any guidance would be much appreciated.

Re: How to get the job status from flink metrics?

2022-11-27 Thread 陈佳豪
Can anyone advise? I can't get the value of this metric.

陈佳豪
Email: jagec...@yeah.net
---- Original message ----
| From | 陈佳豪 |
| Date | November 28, 2022, 00:54 |
| To | user-zh |
| Subject | How to get the job status from flink metrics? |
I set up a custom Kafka metric reporter. How can I use the above metrics?
I want to obtain the job status from the reported metrics; other approaches besides these metrics would also work. The current Flink version is 1.15.2. Any guidance would be much appreciated.



How to get the job status from flink metrics?

2022-11-27 Thread 陈佳豪
I set up a custom Kafka metric reporter. How can I use the above metrics?
I want to obtain the job status from the reported metrics; other approaches besides these metrics would also work. The current Flink version is 1.15.2. Any guidance would be much appreciated.

Flink metrics flattened after Job restart

2022-05-25 Thread Sahil Aulakh
Hi Flink Community

We are using Flink version 1.13.5 for our application and every time the
job restarts, Flink Job metrics are flattened following the restart.
For e.g. we are using lastCheckpointDuration and on 05/05 our job restarted
and at the same time the checkpoint duration metric flattened. Is it a
known issue? If there is any workaround, please let me know.

Thanks
Sahil Aulakh


Re: [flink-yarn]&[flink-metrics]&[influxdb] With multiple jobs submitted in YARN session mode, only the first submitted job reports metrics data

2022-04-14 Thread huweihua
This looks like an issue caused by the custom modification. A few things to check:
1. How is the job_id tag attached to the metrics? Could multiple jobs be overwriting each other, or could only the first one be taking effect?
2. Add more logging to your modified code to narrow it down further, e.g. log which metrics get registered in notifyOfAddedMetric().

> On April 14, 2022, at 6:41 PM, QiZhu Chan wrote:
> 
> 
> 
> 
>
> A couple of notes: 1. The InfluxdbReporter has been customized; after the modification, every metric's tags carry a job_id so that all metrics can be looked up by job_id. 2. In the per-job scenario this problem does not occur, because each per-job application has its own JobManager.
> 
> 
> Flink version: 1.13.3; metrics backend: InfluxDB
>
> Hoping someone familiar with this can explain. Thanks!



[flink-yarn]&[flink-metrics]&[influxdb] With multiple jobs submitted in YARN session mode, only the first submitted job reports metrics data

2022-04-14 Thread QiZhu Chan
Hi,
While working on Flink metrics monitoring I found a problem: with Flink on YARN, when multiple Flink
jobs are submitted in YARN session mode, only the first submitted job reports metrics normally; jobs submitted afterwards do not report any metrics. What could be the reason?



A couple of notes: 1. The InfluxdbReporter has been customized; after the modification, every metric's tags carry a job_id so that all metrics can be looked up by job_id. 2. In the per-job scenario this problem does not occur, because each per-job application has its own JobManager.


Flink version: 1.13.3; metrics backend: InfluxDB

Hoping someone familiar with this can explain. Thanks!



Re: Flink metrics via Prometheus or OpenTelemetry

2022-02-24 Thread Nicolaus Weidner
Hi Sigalit,

first of all, have you read the docs page on metrics [1], and in particular
the Prometheus section on metrics reporters [2]?
Apart from that, there is also a (somewhat older) blog post about
integrating Flink with Prometheus, including a link to a repo with example
code [3].
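For reference, the Prometheus reporter from [2] is enabled purely through flink-conf.yaml (no code
changes); a minimal sketch, with the port range being an arbitrary example:

  metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
  metrics.reporter.prom.port: 9250-9260

Each JobManager/TaskManager then serves its metrics on the first free port of that range, ready for
Prometheus to scrape.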

Hope that helps to get you started!
Best,
Nico

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/metrics/#metrics
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/metric_reporters/#prometheus
[3] https://flink.apache.org/features/2019/03/11/prometheus-monitoring.html

On Wed, Feb 23, 2022 at 8:42 AM Sigalit Eliazov  wrote:

> Hello. I am looking for a way to expose Flink metrics via OpenTelemetry to
> the GCP Cloud Monitoring dashboard.
> Does anyone have experience with that?
>
> If it is not directly possible, we thought about using Prometheus as a
> middleware. If you have experience with that I would appreciate any
> guidance.
>
> Thanks
>


Flink metrics via Prometheus or OpenTelemetry

2022-02-22 Thread Sigalit Eliazov
Hello. I am looking for a way to expose Flink metrics via OpenTelemetry to
the GCP Cloud Monitoring dashboard.
Does anyone have experience with that?

If it is not directly possible, we thought about using Prometheus as a
middleware. If you have experience with that I would appreciate any
guidance.

Thanks


Re: regarding flink metrics

2022-02-01 Thread Chesnay Schepler
Your best bet is to create a custom reporter that does this calculation. 
You could either wrap the reporter, subclass it, or fork it.
In any case, 
https://github.com/apache/flink/tree/master/flink-metrics/flink-metrics-datadog 
should be a good starting point.
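A skeleton of the wrapping approach might look roughly like this (the class name is made up and the
delegate pass-through is the whole trick; whatever runtime tag/name computation you need would go
into notifyOfAddedMetric before handing the metric to the delegate):

import org.apache.flink.metrics.Metric;
import org.apache.flink.metrics.MetricConfig;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.datadog.DatadogHttpReporter;
import org.apache.flink.metrics.reporter.MetricReporter;
import org.apache.flink.metrics.reporter.Scheduled;

public class RuntimeTaggingDatadogReporter implements MetricReporter, Scheduled {

    private final DatadogHttpReporter delegate = new DatadogHttpReporter();

    @Override
    public void open(MetricConfig config) {
        delegate.open(config);
    }

    @Override
    public void close() {
        delegate.close();
    }

    @Override
    public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
        // compute values at runtime here, e.g. from group.getAllVariables(),
        // and adjust the name/group before registering with the delegate
        delegate.notifyOfAddedMetric(metric, metricName, group);
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
        delegate.notifyOfRemovedMetric(metric, metricName, group);
    }

    @Override
    public void report() {
        delegate.report();
    }
}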


On 01/02/2022 13:26, Jessy Ping wrote:

Hi Team,

We are using Datadog and its HTTP reporter (packaged in the Flink image) 
for sending metrics from our Flink application. We have a requirement 
for setting tags with values calculated at runtime for the custom 
metrics emitted from Flink. Currently, it is impossible to assign tags 
at runtime. Is there a workaround for this?


Thanks
Jessy





regarding flink metrics

2022-02-01 Thread Jessy Ping
Hi Team,

We are using Datadog and its HTTP reporter (packaged in the Flink image) for
sending metrics from our Flink application. We have a requirement for
setting tags with values calculated at runtime for the custom metrics
emitted from Flink. Currently, it is impossible to assign tags at runtime.
Is there a workaround for this?

Thanks
Jessy


Re: Flink Metrics Naming

2021-06-01 Thread Chesnay Schepler

Some more background on MetricGroups:
Internally there (mostly) 3 types of metric groups:
On the one hand we have the ComponentMetricGroups (like 
TaskManagerMetricGroup) that describe a high-level Flink entity, which 
just add a constant expression to the logical scope(like taskmanager, 
task etc.). These exist to support scope formats (although this 
should've been implemented differently, but that's a another story).


On the other hand we have groups created via addGroup(String), which are 
added to the logical scope as is; this is sometimes good (e.g., 
addGroup("KafkaConsumer")), and sometimes isn't (e.g., addGroup(…)).
Finally, there is an addGroup(String, String) variant, which behaves like 
a key-value pair (and similarly to the ComponentMetricGroup). The key 
part is added to the logical scope, and a label is usually added as well.
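For illustration, a user-defined counter registered through the key-value variant (names made up):

    // "table" becomes part of the logical scope; table="orders" becomes a label/variable
    Counter counter = getRuntimeContext().getMetricGroup()
            .addGroup("table", "orders")
            .counter("cdcMessages");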


Due to historical reasons some parts in Flink use addGroup(String) 
despite the key-value pair variant being more appropriate; the latter 
was only added later, as was the logical scope as a whole for that matter.


With that said, the logical scope and labels suffer a bit due to being 
retrofitted on an existing design and some early mistakes in the metric 
structuring.
Ideally (imo), things would work like this (*bold* parts signify changes 
to the current behavior):
- addGroup(String) is *sparsely used* and only for high-level 
hierarchies (job, operator, source, kafka). It is added as is to the 
logical scope, creates no label, and is *excluded from the metric 
identifier*.
- addGroup(String, String) has *no effect on the logical scope*, creates 
a label, and is added as <key>.<value> to the metric identifier.


The core issue with this kind of change, however, is backwards 
compatibility. We would have to do a sweep over the code-base to migrate 
inappropriate usages of addGroup(String) to the key-pair variant, 
probably remove some unnecessary groups (e.g., "Status" that is used for 
CPU metrics and whatnot) and finally make changes to the metric system 
internals, all of which need a codepath that retains the current behavior.


Simply put, for immediate needs I would probably encourage you to create 
a modified PrometheusReporter which determines the logical scope as you 
see fit; it could just ignore the logical scope entirely (although I'm 
not sure how well Prometheus handles 1 metric having multiple instances 
with different label sets, e.g., numRecordsIn for operators/tasks), or 
exclude user-defined groups with something hacky like only using the 
first 4 parts of the logical scope.


On 6/1/2021 4:56 PM, Mason Chen wrote:
Upon further inspection, it seems like the user scope is not universal 
(i.e. comes through the connectors and not UDFs (like rich map 
function)), but the question still stands if the process makes sense.


On Jun 1, 2021, at 10:38 AM, Mason Chen > wrote:


Makes sense. We are primarily concerned with removing the metric 
labels from the names as the user metrics get too long. i.e. the 
groups from `addGroup` are concatenated in the metric name.


Do you think there would be any issues with removing the group 
information in the metric name and putting them into a label instead? 
In seems like most metrics internally, don’t use `addGroup` to create 
group information but rather by creating another subclass of metric 
group.


Perhaps, I should ONLY apply this custom logic to metrics with the 
“user” scope? Other scoped metrics (e.g. operator, task operator, 
etc.) shouldn’t have these group names in the metric names in my 
experience...


An example just for clarity, 
flink__group1_group2_metricName{group1=…, group2=…, 
flink tags}


=>
flink__metricName{group_info=group1_group2, group1=…, 
group2=…, flink tags}


On Jun 1, 2021, at 9:57 AM, Chesnay Schepler > wrote:


The uniqueness of metrics and the naming of the Prometheus reporter 
are somewhat related but also somewhat orthogonal.


Prometheus works similar to JMX in that the metric name (e.g., 
taskmanager.job.task.operator.numRecordsIn) is more or less a 
_class_ of metrics, with tags/labels allowing you to select a 
specific instance of that metric.


Restricting metric names to 1 level of the hierarchy would present a 
few issues:
a) Effectively, all metric names that Flink uses effectively become 
reserved keywords that users must not use, which will lead to 
headaches when adding more metrics or forwarding metrics from 
libraries (e.g., kafka), because we could always break existing 
user-defined metrics.
b) You'd need a cluster-wide lookup that is aware of all hierarchies 
to ensure consistency across all processes.


In the end, there are significantly easier ways to solve the issue 
of the metric name being too long, i.e., give the user more control 
over the logical scope (taskmanager.job.task.operator), be it 
shortening the names (t.j.t.o), limiting the depth (e.g, 
operator.numRecordsIn), removing it outright (but I'd prefer 

Re: Flink Metrics Naming

2021-06-01 Thread Mason Chen
Upon further inspection, it seems like the user scope is not universal (i.e. 
comes through the connectors and not UDFs (like rich map function)), but the 
question still stands if the process makes sense.

> On Jun 1, 2021, at 10:38 AM, Mason Chen  wrote:
> 
> Makes sense. We are primarily concerned with removing the metric labels from 
> the names as the user metrics get too long. i.e. the groups from `addGroup` 
> are concatenated in the metric name.
> 
> Do you think there would be any issues with removing the group information in 
> the metric name and putting them into a label instead? In seems like most 
> metrics internally, don’t use `addGroup` to create group information but 
> rather by creating another subclass of metric group.
> 
> Perhaps, I should ONLY apply this custom logic to metrics with the “user” 
> scope? Other scoped metrics (e.g. operator, task operator, etc.) shouldn’t 
> have these group names in the metric names in my experience...
> 
> An example just for clarity, 
> flink__group1_group2_metricName{group1=…, group2=…, flink tags}
> 
> =>
> 
> flink__metricName{group_info=group1_group2, group1=…, group2=…, 
> flink tags}
> 
>> On Jun 1, 2021, at 9:57 AM, Chesnay Schepler > > wrote:
>> 
>> The uniqueness of metrics and the naming of the Prometheus reporter are 
>> somewhat related but also somewhat orthogonal.
>> 
>> Prometheus works similar to JMX in that the metric name (e.g., 
>> taskmanager.job.task.operator.numRecordsIn) is more or less a _class_ of 
>> metrics, with tags/labels allowing you to select a specific instance of that 
>> metric.
>> 
>> Restricting metric names to 1 level of the hierarchy would present a few 
>> issues:
>> a) Effectively, all metric names that Flink uses effectively become reserved 
>> keywords that users must not use, which will lead to headaches when adding 
>> more metrics or forwarding metrics from libraries (e.g., kafka), because we 
>> could always break existing user-defined metrics.
>> b) You'd need a cluster-wide lookup that is aware of all hierarchies to 
>> ensure consistency across all processes.
>> 
>> In the end, there are significantly easier ways to solve the issue of the 
>> metric name being too long, i.e., give the user more control over the 
>> logical scope (taskmanager.job.task.operator), be it shortening the names 
>> (t.j.t.o), limiting the depth (e.g, operator.numRecordsIn), removing it 
>> outright (but I'd prefer some context to be present for clarity) or 
>> supporting something similar to scope formats.
>> I'm reasonably certain there are some tickets already in this direction, we 
>> just don't get around to doing them because for the most part the metric 
>> system works good enough and there are bigger fish to fry.
>> 
>> On 6/1/2021 3:39 PM, Till Rohrmann wrote:
>>> Hi Mason,
>>> 
>>> The idea is that a metric is not uniquely identified by its name alone but 
>>> instead by its path. The groups in which it is defined specify this path 
>>> (similar to directories). That's why it is valid to specify two metrics 
>>> with the same name if they reside in different groups.
>>> 
>>> I think Prometheus does not support such a tree structure and that's why 
>>> the path is exposed via labels if I am not mistaken. So long story short, 
>>> what you are seeing is a combination of how Flink organizes metrics and 
>>> what can be reported to Prometheus. 
>>> 
>>> I am also pulling in Chesnay who is more familiar with this part of the 
>>> code.
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Fri, May 28, 2021 at 7:33 PM Mason Chen >> > wrote:
>>> Can anyone give insight as to why Flink allows 2 metrics with the same 
>>> “name”?
>>> 
>>> For example,
>>> 
>>> getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”);
>>> 
>>> And
>>> 
>>> getRuntimeContext.addGroup(“other_group”, 
>>> “other_group1”).counter(“myMetricName”);
>>> 
>>> Are totally valid.
>>> 
>>> 
>>> It seems that it has lead to some not-so-great implementations—the 
>>> prometheus reporter and attaching the labels to the metric name, making the 
>>> name quite verbose.
>>> 
>>> 
>> 
> 



Re: Flink Metrics Naming

2021-06-01 Thread Mason Chen
Makes sense. We are primarily concerned with removing the metric labels from 
the names as the user metrics get too long. i.e. the groups from `addGroup` are 
concatenated in the metric name.

Do you think there would be any issues with removing the group information in 
the metric name and putting it into a label instead? It seems like most 
metrics internally don't use `addGroup` to create group information but 
rather create another subclass of metric group.

Perhaps, I should ONLY apply this custom logic to metrics with the “user” 
scope? Other scoped metrics (e.g. operator, task operator, etc.) shouldn’t have 
these group names in the metric names in my experience...

An example just for clarity, 
flink__group1_group2_metricName{group1=…, group2=…, flink tags}

=>

flink__metricName{group_info=group1_group2, group1=…, group2=…, 
flink tags}

> On Jun 1, 2021, at 9:57 AM, Chesnay Schepler  wrote:
> 
> The uniqueness of metrics and the naming of the Prometheus reporter are 
> somewhat related but also somewhat orthogonal.
> 
> Prometheus works similar to JMX in that the metric name (e.g., 
> taskmanager.job.task.operator.numRecordsIn) is more or less a _class_ of 
> metrics, with tags/labels allowing you to select a specific instance of that 
> metric.
> 
> Restricting metric names to 1 level of the hierarchy would present a few 
> issues:
> a) Effectively, all metric names that Flink uses effectively become reserved 
> keywords that users must not use, which will lead to headaches when adding 
> more metrics or forwarding metrics from libraries (e.g., kafka), because we 
> could always break existing user-defined metrics.
> b) You'd need a cluster-wide lookup that is aware of all hierarchies to 
> ensure consistency across all processes.
> 
> In the end, there are significantly easier ways to solve the issue of the 
> metric name being too long, i.e., give the user more control over the logical 
> scope (taskmanager.job.task.operator), be it shortening the names (t.j.t.o), 
> limiting the depth (e.g, operator.numRecordsIn), removing it outright (but 
> I'd prefer some context to be present for clarity) or supporting something 
> similar to scope formats.
> I'm reasonably certain there are some tickets already in this direction, we 
> just don't get around to doing them because for the most part the metric 
> system works good enough and there are bigger fish to fry.
> 
> On 6/1/2021 3:39 PM, Till Rohrmann wrote:
>> Hi Mason,
>> 
>> The idea is that a metric is not uniquely identified by its name alone but 
>> instead by its path. The groups in which it is defined specify this path 
>> (similar to directories). That's why it is valid to specify two metrics with 
>> the same name if they reside in different groups.
>> 
>> I think Prometheus does not support such a tree structure and that's why the 
>> path is exposed via labels if I am not mistaken. So long story short, what 
>> you are seeing is a combination of how Flink organizes metrics and what can 
>> be reported to Prometheus. 
>> 
>> I am also pulling in Chesnay who is more familiar with this part of the code.
>> 
>> Cheers,
>> Till
>> 
>> On Fri, May 28, 2021 at 7:33 PM Mason Chen > > wrote:
>> Can anyone give insight as to why Flink allows 2 metrics with the same 
>> “name”?
>> 
>> For example,
>> 
>> getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”);
>> 
>> And
>> 
>> getRuntimeContext.addGroup(“other_group”, 
>> “other_group1”).counter(“myMetricName”);
>> 
>> Are totally valid.
>> 
>> 
>> It seems that it has lead to some not-so-great implementations—the 
>> prometheus reporter and attaching the labels to the metric name, making the 
>> name quite verbose.
>> 
>> 
> 



Re: Flink Metrics Naming

2021-06-01 Thread Chesnay Schepler
The uniqueness of metrics and the naming of the Prometheus reporter are 
somewhat related but also somewhat orthogonal.


Prometheus works similar to JMX in that the metric name (e.g., 
taskmanager.job.task.operator.numRecordsIn) is more or less a _class_ of 
metrics, with tags/labels allowing you to select a specific instance of 
that metric.


Restricting metric names to 1 level of the hierarchy would present a few 
issues:
a) Effectively, all metric names that Flink uses effectively become 
reserved keywords that users must not use, which will lead to headaches 
when adding more metrics or forwarding metrics from libraries (e.g., 
kafka), because we could always break existing user-defined metrics.
b) You'd need a cluster-wide lookup that is aware of all hierarchies to 
ensure consistency across all processes.


In the end, there are significantly easier ways to solve the issue of 
the metric name being too long, i.e., give the user more control over 
the logical scope (taskmanager.job.task.operator), be it shortening the 
names (t.j.t.o), limiting the depth (e.g, operator.numRecordsIn), 
removing it outright (but I'd prefer some context to be present for 
clarity) or supporting something similar to scope formats.
I'm reasonably certain there are some tickets already in this direction, 
we just don't get around to doing them because for the most part the 
metric system works good enough and there are bigger fish to fry.


On 6/1/2021 3:39 PM, Till Rohrmann wrote:

Hi Mason,

The idea is that a metric is not uniquely identified by its name alone 
but instead by its path. The groups in which it is defined specify 
this path (similar to directories). That's why it is valid to specify 
two metrics with the same name if they reside in different groups.


I think Prometheus does not support such a tree structure and that's 
why the path is exposed via labels if I am not mistaken. So long story 
short, what you are seeing is a combination of how Flink organizes 
metrics and what can be reported to Prometheus.


I am also pulling in Chesnay who is more familiar with this part of 
the code.


Cheers,
Till

On Fri, May 28, 2021 at 7:33 PM Mason Chen > wrote:


Can anyone give insight as to why Flink allows 2 metrics with the
same “name”?

For example,

getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”);

And

getRuntimeContext.addGroup(“other_group”,
“other_group1”).counter(“myMetricName”);

Are totally valid.


It seems that it has lead to some not-so-great implementations—the
prometheus reporter and attaching the labels to the metric name,
making the name quite verbose.






Re: Flink Metrics Naming

2021-06-01 Thread Till Rohrmann
Hi Mason,

The idea is that a metric is not uniquely identified by its name alone but
instead by its path. The groups in which it is defined specify this path
(similar to directories). That's why it is valid to specify two metrics
with the same name if they reside in different groups.

I think Prometheus does not support such a tree structure and that's why
the path is exposed via labels if I am not mistaken. So long story short,
what you are seeing is a combination of how Flink organizes metrics and
what can be reported to Prometheus.

I am also pulling in Chesnay who is more familiar with this part of the
code.

Cheers,
Till

On Fri, May 28, 2021 at 7:33 PM Mason Chen  wrote:

> Can anyone give insight as to why Flink allows 2 metrics with the same
> “name”?
>
> For example,
>
> getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”);
>
> And
>
> getRuntimeContext.addGroup(“other_group”,
> “other_group1”).counter(“myMetricName”);
>
> Are totally valid.
>
>
> It seems that it has lead to some not-so-great implementations—the
> prometheus reporter and attaching the labels to the metric name, making the
> name quite verbose.
>
>
>


Flink Metrics Naming

2021-05-28 Thread Mason Chen
Can anyone give insight as to why Flink allows 2 metrics with the same “name”?

For example,

getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”);

And

getRuntimeContext.addGroup(“other_group”, 
“other_group1”).counter(“myMetricName”);

Are totally valid.


It seems that it has lead to some not-so-great implementations—the prometheus 
reporter and attaching the labels to the metric name, making the name quite 
verbose.




Re: Flink Metrics emitted from a Kubernetes Application Cluster

2021-04-09 Thread Chesnay Schepler

This is currently not possible. See also FLINK-8358

On 4/9/2021 4:47 AM, Claude M wrote:

Hello,

I've setup Flink as an Application Cluster in Kubernetes. Now I'm 
looking into monitoring the Flink cluster in Datadog. This is what is 
configured in the flink-conf.yaml to emit metrics:


metrics.scope.jm: flink.jobmanager
metrics.scope.jm.job: flink.jobmanager.job
metrics.scope.tm: flink.taskmanager
metrics.scope.tm.job: flink.taskmanager.job
metrics.scope.task: flink.task
metrics.scope.operator: flink.operator
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: {{ datadog_api_key }}
metrics.reporter.dghttp.tags: environment: {{ environment }}

When it gets to Datadog though, the metrics for flink.jobmanager 
and flink.taskmanager are filtered by the host, which is the Pod IP.
However, I would like it to use the pod name. How can this be 
accomplished?



Thanks





Flink Metrics emitted from a Kubernetes Application Cluster

2021-04-08 Thread Claude M
Hello,

I've setup Flink as an Application Cluster in Kubernetes.  Now I'm looking
into monitoring the Flink cluster in Datadog.  This is what is configured
in the flink-conf.yaml to emit metrics:

metrics.scope.jm: flink.jobmanager
metrics.scope.jm.job: flink.jobmanager.job
metrics.scope.tm: flink.taskmanager
metrics.scope.tm.job: flink.taskmanager.job
metrics.scope.task: flink.task
metrics.scope.operator: flink.operator
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: {{ datadog_api_key }}
metrics.reporter.dghttp.tags: environment: {{ environment }}

When it gets to Datadog though, the metrics for flink.jobmanager and
flink.taskmanager are filtered by the host, which is the Pod IP. However, I
would like it to use the pod name. How can this be accomplished?


Thanks


Re: Flink Metrics

2021-03-03 Thread Piotr Nowojski
Hi,

1)
Do you want to output those metrics as Flink metrics? Or output those
"metrics"/counters as values to some external system (like Kafka)? The
problem discussed in [1], was that the metrics (Counters) were not fitting
in memory, so David suggested to hold them on Flink's state and treat the
measured values as regular output of the job.

The former option would be a single operator that consumes your CDC
messages and outputs something (filtered CDCs? processed CDCs?) to
Kafka, while keeping some metrics that you can access via Flink's metrics
system. The latter would be the same operator, but instead of a single output
it would have multiple outputs, writing the "counters" also, for example, to
Kafka (or any other system of your choice). Both options are viable; each
has its own pros and cons.

2) You need to persist your metrics somewhere. Why don't you use Flink's
state for that purpose? Upon recovery/initialisation, you can get the
recovered value from state and update/set metric value to that recovered
value.
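
A rough sketch of that idea (all names are illustrative), where the "true" count lives in operator
state and the Flink counter is just re-seeded from it after a restore:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.SimpleCounter;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

public class CountingMap extends RichMapFunction<String, String> implements CheckpointedFunction {

    private transient ListState<Long> countState; // checkpointed total
    private transient Counter eventCounter;       // exposed metric
    private long total;

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        countState = ctx.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("eventCount", Long.class));
        if (ctx.isRestored()) {
            for (Long c : countState.get()) {
                total += c;
            }
        }
    }

    @Override
    public void open(Configuration parameters) {
        SimpleCounter seeded = new SimpleCounter();
        seeded.inc(total); // start the metric at the recovered value
        eventCounter = getRuntimeContext().getMetricGroup().counter("eventsSeen", seeded);
    }

    @Override
    public String map(String value) {
        total++;
        eventCounter.inc();
        return value;
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        countState.clear();
        countState.add(total);
    }
}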

3) That seems to be a question a bit unrelated to Flink. Try searching
online how to calculate percentiles. I haven't thought about it, but
histograms or sorting all of the values seems to be the options. Probably
best if you would use some existing library to do that for you.

4) Could you rephrase your question?
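
Regarding 3), one common option from the Flink docs is a Dropwizard-backed histogram; a minimal
sketch, assuming the flink-metrics-dropwizard dependency and illustrative names, registered inside a
rich function's open():

    com.codahale.metrics.Histogram dropwizard =
            new com.codahale.metrics.Histogram(
                    new com.codahale.metrics.SlidingWindowReservoir(500));
    Histogram latencyMs = getRuntimeContext().getMetricGroup()
            .histogram("eventLatencyMs", new DropwizardHistogramWrapper(dropwizard));
    // per event: latencyMs.update(eventCreatedTs - cdcOperationTs);

The Prometheus reporter then exposes the histogram as a summary with quantiles, which matches the
"histograms as summaries" remark from your question.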

Best,
Piotrek

On Sun, 28 Feb 2021 at 14:53, Prasanna kumar
wrote:

> Hi flinksters,
>
> Scenario: We have cdc messages from our rdbms(various tables) flowing to
> Kafka.  Our flink job reads the CDC messages and creates events based on
> certain rules.
>
> I am using Prometheus  and grafana.
>
> Following are the metrics that I need to calculate
>
> A) Number of CDC messages wrt to each table.
> B) Number of events created wrt to each event type.
> C) Average/P99/P95 latency (event created ts - cdc operation ts)
>
> For A and B, I created counters and am able to see the metrics flowing into
> Prometheus. A few questions I have here.
>
> 1) How to create labels for counters in flink ? I did not find any easier
> method to do it . Right now I see that I need to create counters for each
> type of table and events . I referred to one of the community discussions.
> [1] . Is there any way apart from this ?
>
> 2) When the job gets restarted , the counters get back to 0 . How to
> prevent that and to get continuity.
>
> For C , I calculated latency in code for each event and assigned  it to
> histogram.  Few questions I have here.
>
> 3) I read in a few blogs [2] that histogram is the best way to get
> latencies. Is there any better idea?
>
> 4) How to create buckets for various ranges? I also read in a community
> email that Flink implements histograms as summaries. I should also be able
> to see the latencies across timelines.
>
> [1]
> https://stackoverflow.com/questions/58456830/how-to-use-multiple-counters-in-flink
> [2] https://povilasv.me/prometheus-tracking-request-duration/
>
> Thanks,
> Prasanna.
>


Flink Metrics

2021-02-28 Thread Prasanna kumar
Hi flinksters,

Scenario: We have cdc messages from our rdbms(various tables) flowing to
Kafka.  Our flink job reads the CDC messages and creates events based on
certain rules.

I am using Prometheus  and grafana.

Following are the metrics that I need to calculate

A) Number of CDC messages wrt to each table.
B) Number of events created wrt to each event type.
C) Average/P99/P95 latency (event created ts - cdc operation ts)

For A and B, I created counters and am able to see the metrics flowing into
Prometheus. A few questions I have here.

1) How to create labels for counters in flink ? I did not find any easier
method to do it . Right now I see that I need to create counters for each
type of table and events . I referred to one of the community discussions.
[1] . Is there any way apart from this ?

2) When the job gets restarted , the counters get back to 0 . How to
prevent that and to get continuity.

For C , I calculated latency in code for each event and assigned  it to
histogram.  Few questions I have here.

3) I read in a few blogs [2] that histogram is the best way to get
latencies. Is there any better idea?

4) How to create buckets for various ranges? I also read in a community
email that Flink implements histograms as summaries. I should also be able
to see the latencies across timelines.

[1]
https://stackoverflow.com/questions/58456830/how-to-use-multiple-counters-in-flink
[2] https://povilasv.me/prometheus-tracking-request-duration/

Thanks,
Prasanna.


Re: Tag flink metrics to job name

2021-02-19 Thread Chesnay Schepler

hmm...in a roundabout way this could be possible I suppose.

For a given job, search through your metrics for some job metric (like 
numRestarts on the JM, or any task metric for TMs), and from that you 
should be able to infer the JM/TM that belongs to that (based on the TM 
ID / host information in the metric).
Conversely, when you see high cpu usage in one of the metrics for a 
JM/TM, search for a job metric for that same process.
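
As a concrete sketch of that lookup (metric and label names as exposed by the Prometheus reporter;
"myJob" is a placeholder), a query like

    count by (tm_id, host) (flink_taskmanager_job_task_numRecordsIn{job_name="myJob"})

should list the TaskManagers (tm_id/host) that currently run tasks of that job, and you can then look
at flink_taskmanager_Status_JVM_CPU_Load for exactly those tm_id values.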


On 2/19/2021 9:14 AM, bat man wrote:
Is there a way I can look at, say, for a specific job, what the CPU usage 
or memory usage of the YARN containers is when multiple jobs are 
running on the same cluster?
Also, the issue I am trying to resolve is that I'm seeing high memory usage 
for one of the containers; I want to isolate the issue to one job and 
then investigate further.


Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:18 PM, Chesnay Schepler > wrote:


No, Job-/TaskManager metrics cannot be tagged with the job name.
The reason is that this only makes sense for application clusters
(opposed to session clusters), but we don't differentiate between
the two when it comes to metrics.

On 2/19/2021 3:59 AM, bat man wrote:

I meant the Flink jobname. I’m using the below reporter -
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
Is there any way to tag job names to the task and job manager
metrics.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler
mailto:ches...@apache.org>> wrote:

When you mean "job_name", are you referring to the Prometheus concept of
jobs, or the one of Flink?

Which of Flink prometheus reporters are you using?

On 2/17/2021 7:37 PM, bat man wrote:
> Hello there,
>
> I am using prometheus to push metrics to prometheus and
then use
> grafana for visualization. There are metrics like
>
- flink_taskmanager_Status_JVM_CPU_Load, 
flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time

> etc which do not gives job_name. It is tied to an instance.
> When running multiple jobs in the same yarn cluster it is
possible
> that different jobs have yarn containers on the same
instance, in this
> case it is very difficult to find out which instance has
high CPU
> load, Memory usage etc.
>
> Is there a way to tag job_name to these metrics so that the
metrics
> could be visualized per job.
>
> Thanks,
> Hemant








Re: Tag flink metrics to job name

2021-02-19 Thread bat man
Is there a way I can look at, say, for a specific job, what the CPU usage
or memory usage of the YARN containers is when multiple jobs are running on
the same cluster?
Also, the issue I am trying to resolve is that I'm seeing high memory usage for
one of the containers; I want to isolate the issue to one job and then
investigate further.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:18 PM, Chesnay Schepler 
wrote:

> No, Job-/TaskManager metrics cannot be tagged with the job name.
> The reason is that this only makes sense for application clusters (opposed
> to session clusters), but we don't differentiate between the two when it
> comes to metrics.
>
> On 2/19/2021 3:59 AM, bat man wrote:
>
> I meant the Flink jobname. I’m using the below reporter -
>
>  metrics.reporter.prom.class: 
> org.apache.flink.metrics.prometheus.PrometheusReporter
>
> Is there any way to tag job names to the task and job manager metrics.
>
> Thanks,
> Hemant
>
> On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler 
> wrote:
>
>> When you mean "job_name", are you referring to the Prometheus concept of
>> jobs, or the one of Flink?
>>
>> Which of Flink prometheus reporters are you using?
>>
>> On 2/17/2021 7:37 PM, bat man wrote:
>> > Hello there,
>> >
>> > I am using prometheus to push metrics to prometheus and then use
>> > grafana for visualization. There are metrics like
>> >
>> - flink_taskmanager_Status_JVM_CPU_Load, 
>> flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time
>>
>> > etc which do not gives job_name. It is tied to an instance.
>> > When running multiple jobs in the same yarn cluster it is possible
>> > that different jobs have yarn containers on the same instance, in this
>> > case it is very difficult to find out which instance has high CPU
>> > load, Memory usage etc.
>> >
>> > Is there a way to tag job_name to these metrics so that the metrics
>> > could be visualized per job.
>> >
>> > Thanks,
>> > Hemant
>>
>>
>>
>


Re: Tag flink metrics to job name

2021-02-18 Thread Chesnay Schepler

No, Job-/TaskManager metrics cannot be tagged with the job name.
The reason is that this only makes sense for application clusters 
(opposed to session clusters), but we don't differentiate between the 
two when it comes to metrics.


On 2/19/2021 3:59 AM, bat man wrote:

I meant the Flink jobname. I’m using the below reporter -
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter

Is there any way to tag job names to the task and job manager metrics.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler > wrote:


When you mean "job_name", are you referring to the Prometheus
concept of
jobs, or the one of Flink?

Which of Flink prometheus reporters are you using?

On 2/17/2021 7:37 PM, bat man wrote:
> Hello there,
>
> I am using prometheus to push metrics to prometheus and then use
> grafana for visualization. There are metrics like
>
- flink_taskmanager_Status_JVM_CPU_Load, 
flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time

> etc which do not gives job_name. It is tied to an instance.
> When running multiple jobs in the same yarn cluster it is possible
> that different jobs have yarn containers on the same instance,
in this
> case it is very difficult to find out which instance has high CPU
> load, Memory usage etc.
>
> Is there a way to tag job_name to these metrics so that the metrics
> could be visualized per job.
>
> Thanks,
> Hemant






Re: Tag flink metrics to job name

2021-02-18 Thread bat man
I meant the Flink jobname. I’m using the below reporter -


metrics.reporter.prom.class:
org.apache.flink.metrics.prometheus.PrometheusReporter

Is there any way to tag job names to the task and job manager metrics.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler 
wrote:

> When you mean "job_name", are you referring to the Prometheus concept of
> jobs, or the one of Flink?
>
> Which of Flink prometheus reporters are you using?
>
> On 2/17/2021 7:37 PM, bat man wrote:
> > Hello there,
> >
> > I am using prometheus to push metrics to prometheus and then use
> > grafana for visualization. There are metrics like
> >
> - flink_taskmanager_Status_JVM_CPU_Load, 
> flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time
>
> > etc which do not gives job_name. It is tied to an instance.
> > When running multiple jobs in the same yarn cluster it is possible
> > that different jobs have yarn containers on the same instance, in this
> > case it is very difficult to find out which instance has high CPU
> > load, Memory usage etc.
> >
> > Is there a way to tag job_name to these metrics so that the metrics
> > could be visualized per job.
> >
> > Thanks,
> > Hemant
>
>
>


Re: Tag flink metrics to job name

2021-02-18 Thread Chesnay Schepler
When you mean "job_name", are you referring to the Prometheus concept of 
jobs, or the one of Flink?


Which of Flink prometheus reporters are you using?

On 2/17/2021 7:37 PM, bat man wrote:

Hello there,

I am using prometheus to push metrics to prometheus and then use 
grafana for visualization. There are metrics like 
- flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time 
etc which do not gives job_name. It is tied to an instance.
When running multiple jobs in the same yarn cluster it is possible 
that different jobs have yarn containers on the same instance, in this 
case it is very difficult to find out which instance has high CPU 
load, Memory usage etc.


Is there a way to tag job_name to these metrics so that the metrics 
could be visualized per job.


Thanks,
Hemant





Tag flink metrics to job name

2021-02-17 Thread bat man
Hello there,

I am pushing metrics to Prometheus and then using Grafana
for visualization. There are metrics like
- flink_taskmanager_Status_JVM_CPU_Load,
flink_taskmanager_Status_JVM_CPU_Load,
flink_taskmanager_Status_JVM_CPU_Time
etc. which do not give job_name; they are tied to an instance.
When running multiple jobs in the same YARN cluster it is possible that
different jobs have YARN containers on the same instance; in this case it
is very difficult to find out which instance has high CPU load, memory
usage etc.

Is there a way to tag job_name to these metrics so that the metrics could
be visualized per job.

Thanks,
Hemant


Re: Default Flink Metrics Graphite

2020-09-03 Thread Till Rohrmann
ime.metrics.groups.OperatorMetricGroup.(OperatorMetricGroup.java:48)
>>>>> at
>>>>> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.lambda$getOrAddOperator$0(TaskMetricGroup.java:154)
>>>>> at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
>>>>> at
>>>>> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.getOrAddOperator(TaskMetricGroup.java:154)
>>>>> at
>>>>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:180)
>>>>> at
>>>>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
>>>>> at
>>>>> org.apache.flink.streaming.api.operators.SimpleOperatorFactory.createStreamOperator(SimpleOperatorFactory.java:75)
>>>>> at
>>>>> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:48)
>>>>> at
>>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:429)
>>>>> at
>>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:353)
>>>>> at
>>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
>>>>> at
>>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
>>>>> at
>>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
>>>>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>> Regards,
>>>>> Vijay
>>>>>
>>>>>
>>>>> On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler 
>>>>> wrote:
>>>>>
>>>>>> metrics.reporter.grph.class:
>>>>>> org.apache.flink.metrics.graphite.GraphiteReporter
>>>>>>
>>>>>>
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>>>>>>
>>>>>> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>>>>>>
>>>>>> Hi Dawid,
>>>>>>
>>>>>> I have 1.10.0 version of flink. What is alternative for this version ?
>>>>>>
>>>>>> Regards,
>>>>>> Vijay
>>>>>>
>>>>>>
>>>>>> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz
>>>>>>   wrote:
>>>>>>
>>>>>> 
>>>>>>
>>>>>> Hi Vijay,
>>>>>>
>>>>>> I think the problem might be that you are using a wrong version of
>>>>>> the reporter.
>>>>>>
>>>>>> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a
>>>>>> plugin, but it was migrated to plugins in 1.11 only[1].
>>>>>>
>>>>>> I'd recommend trying it out with the same 1.11 version of Flink and
>>>>>> Graphite reporter.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Dawid
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-16965
>>>>>> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>>>>>>
>>>>>> Hi Nikola,
>>>>>>
>>>>>> To rule out any other cluster issues, I have tried it in my local
>>>>>> now. Steps as follows, but don't see any metrics yet.
>>>>>>
>>>>>> 1) Set up local Graphite
>>>>>>
>>>>>> docker run -d\
>>>>>>  --name graphite\
>>>>>>  --restart=always\
>>>>>>  -p 80:80\
>>>>>>  -p 2003-2004:2003-2004\
>>>>>>  -p 2023-2024:2023-2024\
>>>>>>  -p 8125:8125/udp\
>>>>>>  -p 8126:8126\
>>>>>>  graphiteapp/graphite-statsd
>>>>>>
>>>>>> Mapped Ports
>>>>>> Host Container Service
>>>>>> 80 80 nginx <https://www.nginx.com/resources/admin-guide/>
>>>>>> 2003 2003 carbon receiver - plaintext
>>>>>

Re: Default Flink Metrics Graphite

2020-09-02 Thread Vijayendra Yadav
>>>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
>>>> at
>>>> org.apache.flink.streaming.api.operators.SimpleOperatorFactory.createStreamOperator(SimpleOperatorFactory.java:75)
>>>> at
>>>> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:48)
>>>> at
>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:429)
>>>> at
>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:353)
>>>> at
>>>> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
>>>> at
>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
>>>> at
>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
>>>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>>>> at java.lang.Thread.run(Thread.java:748)
>>>> Regards,
>>>> Vijay
>>>>
>>>>
>>>> On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler 
>>>> wrote:
>>>>
>>>>> metrics.reporter.grph.class:
>>>>> org.apache.flink.metrics.graphite.GraphiteReporter
>>>>>
>>>>>
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>>>>>
>>>>> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>>>>>
>>>>> Hi Dawid,
>>>>>
>>>>> I have 1.10.0 version of flink. What is alternative for this version ?
>>>>>
>>>>> Regards,
>>>>> Vijay
>>>>>
>>>>>
>>>>> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz
>>>>>   wrote:
>>>>>
>>>>> 
>>>>>
>>>>> Hi Vijay,
>>>>>
>>>>> I think the problem might be that you are using a wrong version of the
>>>>> reporter.
>>>>>
>>>>> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a
>>>>> plugin, but it was migrated to plugins in 1.11 only[1].
>>>>>
>>>>> I'd recommend trying it out with the same 1.11 version of Flink and
>>>>> Graphite reporter.
>>>>>
>>>>> Best,
>>>>>
>>>>> Dawid
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/FLINK-16965
>>>>> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>>>>>
>>>>> Hi Nikola,
>>>>>
>>>>> To rule out any other cluster issues, I have tried it in my local now.
>>>>> Steps as follows, but don't see any metrics yet.
>>>>>
>>>>> 1) Set up local Graphite
>>>>>
>>>>> docker run -d\
>>>>>  --name graphite\
>>>>>  --restart=always\
>>>>>  -p 80:80\
>>>>>  -p 2003-2004:2003-2004\
>>>>>  -p 2023-2024:2023-2024\
>>>>>  -p 8125:8125/udp\
>>>>>  -p 8126:8126\
>>>>>  graphiteapp/graphite-statsd
>>>>>
>>>>> Mapped Ports
>>>>> Host Container Service
>>>>> 80 80 nginx <https://www.nginx.com/resources/admin-guide/>
>>>>> 2003 2003 carbon receiver - plaintext
>>>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
>>>>> 2004 2004 carbon receiver - pickle
>>>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol>
>>>>> 2023 2023 carbon aggregator - plaintext
>>>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>>> 2024 2024 carbon aggregator - pickle
>>>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>>> 8080 8080 Graphite internal gunicorn port (without Nginx proxying).
>>>>> 8125 8125 statsd
>>>>> <https://github.com/etsy/statsd/blob/master/docs/server.md>
>>>>> 8126 8126 statsd admin
>>>>> <https://github.com/etsy/statsd/blob/master/docs/admin

Re: Default Flink Metrics Graphite

2020-09-02 Thread Till Rohrmann
>>
>>> On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler 
>>> wrote:
>>>
>>>> metrics.reporter.grph.class:
>>>> org.apache.flink.metrics.graphite.GraphiteReporter
>>>>
>>>>
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>>>>
>>>> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>>>>
>>>> Hi Dawid,
>>>>
>>>> I have 1.10.0 version of flink. What is alternative for this version ?
>>>>
>>>> Regards,
>>>> Vijay
>>>>
>>>>
>>>> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz 
>>>>  wrote:
>>>>
>>>> 
>>>>
>>>> Hi Vijay,
>>>>
>>>> I think the problem might be that you are using a wrong version of the
>>>> reporter.
>>>>
>>>> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a
>>>> plugin, but it was migrated to plugins in 1.11 only[1].
>>>>
>>>> I'd recommend trying it out with the same 1.11 version of Flink and
>>>> Graphite reporter.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-16965
>>>> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>>>>
>>>> Hi Nikola,
>>>>
>>>> To rule out any other cluster issues, I have tried it in my local now.
>>>> Steps as follows, but don't see any metrics yet.
>>>>
>>>> 1) Set up local Graphite
>>>>
>>>> docker run -d\
>>>>  --name graphite\
>>>>  --restart=always\
>>>>  -p 80:80\
>>>>  -p 2003-2004:2003-2004\
>>>>  -p 2023-2024:2023-2024\
>>>>  -p 8125:8125/udp\
>>>>  -p 8126:8126\
>>>>  graphiteapp/graphite-statsd
>>>>
>>>> Mapped Ports
>>>> Host Container Service
>>>> 80 80 nginx <https://www.nginx.com/resources/admin-guide/>
>>>> 2003 2003 carbon receiver - plaintext
>>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
>>>> 2004 2004 carbon receiver - pickle
>>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol>
>>>> 2023 2023 carbon aggregator - plaintext
>>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>> 2024 2024 carbon aggregator - pickle
>>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>> 8080 8080 Graphite internal gunicorn port (without Nginx proxying).
>>>> 8125 8125 statsd
>>>> <https://github.com/etsy/statsd/blob/master/docs/server.md>
>>>> 8126 8126 statsd admin
>>>> <https://github.com/etsy/statsd/blob/master/docs/admin_interface.md>
>>>> 2) WebUI:
>>>>
>>>> 
>>>>
>>>>
>>>>
>>>> 3) Run Flink example Job.
>>>> ./bin/flink run
>>>> ./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar
>>>> --port 
>>>>
>>>> with conf/flink-conf.yaml set as:
>>>>
>>>> metrics.reporter.grph.factory.class:
>>>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>>>> metrics.reporter.grph.host: localhost
>>>> metrics.reporter.grph.port: 2003
>>>> metrics.reporter.grph.protocol: TCP
>>>> metrics.reporter.grph.interval: 1 SECONDS
>>>>
>>>> and graphite jar:
>>>>
>>>> plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar
>>>>
>>>>
>>>> 4) Can't see any activity in webui graphite.
>>>>
>>>>
>>>> Could you review and let me know what is wrong here ? any other way you
>>>> suggest to be able to view the raw metrics data ?
>>>> Also, do you have sample metrics raw format, you can share from any
>>>> other project.
>>>>
>>>> Regards,
>>>> Vijay
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov 
>>>> wrote:
>>>>
>>>>> Hi Vijay,
>>>>>
>>>>> Your steps look correct to me.
>>>>> Perhaps you can double check that the graphite port you are sending is
>>>>> correct? The default carbon port is 2003 and if you use the aggregator it
>>>>> is 2023.
>>>>>
>>>>> You should be able to see in both flink jobmanager and taskmanager
>>>>> that the metrics have been initialized with the config you have pasted.
>>>>>
>>>>> Regards
>>>>> ,
>>>>> Nikola Hrusov
>>>>>
>>>>>
>>>>> On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav <
>>>>> contact@gmail.com> wrote:
>>>>>
>>>>>> Hi Team,
>>>>>>
>>>>>> I am trying  to export Flink stream default metrics using Graphite,
>>>>>> but I can't find it in the Graphite metrics console.  Could you confirm 
>>>>>> the
>>>>>> steps below are correct?
>>>>>>
>>>>>> *1) Updated flink-conf.yaml*
>>>>>>
>>>>>> metrics.reporter.grph.factory.class:
>>>>>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>>>>>> metrics.reporter.grph.host: port
>>>>>> metrics.reporter.grph.port: 9109
>>>>>> metrics.reporter.grph.protocol: TCP
>>>>>> metrics.reporter.grph.interval: 30 SECONDS
>>>>>>
>>>>>> 2) Added Graphite jar in plugin folder :
>>>>>>
>>>>>> ll */usr/lib/flink/plugins/metric/*
>>>>>>  *flink-metrics-graphite-1.10.0.jar*
>>>>>>
>>>>>> 3) Looking metrics in graphite server:
>>>>>>
>>>>>> http://port:8080/metrics <http://10.108.58.63:8080/metrics>
>>>>>>
>>>>>> Note: No code change is done.
>>>>>>
>>>>>> Regards,
>>>>>> Vijay
>>>>>>
>>>>>>
>>>>>>
>>>>


Re: Default Flink Metrics Graphite

2020-09-01 Thread Vijayendra Yadav
Thanks all, I could see the metrics.
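For anyone finding this thread later: the identifiers that end up in Graphite
follow Flink's scope formats, so a task/operator metric appears under something
like

<host>.taskmanager.<tm-id>.<job-name>.<operator-name>.<subtask-index>.numRecordsIn

which matches the metric names visible in the registration warnings quoted below.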

On Thu, Aug 27, 2020 at 7:51 AM Robert Metzger  wrote:

> I don't think these error messages give us a hint why you can't see the
> metrics (because they are about registering metrics, not reporting them)
>
> Are you sure you are using the right configuration parameters for Flink
> 1.10? That all required JARs are in the lib/ folder (on all machines) and
> that your graphite setup is working (have you confirmed that you can show
> any metrics in the Graphite UI (maybe from a Graphite demo thingy))?
>
>
> On Thu, Aug 27, 2020 at 2:05 AM Vijayendra Yadav 
> wrote:
>
>> Hi Chesnay and Dawid,
>>
>> I see multiple entries as following in Log:
>>
>> 2020-08-26 23:46:19,105 WARN
>> org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
>> registering metric: numRecordsIn.
>> java.lang.IllegalArgumentException: A metric named
>> ip-99--99-99.taskmanager.container_1596056409708_1570_01_06.vdcs-kafka-flink-test.Map.0.numRecordsIn
>> already exists
>> at
>> com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
>> 2020-08-26 23:46:19,094 WARN
>> org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
>> registering metric: numRecordsOut.
>> java.lang.IllegalArgumentException: A metric named
>> ip-99--99-999.taskmanager.container_1596056409708_1570_01_05.vdcs-kafka-flink-test.Map.2.numRecordsOut
>> already exists
>> at
>> com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
>> at
>> org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
>> at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
>> at
>> org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
>> at
>> org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:343)
>> at
>> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:426)
>> at
>> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:359)
>> at
>> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:349)
>> at
>> org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.(OperatorIOMetricGroup.java:41)
>> at
>> org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.(OperatorMetricGroup.java:48)
>> at
>> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.lambda$getOrAddOperator$0(TaskMetricGroup.java:154)
>> at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
>> at
>> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.getOrAddOperator(TaskMetricGroup.java:154)
>> at
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:180)
>> at
>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
>> at
>> org.apache.flink.streaming.api.operators.SimpleOperatorFactory.createStreamOperator(SimpleOperatorFactory.java:75)
>> at
>> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:48)
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:429)
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:353)
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>> at java.lang.Thread.run(Thread.java:748)
>> Regards,
>> Vijay
>>
>>
>> On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler 
>> wrote:
>>
>>> metrics.reporter.grph.class:
>>> org.apache.flink.metrics.graphite.GraphiteReporter
>>>
>>>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>>>
>>> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>>>
>>> Hi Dawid,
>>>
>>> I have 1.10.0 version of flink. What is alternative for th

Re: Default Flink Metrics Graphite

2020-08-27 Thread Robert Metzger
I don't think these error messages give us a hint why you can't see the
metrics (they are about registering metrics, not reporting them).

Are you sure you are using the right configuration parameters for Flink
1.10? That all required JARs are in the lib/ folder (on all machines) and
that your Graphite setup is working (have you confirmed that you can show
any metrics in the Graphite UI, maybe from a Graphite demo setup)?
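One quick way to take Flink out of the equation is to push a test value straight
into carbon over the plaintext protocol and check whether it shows up in the
Graphite tree. A rough sketch, assuming carbon's plaintext receiver is on
localhost:2003 and a netcat build that supports closing the connection after
sending (-q0):

echo "flink.smoketest.value 42 $(date +%s)" | nc -q0 localhost 2003

If flink.smoketest.value does not appear in the Graphite UI after that, the
problem is on the Graphite side rather than in the Flink reporter setup.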


On Thu, Aug 27, 2020 at 2:05 AM Vijayendra Yadav 
wrote:

> Hi Chesnay and Dawid,
>
> I see multiple entries as following in Log:
>
> 2020-08-26 23:46:19,105 WARN
> org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
> registering metric: numRecordsIn.
> java.lang.IllegalArgumentException: A metric named
> ip-99--99-99.taskmanager.container_1596056409708_1570_01_06.vdcs-kafka-flink-test.Map.0.numRecordsIn
> already exists
> at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
> 2020-08-26 23:46:19,094 WARN
> org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
> registering metric: numRecordsOut.
> java.lang.IllegalArgumentException: A metric named
> ip-99--99-999.taskmanager.container_1596056409708_1570_01_05.vdcs-kafka-flink-test.Map.2.numRecordsOut
> already exists
> at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
> at
> org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
> at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
> at
> org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
> at
> org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:343)
> at
> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:426)
> at
> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:359)
> at
> org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:349)
> at
> org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.(OperatorIOMetricGroup.java:41)
> at
> org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.(OperatorMetricGroup.java:48)
> at
> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.lambda$getOrAddOperator$0(TaskMetricGroup.java:154)
> at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
> at
> org.apache.flink.runtime.metrics.groups.TaskMetricGroup.getOrAddOperator(TaskMetricGroup.java:154)
> at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:180)
> at
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
> at
> org.apache.flink.streaming.api.operators.SimpleOperatorFactory.createStreamOperator(SimpleOperatorFactory.java:75)
> at
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:48)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:429)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:353)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
> at java.lang.Thread.run(Thread.java:748)
> Regards,
> Vijay
>
>
> On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler 
> wrote:
>
>> metrics.reporter.grph.class:
>> org.apache.flink.metrics.graphite.GraphiteReporter
>>
>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>>
>> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>>
>> Hi Dawid,
>>
>> I have 1.10.0 version of flink. What is alternative for this version ?
>>
>> Regards,
>> Vijay
>>
>>
>> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz 
>>  wrote:
>>
>> 
>>
>> Hi Vijay,
>>
>> I think the problem might be that you are using a wrong version of the
>> reporter.
>>
>> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a plugin,
>> but it was migrated to plugins in 1.11 only[1].
>>
>> I'd recommend 

Re: Default Flink Metrics Graphite

2020-08-26 Thread Vijayendra Yadav
Hi Chesnay and Dawid,

I see multiple entries as following in Log:

2020-08-26 23:46:19,105 WARN
org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
registering metric: numRecordsIn.
java.lang.IllegalArgumentException: A metric named
ip-99--99-99.taskmanager.container_1596056409708_1570_01_06.vdcs-kafka-flink-test.Map.0.numRecordsIn
already exists
at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
2020-08-26 23:46:19,094 WARN
org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while
registering metric: numRecordsOut.
java.lang.IllegalArgumentException: A metric named
ip-99--99-999.taskmanager.container_1596056409708_1570_01_05.vdcs-kafka-flink-test.Map.2.numRecordsOut
already exists
at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
at
org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
at
org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
at
org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:343)
at
org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:426)
at
org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:359)
at
org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:349)
at
org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.(OperatorIOMetricGroup.java:41)
at
org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.(OperatorMetricGroup.java:48)
at
org.apache.flink.runtime.metrics.groups.TaskMetricGroup.lambda$getOrAddOperator$0(TaskMetricGroup.java:154)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at
org.apache.flink.runtime.metrics.groups.TaskMetricGroup.getOrAddOperator(TaskMetricGroup.java:154)
at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:180)
at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
at
org.apache.flink.streaming.api.operators.SimpleOperatorFactory.createStreamOperator(SimpleOperatorFactory.java:75)
at
org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:48)
at
org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:429)
at
org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:353)
at
org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
at java.lang.Thread.run(Thread.java:748)
Regards,
Vijay


On Wed, Aug 26, 2020 at 7:53 AM Chesnay Schepler  wrote:

> metrics.reporter.grph.class:
> org.apache.flink.metrics.graphite.GraphiteReporter
>
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter
>
> On 26/08/2020 16:40, Vijayendra Yadav wrote:
>
> Hi Dawid,
>
> I have 1.10.0 version of flink. What is alternative for this version ?
>
> Regards,
> Vijay
>
>
> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz 
>  wrote:
>
> 
>
> Hi Vijay,
>
> I think the problem might be that you are using a wrong version of the
> reporter.
>
> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a plugin,
> but it was migrated to plugins in 1.11 only[1].
>
> I'd recommend trying it out with the same 1.11 version of Flink and
> Graphite reporter.
>
> Best,
>
> Dawid
>
> [1] https://issues.apache.org/jira/browse/FLINK-16965
> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>
> Hi Nikola,
>
> To rule out any other cluster issues, I have tried it in my local now.
> Steps as follows, but don't see any metrics yet.
>
> 1) Set up local Graphite
>
> docker run -d\
>  --name graphite\
>  --restart=always\
>  -p 80:80\
>  -p 2003-2004:2003-2004\
>  -p 2023-2024:2023-2024\
>  -p 8125:8125/udp\
>  -p 8126:8126\
>  graphiteapp/graphite-statsd
>
> Mapped Ports
> Host Container Service
> 80 80 nginx <https://www.nginx.com/resources/admin-guide/>
> 2003 2003 carbon receiver - plaintext
> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
> 2004 2004 carbon receiver - pickle
> <h

Re: Default Flink Metrics Graphite

2020-08-26 Thread Chesnay Schepler
metrics.reporter.grph.class: 
org.apache.flink.metrics.graphite.GraphiteReporter


https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter

On 26/08/2020 16:40, Vijayendra Yadav wrote:

Hi Dawid,

I have 1.10.0 version of flink. What is alternative for this version ?

Regards,
Vijay



On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz 
 wrote:




Hi Vijay,

I think the problem might be that you are using a wrong version of 
the reporter.


You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a 
plugin, but it was migrated to plugins in 1.11 only[1].


I'd recommend trying it out with the same 1.11 version of Flink and 
Graphite reporter.


Best,

Dawid

[1] https://issues.apache.org/jira/browse/FLINK-16965

On 26/08/2020 08:04, Vijayendra Yadav wrote:

Hi Nikola,

To rule out any other cluster issues, I have tried it in my local 
now. Steps as follows, but don't see any metrics yet.


1) Set up local Graphite

|docker run -d\ --name graphite\ --restart=always\ -p 80:80\ -p 
2003-2004:2003-2004\ -p 2023-2024:2023-2024\ -p 8125:8125/udp\ -p 
8126:8126\ graphiteapp/graphite-statsd|



  Mapped Ports

Host Container Service
80  80  nginx <https://www.nginx.com/resources/admin-guide/>
2003 	2003 	carbon receiver - plaintext 
<http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol> 

2004 	2004 	carbon receiver - pickle 
<http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol> 

2023 	2023 	carbon aggregator - plaintext 
<http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py> 

2024 	2024 	carbon aggregator - pickle 
<http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py> 


8080 8080 Graphite internal gunicorn port (without Nginx proxying).
8125 	8125 	statsd 
<https://github.com/etsy/statsd/blob/master/docs/server.md>
8126 	8126 	statsd admin 
<https://github.com/etsy/statsd/blob/master/docs/admin_interface.md>


2) WebUI:





3) Run Flink example Job.
./bin/flink run 
./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar 
--port 


with conf/flink-conf.yaml set as:

metrics.reporter.grph.factory.class: 
org.apache.flink.metrics.graphite.GraphiteReporterFactory

metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.protocol: TCP
metrics.reporter.grph.interval: 1 SECONDS

and graphite jar:

plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar


4) Can't see any activity in webui graphite.


Could you review and let me know what is wrong here ? any other way 
you suggest to be able to view the raw metrics data ?
Also, do you have sample metrics raw format, you can share from any 
other project.


Regards,
Vijay




On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov <mailto:n.hru...@gmail.com>> wrote:


Hi Vijay,

Your steps look correct to me.
Perhaps you can double check that the graphite port you are
sending is correct? The default carbon port is 2003 and if you
use the aggregator it is 2023.

You should be able to see in both flink jobmanager and
taskmanager that the metrics have been initialized with the
config you have pasted.

Regards
,
Nikola Hrusov


On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav
mailto:contact@gmail.com>> wrote:

Hi Team,

I am trying  to export Flink stream default metrics using
Graphite, but I can't find it in the Graphite metrics
console.  Could you confirm the steps below are correct?

*1) Updated flink-conf.yaml*

metrics.reporter.grph.factory.class:
org.apache.flink.metrics.graphite.GraphiteReporterFactory
metrics.reporter.grph.host: port
metrics.reporter.grph.port: 9109
metrics.reporter.grph.protocol: TCP
metrics.reporter.grph.interval: 30 SECONDS

2) Added Graphite jar in plugin folder :

    ll */usr/lib/flink/plugins/metric/*
*flink-metrics-graphite-1.10.0.jar*

3) Looking metrics in graphite server:

http://port:8080/metrics <http://10.108.58.63:8080/metrics>

Note: No code change is done.

Regards,
Vijay






Re: Default Flink Metrics Graphite

2020-08-26 Thread Dawid Wysakowicz
I'd recommend then following the instructions from the older docs[1].

The differences are that you should set:

metrics.reporter.grph.class:
org.apache.flink.metrics.graphite.GraphiteReporter

and put the reporter jar into the /lib folder:

In order to use this reporter you must copy
/opt/flink-metrics-graphite-1.10.0.jar into the /lib folder of your
Flink distribution.
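
Putting those two pieces together, a minimal 1.10-style setup could look like
this sketch (host and port are placeholders for an actual carbon endpoint):

metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.protocol: TCP
metrics.reporter.grph.interval: 30 SECONDS

with the jar copied into lib/ instead of plugins/, e.g.:

cp opt/flink-metrics-graphite-1.10.0.jar lib/

and the cluster restarted afterwards so the new classpath entry is picked up.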

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#graphite-orgapacheflinkmetricsgraphitegraphitereporter

Best,

Dawid

On 26/08/2020 16:40, Vijayendra Yadav wrote:
> Hi Dawid,
>
> I have 1.10.0 version of flink. What is alternative for this version ?
>
> Regards,
> Vijay
>
>>
>> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz
>>  wrote:
>>
>> 
>>
>> Hi Vijay,
>>
>> I think the problem might be that you are using a wrong version of
>> the reporter.
>>
>> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a
>> plugin, but it was migrated to plugins in 1.11 only[1].
>>
>> I'd recommend trying it out with the same 1.11 version of Flink and
>> Graphite reporter.
>>
>> Best,
>>
>> Dawid
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-16965
>>
>> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>>> Hi Nikola,
>>>
>>> To rule out any other cluster issues, I have tried it in my local
>>> now. Steps as follows, but don't see any metrics yet.
>>>
>>> 1) Set up local Graphite 
>>>
>>> |docker run -d\ --name graphite\ --restart=always\ -p 80:80\ -p
>>> 2003-2004:2003-2004\ -p 2023-2024:2023-2024\ -p 8125:8125/udp\ -p
>>> 8126:8126\ graphiteapp/graphite-statsd|
>>>
>>>
>>>   Mapped Ports
>>>
>>> Host Container Service
>>> 80 80 nginx <https://www.nginx.com/resources/admin-guide/>
>>> 2003 2003 carbon receiver - plaintext
>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
>>>
>>> 2004 2004 carbon receiver - pickle
>>> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol>
>>>
>>> 2023 2023 carbon aggregator - plaintext
>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>
>>> 2024 2024 carbon aggregator - pickle
>>> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>>>
>>> 8080 8080 Graphite internal gunicorn port (without Nginx proxying).
>>> 8125 8125 statsd
>>> <https://github.com/etsy/statsd/blob/master/docs/server.md>
>>> 8126 8126 statsd admin
>>> <https://github.com/etsy/statsd/blob/master/docs/admin_interface.md>
>>>
>>> 2) WebUI: 
>>>
>>> 
>>>
>>>
>>>
>>> 3) Run Flink example Job.
>>> ./bin/flink run
>>> ./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar
>>> --port 
>>>
>>> with conf/flink-conf.yaml set as:
>>>
>>> metrics.reporter.grph.factory.class:
>>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>>> metrics.reporter.grph.host: localhost
>>> metrics.reporter.grph.port: 2003
>>> metrics.reporter.grph.protocol: TCP
>>> metrics.reporter.grph.interval: 1 SECONDS
>>>
>>> and graphite jar:
>>>
>>> plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar
>>>
>>>
>>> 4) Can't see any activity in webui graphite. 
>>>
>>>
>>> Could you review and let me know what is wrong here ? any other way
>>> you suggest to be able to view the raw metrics data ?
>>> Also, do you have sample metrics raw format, you can share from any
>>> other project.
>>>
>>> Regards,
>>> Vijay
>>>
>>>
>>>
>>>
>>> On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov >> <mailto:n.hru...@gmail.com>> wrote:
>>>
>>> Hi Vijay,
>>>
>>> Your steps look correct to me. 
>>> Perhaps you can double check that the graphite port you are
>>> sending is correct? The default carbon port is 2003 and if you
>>> use the aggregator it is 2023.
>>>
>>> You should be able to see in both flink jobmanager and
>>> taskmanager that the metrics have been initialized with

Re: Default Flink Metrics Graphite

2020-08-26 Thread Vijayendra Yadav
Hi Dawid,

I have the 1.10.0 version of Flink. What is the alternative for this version?

Regards,
Vijay

> 
> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz  wrote:
> 
> 
> Hi Vijay,
> 
> I think the problem might be that you are using a wrong version of the 
> reporter.
> 
> You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a plugin, but 
> it was migrated to plugins in 1.11 only[1].
> 
> I'd recommend trying it out with the same 1.11 version of Flink and Graphite 
> reporter.
> 
> Best,
> 
> Dawid
> 
> [1] https://issues.apache.org/jira/browse/FLINK-16965
> 
> On 26/08/2020 08:04, Vijayendra Yadav wrote:
>> Hi Nikola,
>> 
>> To rule out any other cluster issues, I have tried it in my local now. Steps 
>> as follows, but don't see any metrics yet.
>> 
>> 1) Set up local Graphite 
>> 
>> docker run -d\
>>  --name graphite\
>>  --restart=always\
>>  -p 80:80\
>>  -p 2003-2004:2003-2004\
>>  -p 2023-2024:2023-2024\
>>  -p 8125:8125/udp\
>>  -p 8126:8126\
>>  graphiteapp/graphite-statsd
>> Mapped Ports
>> 
>> Host Container Service
>> 80 80 nginx
>> 2003 2003 carbon receiver - plaintext
>> 2004 2004 carbon receiver - pickle
>> 2023 2023 carbon aggregator - plaintext
>> 2024 2024 carbon aggregator - pickle
>> 8080 8080 Graphite internal gunicorn port (without Nginx proxying).
>> 8125 8125 statsd
>> 8126 8126 statsd admin
>> 2) WebUI: 
>> 
>> 
>> 
>> 
>> 
>> 3) Run Flink example Job.
>> ./bin/flink run 
>> ./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar
>>  --port 
>> 
>> with conf/flink-conf.yaml set as:
>> 
>> metrics.reporter.grph.factory.class: 
>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>> metrics.reporter.grph.host: localhost
>> metrics.reporter.grph.port: 2003
>> metrics.reporter.grph.protocol: TCP
>> metrics.reporter.grph.interval: 1 SECONDS
>> 
>> and graphite jar:
>> 
>> plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar
>> 
>> 
>> 4) Can't see any activity in webui graphite. 
>> 
>> 
>> Could you review and let me know what is wrong here ? any other way you 
>> suggest to be able to view the raw metrics data ?
>> Also, do you have sample metrics raw format, you can share from any other 
>> project.
>> 
>> Regards,
>> Vijay
>> 
>> 
>> 
>> 
>> On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov  wrote:
>>> Hi Vijay,
>>> 
>>> Your steps look correct to me. 
>>> Perhaps you can double check that the graphite port you are sending is 
>>> correct? The default carbon port is 2003 and if you use the aggregator it
>>> is 2023.
>>> 
>>> You should be able to see in both flink jobmanager and taskmanager that the 
>>> metrics have been initialized with the config you have pasted.
>>> 
>>> Regards
>>> ,
>>> Nikola Hrusov
>>> 
>>> 
>>> On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav  
>>> wrote:
>>>> Hi Team,
>>>> 
>>>> I am trying  to export Flink stream default metrics using Graphite, but I 
>>>> can't find it in the Graphite metrics console.  Could you confirm the 
>>>> steps below are correct?
>>>> 
>>>> 1) Updated flink-conf.yaml
>>>> 
>>>> metrics.reporter.grph.factory.class: 
>>>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>>>> metrics.reporter.grph.host: port
>>>> metrics.reporter.grph.port: 9109
>>>> metrics.reporter.grph.protocol: TCP
>>>> metrics.reporter.grph.interval: 30 SECONDS
>>>> 
>>>> 2) Added Graphite jar in plugin folder :
>>>> 
>>>> ll /usr/lib/flink/plugins/metric/
>>>>  flink-metrics-graphite-1.10.0.jar
>>>> 
>>>> 3) Looking metrics in graphite server:
>>>> 
>>>> http://port:8080/metrics  
>>>> 
>>>> Note: No code change is done.
>>>> 
>>>> Regards,
>>>> Vijay
>>>> 
>>>> 


Re: Default Flink Metrics Graphite

2020-08-26 Thread Dawid Wysakowicz
Hi Vijay,

I think the problem might be that you are using a wrong version of the
reporter.

You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a
plugin, but it was migrated to plugins in 1.11 only[1].

I'd recommend trying it out with the same 1.11 version of Flink and
Graphite reporter.

Best,

Dawid

[1] https://issues.apache.org/jira/browse/FLINK-16965
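
If you do move to 1.11, the plugin-based variant is roughly the following
sketch (the exact jar version and the host/port values are illustrative):

metrics.reporter.grph.factory.class: org.apache.flink.metrics.graphite.GraphiteReporterFactory
metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.protocol: TCP
metrics.reporter.grph.interval: 30 SECONDS

with the jar placed under its own plugin directory, e.g.
plugins/flink-metrics-graphite/flink-metrics-graphite-1.11.0.jar, instead of lib/.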

On 26/08/2020 08:04, Vijayendra Yadav wrote:
> Hi Nikola,
>
> To rule out any other cluster issues, I have tried it in my local now.
> Steps as follows, but don't see any metrics yet.
>
> 1) Set up local Graphite 
>
> |docker run -d\ --name graphite\ --restart=always\ -p 80:80\ -p
> 2003-2004:2003-2004\ -p 2023-2024:2023-2024\ -p 8125:8125/udp\ -p
> 8126:8126\ graphiteapp/graphite-statsd|
>
>
>   Mapped Ports
>
> Host Container Service
> 80 80 nginx <https://www.nginx.com/resources/admin-guide/>
> 2003 2003 carbon receiver - plaintext
> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
>
> 2004 2004 carbon receiver - pickle
> <http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol>
>
> 2023 2023 carbon aggregator - plaintext
> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>
> 2024 2024 carbon aggregator - pickle
> <http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
>
> 8080 8080 Graphite internal gunicorn port (without Nginx proxying).
> 8125 8125 statsd
> <https://github.com/etsy/statsd/blob/master/docs/server.md>
> 8126 8126 statsd admin
> <https://github.com/etsy/statsd/blob/master/docs/admin_interface.md>
>
> 2) WebUI: 
>
> image.png
>
>
> 3) Run Flink example Job.
> ./bin/flink run
> ./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar
> --port 
>
> with conf/flink-conf.yaml set as:
>
> metrics.reporter.grph.factory.class:
> org.apache.flink.metrics.graphite.GraphiteReporterFactory
> metrics.reporter.grph.host: localhost
> metrics.reporter.grph.port: 2003
> metrics.reporter.grph.protocol: TCP
> metrics.reporter.grph.interval: 1 SECONDS
>
> and graphite jar:
>
> plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar
>
>
> 4) Can't see any activity in webui graphite. 
>
>
> Could you review and let me know what is wrong here ? any other way
> you suggest to be able to view the raw metrics data ?
> Also, do you have sample metrics raw format, you can share from any
> other project.
>
> Regards,
> Vijay
>
>
>
>
> On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov  <mailto:n.hru...@gmail.com>> wrote:
>
> Hi Vijay,
>
> Your steps look correct to me. 
> Perhaps you can double check that the graphite port you are
> sending is correct? The default carbon port is 2003 and if you use
> the aggregator it is 2023.
>
> You should be able to see in both flink jobmanager and taskmanager
> that the metrics have been initialized with the config you have
> pasted.
>
> Regards
> ,
> Nikola Hrusov
>
>
> On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav
> mailto:contact@gmail.com>> wrote:
>
> Hi Team,
>
> I am trying  to export Flink stream default metrics using
> Graphite, but I can't find it in the Graphite metrics
> console.  Could you confirm the steps below are correct?
>
> *1) Updated flink-conf.yaml*
>
> metrics.reporter.grph.factory.class:
> org.apache.flink.metrics.graphite.GraphiteReporterFactory
> metrics.reporter.grph.host: port
> metrics.reporter.grph.port: 9109
> metrics.reporter.grph.protocol: TCP
> metrics.reporter.grph.interval: 30 SECONDS
>
> 2) Added Graphite jar in plugin folder :
>
> ll */usr/lib/flink/plugins/metric/*
>  *flink-metrics-graphite-1.10.0.jar*
>
> 3) Looking metrics in graphite server:
>
> http://port:8080/metrics <http://10.108.58.63:8080/metrics>  
>
> Note: No code change is done.
>
> Regards,
> Vijay
>
>




Re: Default Flink Metrics Graphite

2020-08-26 Thread Vijayendra Yadav
Hi Nikola,

To rule out any other cluster issues, I have now tried it on my local machine.
The steps are as follows, but I don't see any metrics yet.

1) Set up local Graphite

docker run -d\
 --name graphite\
 --restart=always\
 -p 80:80\
 -p 2003-2004:2003-2004\
 -p 2023-2024:2023-2024\
 -p 8125:8125/udp\
 -p 8126:8126\
 graphiteapp/graphite-statsd

Mapped Ports
Host Container Service
80 80 nginx <https://www.nginx.com/resources/admin-guide/>
2003 2003 carbon receiver - plaintext
<http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol>
2004 2004 carbon receiver - pickle
<http://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-pickle-protocol>
2023 2023 carbon aggregator - plaintext
<http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
2024 2024 carbon aggregator - pickle
<http://graphite.readthedocs.io/en/latest/carbon-daemons.html#carbon-aggregator-py>
8080 8080 Graphite internal gunicorn port (without Nginx proxying).
8125 8125 statsd <https://github.com/etsy/statsd/blob/master/docs/server.md>
8126 8126 statsd admin
<https://github.com/etsy/statsd/blob/master/docs/admin_interface.md>
2) WebUI:

[image: image.png]


3) Run Flink example Job.
./bin/flink run
./examples/flink-examples-streaming_2.11-1.11-SNAPSHOT-SocketWindowWordCount.jar
--port 

with conf/flink-conf.yaml set as:

metrics.reporter.grph.factory.class:
org.apache.flink.metrics.graphite.GraphiteReporterFactory
metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.protocol: TCP
metrics.reporter.grph.interval: 1 SECONDS

and graphite jar:

plugins/flink-metrics-graphite/flink-metrics-graphite-1.10.0.jar


4) I can't see any activity in the Graphite web UI.


Could you review and let me know what is wrong here? Is there any other way
you would suggest to view the raw metrics data?
Also, do you have a sample of the raw metrics format from any other project
that you can share?

Regards,
Vijay




On Sun, Aug 23, 2020 at 9:26 PM Nikola Hrusov  wrote:

> Hi Vijay,
>
> Your steps look correct to me.
> Perhaps you can double check that the graphite port you are sending is
> correct? The default carbon port is 2003 and if you use the aggregator it
> is 2023.
>
> You should be able to see in both flink jobmanager and taskmanager that
> the metrics have been initialized with the config you have pasted.
>
> Regards
> ,
> Nikola Hrusov
>
>
> On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav 
> wrote:
>
>> Hi Team,
>>
>> I am trying  to export Flink stream default metrics using Graphite, but I
>> can't find it in the Graphite metrics console.  Could you confirm the steps
>> below are correct?
>>
>> *1) Updated flink-conf.yaml*
>>
>> metrics.reporter.grph.factory.class:
>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>> metrics.reporter.grph.host: port
>> metrics.reporter.grph.port: 9109
>> metrics.reporter.grph.protocol: TCP
>> metrics.reporter.grph.interval: 30 SECONDS
>>
>> 2) Added Graphite jar in plugin folder :
>>
>> ll */usr/lib/flink/plugins/metric/*
>>  *flink-metrics-graphite-1.10.0.jar*
>>
>> 3) Looking metrics in graphite server:
>>
>> http://port:8080/metrics <http://10.108.58.63:8080/metrics>
>>
>> Note: No code change is done.
>>
>> Regards,
>> Vijay
>>
>>
>>


Re: Default Flink Metrics Graphite

2020-08-25 Thread Vijayendra Yadav
Thanks for the inputs, Nikola. I will check on the Graphite side.
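
A quick sanity check on the container side is to list which ports are actually
published, e.g.:

docker port graphite

which should include 2003 (the carbon plaintext receiver) if the docker run
command shown elsewhere in this thread was used as-is.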

Sent from my iPhone

> On Aug 23, 2020, at 9:26 PM, Nikola Hrusov  wrote:
> 
> 
> Hi Vijay,
> 
> Your steps look correct to me. 
> Perhaps you can double check that the graphite port you are sending is 
> correct? The default carbon port is 2003 and if you use the aggregator it is
> 2023.
> 
> You should be able to see in both flink jobmanager and taskmanager that the 
> metrics have been initialized with the config you have pasted.
> 
> Regards,
> Nikola Hrusov
> 
> 
>> On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav  
>> wrote:
>> Hi Team,
>> 
>> I am trying  to export Flink stream default metrics using Graphite, but I 
>> can't find it in the Graphite metrics console.  Could you confirm the steps 
>> below are correct?
>> 
>> 1) Updated flink-conf.yaml
>> 
>> metrics.reporter.grph.factory.class: 
>> org.apache.flink.metrics.graphite.GraphiteReporterFactory
>> metrics.reporter.grph.host: port
>> metrics.reporter.grph.port: 9109
>> metrics.reporter.grph.protocol: TCP
>> metrics.reporter.grph.interval: 30 SECONDS
>> 
>> 2) Added Graphite jar in plugin folder :
>> 
>> ll /usr/lib/flink/plugins/metric/
>>  flink-metrics-graphite-1.10.0.jar
>> 
>> 3) Looking metrics in graphite server:
>> 
>> http://port:8080/metrics  
>> 
>> Note: No code change is done.
>> 
>> Regards,
>> Vijay
>> 
>> 


Re: Default Flink Metrics Graphite

2020-08-23 Thread Nikola Hrusov
Hi Vijay,

Your steps look correct to me.
Perhaps you can double-check that the Graphite port you are sending to is
correct? The default carbon port is 2003, and if you use the aggregator it
is 2023.

You should be able to see in both flink jobmanager and taskmanager that the
metrics have been initialized with the config you have pasted.
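
A quick way to check that on a standalone setup is to grep the JobManager and
TaskManager logs for the reporter name, e.g. (assuming the default log/ directory):

grep -i graphite log/flink-*-standalonesession-*.log log/flink-*-taskexecutor-*.log

If nothing matches, the reporter was never instantiated and the configuration
or the jar location is the first thing to re-check.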

Regards,
Nikola Hrusov


On Mon, Aug 24, 2020 at 5:00 AM Vijayendra Yadav 
wrote:

> Hi Team,
>
> I am trying  to export Flink stream default metrics using Graphite, but I
> can't find it in the Graphite metrics console.  Could you confirm the steps
> below are correct?
>
> *1) Updated flink-conf.yaml*
>
> metrics.reporter.grph.factory.class:
> org.apache.flink.metrics.graphite.GraphiteReporterFactory
> metrics.reporter.grph.host: port
> metrics.reporter.grph.port: 9109
> metrics.reporter.grph.protocol: TCP
> metrics.reporter.grph.interval: 30 SECONDS
>
> 2) Added Graphite jar in plugin folder :
>
> ll */usr/lib/flink/plugins/metric/*
>  *flink-metrics-graphite-1.10.0.jar*
>
> 3) Looking metrics in graphite server:
>
> http://port:8080/metrics <http://10.108.58.63:8080/metrics>
>
> Note: No code change is done.
>
> Regards,
> Vijay
>
>
>


Default Flink Metrics Graphite

2020-08-23 Thread Vijayendra Yadav
Hi Team,

I am trying to export Flink's default streaming metrics using Graphite, but I
can't find them in the Graphite metrics console. Could you confirm that the
steps below are correct?

*1) Updated flink-conf.yaml*

metrics.reporter.grph.factory.class:
org.apache.flink.metrics.graphite.GraphiteReporterFactory
metrics.reporter.grph.host: port
metrics.reporter.grph.port: 9109
metrics.reporter.grph.protocol: TCP
metrics.reporter.grph.interval: 30 SECONDS

2) Added the Graphite jar in the plugin folder:

ll */usr/lib/flink/plugins/metric/*
 *flink-metrics-graphite-1.10.0.jar*

3) Looking for metrics in the Graphite server:

http://port:8080/metrics <http://10.108.58.63:8080/metrics>

Note: No code change is done.

Regards,
Vijay


Re: A query on Flink metrics in kubernetes

2020-07-09 Thread Chesnay Schepler
From Flink's perspective no metrics are aggregated, nor are metric 
requests forwarded to some other process.


Each TaskExecutor has its own reporter, and each one must be scraped to get
the full set of metrics.
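
In practice that means the scrape configuration has to discover every
JobManager and TaskManager process instead of hitting a single aggregating
endpoint. A rough sketch with plain Prometheus pod discovery (the pod label
and its value are assumptions, not something Flink sets for you):

scrape_configs:
  - job_name: 'flink'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: flink
        action: keep

Each discovered pod is then scraped on the port its own PrometheusReporter
listens on.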


On 09/07/2020 11:39, Manish G wrote:

Hi,

I have a query regarding prometheus scraping Flink metrics data with 
application running in kubernetes cluster.


If taskmanager is running on multiple nodes, and prometheus requests 
for the metrics data, then is that request directed to one of the 
nodes (based on some strategy, like round-robin) or is data aggregated
from all the nodes?


With regards





A query on Flink metrics in kubernetes

2020-07-09 Thread Manish G
Hi,

I have a query regarding prometheus scraping Flink metrics data with
application running in kubernetes cluster.

If taskmanager is running on multiple nodes, and prometheus requests for
the metrics data, then is that request directed to one of the nodes (based
on some strategy, like round-robin) or is data aggregated from all the
nodes?

With regards


Re: Logging Flink metrics

2020-07-06 Thread Manish G
>
>>>
>>>
>>>
>>> On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler 
>>> wrote:
>>>
>>>> You've said elsewhere that you do see some metrics in prometheus, which
>>>> are those?
>>>>
>>>> Why are you configuring the host for the prometheus reporter? This
>>>> option is only for the PrometheusPushGatewayReporter.
>>>>
>>>> On 06/07/2020 18:01, Manish G wrote:
>>>>
>>>> Hi,
>>>>
>>>> So I have following in flink-conf.yml :
>>>> //
>>>> metrics.reporter.prom.class:
>>>> org.apache.flink.metrics.prometheus.PrometheusReporter
>>>> metrics.reporter.prom.host: 127.0.0.1
>>>> metrics.reporter.prom.port: 
>>>> metrics.reporter.slf4j.class:
>>>> org.apache.flink.metrics.slf4j.Slf4jReporter
>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>> //
>>>>
>>>> And while I can see custom metrics in Taskmanager logs, but prometheus
>>>> dashboard logs doesn't show custom metrics.
>>>>
>>>> With regards
>>>>
>>>> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler 
>>>> wrote:
>>>>
>>>>> You have explicitly configured a reporter list, resulting in the slf4j
>>>>> reporter being ignored:
>>>>>
>>>>> 2020-07-06 13:48:22,191 INFO
>>>>> org.apache.flink.configuration.GlobalConfiguration- Loading
>>>>> configuration property: metrics.reporters, prom
>>>>> 2020-07-06 13:48:23,203 INFO
>>>>> org.apache.flink.runtime.metrics.ReporterSetup- Excluding
>>>>> reporter slf4j, not configured in reporter list (prom).
>>>>>
>>>>> Note that nowadays metrics.reporters is no longer required; the set
>>>>> of reporters is automatically determined based on configured properties;
>>>>> the only use-case is disabling a reporter without having to remove the
>>>>> entire configuration.
>>>>> I'd suggest to just remove the option, try again, and report back.
>>>>>
>>>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>>>
>>>>> Please enable debug logging and search for warnings from the metric
>>>>> groups/registry/reporter.
>>>>>
>>>>> If you cannot find anything suspicious, you can also send the foll log
>>>>> to me directly.
>>>>>
>>>>> On 06/07/2020 16:29, Manish G wrote:
>>>>>
>>>>> Job is an infinite streaming one, so it keeps going. Flink
>>>>> configuration is as:
>>>>>
>>>>> metrics.reporter.slf4j.class:
>>>>> org.apache.flink.metrics.slf4j.Slf4jReporter
>>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler 
>>>>> wrote:
>>>>>
>>>>>> How long did the job run for, and what is the configured interval?
>>>>>>
>>>>>>
>>>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for this.
>>>>>>
>>>>>> I did the configuration as mentioned at the link(changes in
>>>>>> flink-conf.yml, copying the jar in lib directory), and registered the 
>>>>>> Meter
>>>>>> with metrics group and invoked markEvent() method in the target code. 
>>>>>> But I
>>>>>> don't see any related logs.
>>>>>> I am doing this all on my local computer.
>>>>>>
>>>>>> Anything else I need to do?
>>>>>>
>>>>>> With regards
>>>>>> Manish
>>>>>>
>>>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
>>>>>> wrote:
>>>>>>
>>>>>>> Have you looked at the SLF4J reporter?
>>>>>>>
>>>>>>>
>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>>>
>>>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > Is it possible to log Flink metrics in application logs apart from
>>>>>>> > publishing it to Prometheus?
>>>>>>> >
>>>>>>> > With regards
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Logging Flink metrics

2020-07-06 Thread Chesnay Schepler
rom.host: 127.0.0.1
metrics.reporter.prom.port: 
metrics.reporter.slf4j.class:
org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
//

And while I can see custom metrics in Taskmanager logs,
but prometheus dashboard logs doesn't show custom metrics.

With regards

On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

You have explicitly configured a reporter list,
resulting in the slf4j reporter being ignored:

2020-07-06 13:48:22,191 INFO
org.apache.flink.configuration.GlobalConfiguration
- Loading configuration property:
metrics.reporters, prom
2020-07-06 13:48:23,203 INFO
org.apache.flink.runtime.metrics.ReporterSetup -
Excluding reporter slf4j, not configured in
reporter list (prom).

Note that nowadays metrics.reporters is no longer
required; the set of reporters is automatically
determined based on configured properties; the only
use-case is disabling a reporter without having to
remove the entire configuration.
I'd suggest to just remove the option, try again,
and report back.

On 06/07/2020 16:35, Chesnay Schepler wrote:

Please enable debug logging and search for
warnings from the metric groups/registry/reporter.

If you cannot find anything suspicious, you can
also send the foll log to me directly.

On 06/07/2020 16:29, Manish G wrote:

Job is an infinite streaming one, so it keeps
going. Flink configuration is as:

metrics.reporter.slf4j.class:
org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS



On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler
mailto:ches...@apache.org>>
wrote:

How long did the job run for, and what is the
configured interval?


On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the
link(changes in flink-conf.yml, copying the
jar in lib directory), and registered the
Meter with metrics group and invoked
markEvent() method in the target code. But I
don't see any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay
Schepler mailto:ches...@apache.org>> wrote:

Have you looked at the SLF4J reporter?


https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in
application logs apart from
> publishing it to Prometheus?
>
> With regards


















Re: Logging Flink metrics

2020-07-06 Thread Manish G
ics in Taskmanager logs, but prometheus
>>> dashboard logs doesn't show custom metrics.
>>>
>>> With regards
>>>
>>> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler 
>>> wrote:
>>>
>>>> You have explicitly configured a reporter list, resulting in the slf4j
>>>> reporter being ignored:
>>>>
>>>> 2020-07-06 13:48:22,191 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration- Loading
>>>> configuration property: metrics.reporters, prom
>>>> 2020-07-06 13:48:23,203 INFO
>>>> org.apache.flink.runtime.metrics.ReporterSetup- Excluding
>>>> reporter slf4j, not configured in reporter list (prom).
>>>>
>>>> Note that nowadays metrics.reporters is no longer required; the set of
>>>> reporters is automatically determined based on configured properties; the
>>>> only use-case is disabling a reporter without having to remove the entire
>>>> configuration.
>>>> I'd suggest to just remove the option, try again, and report back.
>>>>
>>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>>
>>>> Please enable debug logging and search for warnings from the metric
>>>> groups/registry/reporter.
>>>>
>>>> If you cannot find anything suspicious, you can also send the foll log
>>>> to me directly.
>>>>
>>>> On 06/07/2020 16:29, Manish G wrote:
>>>>
>>>> Job is an infinite streaming one, so it keeps going. Flink
>>>> configuration is as:
>>>>
>>>> metrics.reporter.slf4j.class:
>>>> org.apache.flink.metrics.slf4j.Slf4jReporter
>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>>
>>>>
>>>>
>>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler 
>>>> wrote:
>>>>
>>>>> How long did the job run for, and what is the configured interval?
>>>>>
>>>>>
>>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for this.
>>>>>
>>>>> I did the configuration as mentioned at the link(changes in
>>>>> flink-conf.yml, copying the jar in lib directory), and registered the 
>>>>> Meter
>>>>> with metrics group and invoked markEvent() method in the target code. But 
>>>>> I
>>>>> don't see any related logs.
>>>>> I am doing this all on my local computer.
>>>>>
>>>>> Anything else I need to do?
>>>>>
>>>>> With regards
>>>>> Manish
>>>>>
>>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
>>>>> wrote:
>>>>>
>>>>>> Have you looked at the SLF4J reporter?
>>>>>>
>>>>>>
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>>
>>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > Is it possible to log Flink metrics in application logs apart from
>>>>>> > publishing it to Prometheus?
>>>>>> >
>>>>>> > With regards
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Logging Flink metrics

2020-07-06 Thread Chesnay Schepler
   org.apache.flink.runtime.metrics.ReporterSetup -
Excluding reporter slf4j, not configured in reporter
list (prom).

Note that nowadays metrics.reporters is no longer
required; the set of reporters is automatically
determined based on configured properties; the only
use-case is disabling a reporter without having to
remove the entire configuration.
I'd suggest to just remove the option, try again, and
report back.

On 06/07/2020 16:35, Chesnay Schepler wrote:

Please enable debug logging and search for warnings
from the metric groups/registry/reporter.

If you cannot find anything suspicious, you can also
send the foll log to me directly.

On 06/07/2020 16:29, Manish G wrote:

Job is an infinite streaming one, so it keeps going.
Flink configuration is as:

metrics.reporter.slf4j.class:
org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS



On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

How long did the job run for, and what is the
configured interval?


On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the
link(changes in flink-conf.yml, copying the jar
in lib directory), and registered the Meter with
metrics group and invoked markEvent() method in
the target code. But I don't see any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler
mailto:ches...@apache.org>>
wrote:

Have you looked at the SLF4J reporter?


https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in
application logs apart from
> publishing it to Prometheus?
>
> With regards
















Re: Logging Flink metrics

2020-07-06 Thread Manish G
>>> org.apache.flink.runtime.metrics.ReporterSetup - Excluding
>>> reporter slf4j, not configured in reporter list (prom).
>>>
>>> Note that nowadays metrics.reporters is no longer required; the set of
>>> reporters is automatically determined based on configured properties; the
>>> only use-case is disabling a reporter without having to remove the entire
>>> configuration.
>>> I'd suggest to just remove the option, try again, and report back.
>>>
>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>
>>> Please enable debug logging and search for warnings from the metric
>>> groups/registry/reporter.
>>>
>>> If you cannot find anything suspicious, you can also send the foll log
>>> to me directly.
>>>
>>> On 06/07/2020 16:29, Manish G wrote:
>>>
>>> Job is an infinite streaming one, so it keeps going. Flink configuration
>>> is as:
>>>
>>> metrics.reporter.slf4j.class:
>>> org.apache.flink.metrics.slf4j.Slf4jReporter
>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>
>>>
>>>
>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler 
>>> wrote:
>>>
>>>> How long did the job run for, and what is the configured interval?
>>>>
>>>>
>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thanks for this.
>>>>
>>>> I did the configuration as mentioned at the link(changes in
>>>> flink-conf.yml, copying the jar in lib directory), and registered the Meter
>>>> with metrics group and invoked markEvent() method in the target code. But I
>>>> don't see any related logs.
>>>> I am doing this all on my local computer.
>>>>
>>>> Anything else I need to do?
>>>>
>>>> With regards
>>>> Manish
>>>>
>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
>>>> wrote:
>>>>
>>>>> Have you looked at the SLF4J reporter?
>>>>>
>>>>>
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>
>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>> > Hi,
>>>>> >
>>>>> > Is it possible to log Flink metrics in application logs apart from
>>>>> > publishing it to Prometheus?
>>>>> >
>>>>> > With regards
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>


Re: Logging Flink metrics

2020-07-06 Thread Chesnay Schepler
 link(changes in flink-conf.yml, copying the jar in lib
directory), and registered the Meter with metrics
group and invoked markEvent() method in the target
code. But I don't see any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

Have you looked at the SLF4J reporter?


https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in
application logs apart from
> publishing it to Prometheus?
>
> With regards














Re: Logging Flink metrics

2020-07-06 Thread Manish G
>>> flink-conf.yml, copying the jar in lib directory), and registered the Meter
>>> with metrics group and invoked markEvent() method in the target code. But I
>>> don't see any related logs.
>>> I am doing this all on my local computer.
>>>
>>> Anything else I need to do?
>>>
>>> With regards
>>> Manish
>>>
>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
>>> wrote:
>>>
>>>> Have you looked at the SLF4J reporter?
>>>>
>>>>
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>
>>>> On 06/07/2020 13:49, Manish G wrote:
>>>> > Hi,
>>>> >
>>>> > Is it possible to log Flink metrics in application logs apart from
>>>> > publishing it to Prometheus?
>>>> >
>>>> > With regards
>>>>
>>>>
>>>>
>>>
>>
>>
>


Re: Logging Flink metrics

2020-07-06 Thread Chesnay Schepler
You've said elsewhere that you do see some metrics in prometheus, which 
are those?


Why are you configuring the host for the prometheus reporter? This 
option is only for the PrometheusPushGatewayReporter.
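
For reference, the two Prometheus reporters are wired up quite differently; a
rough sketch (host names and ports are placeholders):

# pull model: each JobManager/TaskManager opens its own endpoint that Prometheus scrapes
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260

# push model: host/port point at the PushGateway, not at the Flink process
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.example.com
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: my-flink-job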


On 06/07/2020 18:01, Manish G wrote:

Hi,

So I have following in flink-conf.yml :
//
metrics.reporter.prom.class: 
org.apache.flink.metrics.prometheus.PrometheusReporter

metrics.reporter.prom.host: 127.0.0.1
metrics.reporter.prom.port: 
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
//

And while I can see custom metrics in Taskmanager logs, but prometheus 
dashboard logs doesn't show custom metrics.


With regards

On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <mailto:ches...@apache.org>> wrote:


You have explicitly configured a reporter list, resulting in the
slf4j reporter being ignored:

2020-07-06 13:48:22,191 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: metrics.reporters, prom
2020-07-06 13:48:23,203 INFO
org.apache.flink.runtime.metrics.ReporterSetup - Excluding
reporter slf4j, not configured in reporter list (prom).

Note that nowadays metrics.reporters is no longer required; the
set of reporters is automatically determined based on configured
properties; the only use-case is disabling a reporter without
having to remove the entire configuration.
I'd suggest to just remove the option, try again, and report back.

On 06/07/2020 16:35, Chesnay Schepler wrote:

Please enable debug logging and search for warnings from the
metric groups/registry/reporter.

If you cannot find anything suspicious, you can also send the
foll log to me directly.

On 06/07/2020 16:29, Manish G wrote:

Job is an infinite streaming one, so it keeps going. Flink
configuration is as:

metrics.reporter.slf4j.class:
org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS



On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler  wrote:

How long did the job run for, and what is the configured
interval?


On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the link(changes in
flink-conf.yml, copying the jar in lib directory), and
registered the Meter with metrics group and invoked
markEvent() method in the target code. But I don't see any
related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler  wrote:

Have you looked at the SLF4J reporter?


https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
    > Is it possible to log Flink metrics in application
logs apart from
> publishing it to Prometheus?
>
> With regards












Re: Logging Flink metrics

2020-07-06 Thread Manish G
Hi,

So I have following in flink-conf.yml :
//
metrics.reporter.prom.class:
org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.host: 127.0.0.1
metrics.reporter.prom.port: 
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
//

And while I can see the custom metrics in the TaskManager logs, the Prometheus
dashboard doesn't show the custom metrics.

With regards

On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler  wrote:

> You have explicitly configured a reporter list, resulting in the slf4j
> reporter being ignored:
>
> 2020-07-06 13:48:22,191 INFO
> org.apache.flink.configuration.GlobalConfiguration- Loading
> configuration property: metrics.reporters, prom
> 2020-07-06 13:48:23,203 INFO
> org.apache.flink.runtime.metrics.ReporterSetup- Excluding
> reporter slf4j, not configured in reporter list (prom).
>
> Note that nowadays metrics.reporters is no longer required; the set of
> reporters is automatically determined based on configured properties; the
> only use-case is disabling a reporter without having to remove the entire
> configuration.
> I'd suggest to just remove the option, try again, and report back.
>
> On 06/07/2020 16:35, Chesnay Schepler wrote:
>
> Please enable debug logging and search for warnings from the metric
> groups/registry/reporter.
>
> If you cannot find anything suspicious, you can also send the foll log to
> me directly.
>
> On 06/07/2020 16:29, Manish G wrote:
>
> Job is an infinite streaming one, so it keeps going. Flink configuration
> is as:
>
> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
> metrics.reporter.slf4j.interval: 30 SECONDS
>
>
>
> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler 
> wrote:
>
>> How long did the job run for, and what is the configured interval?
>>
>>
>> On 06/07/2020 15:51, Manish G wrote:
>>
>> Hi,
>>
>> Thanks for this.
>>
>> I did the configuration as mentioned at the link(changes in
>> flink-conf.yml, copying the jar in lib directory), and registered the Meter
>> with metrics group and invoked markEvent() method in the target code. But I
>> don't see any related logs.
>> I am doing this all on my local computer.
>>
>> Anything else I need to do?
>>
>> With regards
>> Manish
>>
>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
>> wrote:
>>
>>> Have you looked at the SLF4J reporter?
>>>
>>>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>
>>> On 06/07/2020 13:49, Manish G wrote:
>>> > Hi,
>>> >
>>> > Is it possible to log Flink metrics in application logs apart from
>>> > publishing it to Prometheus?
>>> >
>>> > With regards
>>>
>>>
>>>
>>
>
>


Re: Logging Flink metrics

2020-07-06 Thread Chesnay Schepler
You have explicitly configured a reporter list, resulting in the slf4j 
reporter being ignored:


2020-07-06 13:48:22,191 INFO 
org.apache.flink.configuration.GlobalConfiguration    - Loading 
configuration property: metrics.reporters, prom
2020-07-06 13:48:23,203 INFO 
org.apache.flink.runtime.metrics.ReporterSetup    - 
Excluding reporter slf4j, not configured in reporter list (prom).


Note that nowadays metrics.reporters is no longer required; the set of 
reporters is automatically determined based on configured properties; 
the only use-case is disabling a reporter without having to remove the 
entire configuration.

I'd suggest to just remove the option, try again, and report back.
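
A flink-conf.yaml sketch of the two choices just described (reporter names follow this thread; everything else is illustrative):

# Option 1: omit metrics.reporters entirely; every configured reporter is picked up.
# Option 2: keep the list, but name every reporter that should be active:
metrics.reporters: prom,slf4j
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS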

On 06/07/2020 16:35, Chesnay Schepler wrote:
Please enable debug logging and search for warnings from the metric 
groups/registry/reporter.


If you cannot find anything suspicious, you can also send the foll log 
to me directly.


On 06/07/2020 16:29, Manish G wrote:
Job is an infinite streaming one, so it keeps going. Flink 
configuration is as:


metrics.reporter.slf4j.class: 
org.apache.flink.metrics.slf4j.Slf4jReporter

metrics.reporter.slf4j.interval: 30 SECONDS



On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler  wrote:


How long did the job run for, and what is the configured interval?


On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the link(changes in
flink-conf.yml, copying the jar in lib directory), and
registered the Meter with metrics group and invoked markEvent()
method in the target code. But I don't see any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler  wrote:

Have you looked at the SLF4J reporter?


https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in application logs
apart from
> publishing it to Prometheus?
>
> With regards










Re: Logging Flink metrics

2020-07-06 Thread Chesnay Schepler
Please enable debug logging and search for warnings from the metric 
groups/registry/reporter.


If you cannot find anything suspicious, you can also send the foll log 
to me directly.


On 06/07/2020 16:29, Manish G wrote:
Job is an infinite streaming one, so it keeps going. Flink 
configuration is as:


metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS



On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler  wrote:


How long did the job run for, and what is the configured interval?


On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the link(changes in
flink-conf.yml, copying the jar in lib directory), and registered
the Meter with metrics group and invoked markEvent() method in
the target code. But I don't see any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler  wrote:

Have you looked at the SLF4J reporter?


https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in application logs
apart from
> publishing it to Prometheus?
>
> With regards








Re: Logging Flink metrics

2020-07-06 Thread Manish G
Job is an infinite streaming one, so it keeps going. Flink configuration is
as:

metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS



On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler  wrote:

> How long did the job run for, and what is the configured interval?
>
>
> On 06/07/2020 15:51, Manish G wrote:
>
> Hi,
>
> Thanks for this.
>
> I did the configuration as mentioned at the link(changes in
> flink-conf.yml, copying the jar in lib directory), and registered the Meter
> with metrics group and invoked markEvent() method in the target code. But I
> don't see any related logs.
> I am doing this all on my local computer.
>
> Anything else I need to do?
>
> With regards
> Manish
>
> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler 
> wrote:
>
>> Have you looked at the SLF4J reporter?
>>
>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>
>> On 06/07/2020 13:49, Manish G wrote:
>> > Hi,
>> >
>> > Is it possible to log Flink metrics in application logs apart from
>> > publishing it to Prometheus?
>> >
>> > With regards
>>
>>
>>
>


Re: Logging Flink metrics

2020-07-06 Thread Chesnay Schepler

How long did the job run for, and what is the configured interval?


On 06/07/2020 15:51, Manish G wrote:

Hi,

Thanks for this.

I did the configuration as mentioned at the link(changes in 
flink-conf.yml, copying the jar in lib directory), and registered the 
Meter with metrics group and invoked markEvent() method in the target 
code. But I don't see any related logs.

I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler  wrote:


Have you looked at the SLF4J reporter?


https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in application logs apart from
> publishing it to Prometheus?
>
> With regards






Re: Logging Flink metrics

2020-07-06 Thread Manish G
Hi,

Thanks for this.

I did the configuration as mentioned at the link(changes in flink-conf.yml,
copying the jar in lib directory), and registered the Meter with metrics
group and invoked markEvent() method in the target code. But I don't see
any related logs.
I am doing this all on my local computer.

Anything else I need to do?

With regards
Manish

On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler  wrote:

> Have you looked at the SLF4J reporter?
>
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>
> On 06/07/2020 13:49, Manish G wrote:
> > Hi,
> >
> > Is it possible to log Flink metrics in application logs apart from
> > publishing it to Prometheus?
> >
> > With regards
>
>
>


Re: Logging Flink metrics

2020-07-06 Thread Chesnay Schepler

Have you looked at the SLF4J reporter?

https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

On 06/07/2020 13:49, Manish G wrote:

Hi,

Is it possible to log Flink metrics in application logs apart from 
publishing it to Prometheus?


With regards





Logging Flink metrics

2020-07-06 Thread Manish G
Hi,

Is it possible to log Flink metrics in application logs apart from
publishing it to Prometheus?

With regards


Re: Re: How to dynamically initialize flink metrics in invoke method and then reuse it?

2020-07-03 Thread Xintong Song
Ok, I see your problem. And yes, keeping a map of metrics should work.

Just for double checking, I assume there's an upper bound of your map keys
(table names)?
Because if not, an infinitely increasing in-memory map that is not managed
by Flink's state might become problematic.

Thank you~

Xintong Song
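
A minimal Java sketch of that map-of-meters approach (the sink class name, the ObjectNode import and the record layout are assumptions taken from the code quoted below, not a verified implementation):

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Meter;
import org.apache.flink.metrics.MeterView;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import com.fasterxml.jackson.databind.node.ObjectNode;

public class TableQpsSink extends RichSinkFunction<ObjectNode> {

    // One meter per table name, registered at most once and then reused.
    private transient Map<String, Meter> metersByTable;

    @Override
    public void open(Configuration parameters) {
        metersByTable = new HashMap<>();
    }

    @Override
    public void invoke(ObjectNode node, Context context) {
        String tableName = node.get("metadata").get("topic").asText();
        // Register the meter only the first time this table name is seen.
        Meter meter = metersByTable.computeIfAbsent(
                tableName,
                name -> getRuntimeContext().getMetricGroup().meter(name, new MeterView(10)));
        meter.markEvent();
    }
}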



On Fri, Jul 3, 2020 at 2:39 PM wangl...@geekplus.com.cn <
wangl...@geekplus.com.cn> wrote:

>
> Seems there's no direct solution.
> Perhaps i can implement this by initializing a HashMap with
> all the possible values of tableName in the `open` method and get the
> corresponding  Meter according to tableName in the `invoke` method.
>
>
> Thanks,
> Lei
> --
> wangl...@geekplus.com.cn
>
>
> *Sender:* wangl...@geekplus.com.cn
> *Send Time:* 2020-07-03 14:27
> *Receiver:* Xintong Song 
> *cc:* user 
> *Subject:* Re: Re: How to dynamically initialize flink metrics in invoke
> method and then reuse it?
> Hi Xintong,
>
> Yes, initializing the metric in the `open` method works, but it doesn't
> solve my problem.
> I want to initialize the metric with a name that is extracted from the
> record content. Only in the `invoke` method i can do it.
>
> Actually my scenario is as follows.
> The record is MySQL binlog info.  I want to monitor the qps by tableName.
> The tableName is different for every record.
>
> Thanks,
> Lei
>
>
> --
> wangl...@geekplus.com.cn
>
> *Sender:* Xintong Song 
> *Send Time:* 2020-07-03 13:14
> *Receiver:* wangl...@geekplus.com.cn
> *cc:* user 
> *Subject:* Re: How to dynamically initialize flink metrics in invoke
> method and then reuse it?
> Hi Lei,
>
> I think you should initialize the metric in the `open` method. Then you
> can save the initialized metric as a class field, and update it in the
> `invoke` method for each record.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Jul 3, 2020 at 11:50 AM wangl...@geekplus.com.cn <
> wangl...@geekplus.com.cn> wrote:
>
>>
>> In one flink operator, i want to initialize multiple flink metrics
>> according to message content.
>> As the code below.
>>
>> public void invoke(ObjectNode node, Context context) throws Exception {
>>
>> String tableName = node.get("metadata").get("topic").asText();
>> Meter meter = getRuntimeContext().getMetricGroup().meter(tableName,
>> new MeterView(10));
>> meter.markEvent();
>> log.info("### counter: " + meter.toString() + "\t" +
>> meter.getCount());
>>
>>
>> But in this way every invoke call will initialize a new metrics and the
>> count will be from zero again.
>> How can i reuse the metric initialized before?
>>
>> Thanks,
>> Lei
>> --
>> wangl...@geekplus.com.cn
>>
>>


Re: Re: How to dynamically initialize flink metrics in invoke method and then reuse it?

2020-07-03 Thread wangl...@geekplus.com.cn

Seems there's no direct solution.
Perhaps I can implement this by initializing a HashMap with all
the possible values of tableName in the `open` method and get the corresponding
Meter according to tableName in the `invoke` method.


Thanks,
Lei 


wangl...@geekplus.com.cn 
 
Sender: wangl...@geekplus.com.cn
Send Time: 2020-07-03 14:27
Receiver: Xintong Song
cc: user
Subject: Re: Re: How to dynamically initialize flink metrics in invoke method 
and then reuse it?
Hi Xintong, 

Yes, initializing the metric in the `open` method works, but it doesn't solve 
my problem. 
I want to initialize the metric with a name that is extracted from the record 
content. Only in the `invoke` method i can do it.

Actually my scenario is as follows.
The record is MySQL binlog info.  I want to monitor the qps by tableName. The 
tableName is different for every record. 

Thanks,
Lei




wangl...@geekplus.com.cn 

Sender: Xintong Song
Send Time: 2020-07-03 13:14
Receiver: wangl...@geekplus.com.cn
cc: user
Subject: Re: How to dynamically initialize flink metrics in invoke method and 
then reuse it?
Hi Lei,

I think you should initialize the metric in the `open` method. Then you can 
save the initialized metric as a class field, and update it in the `invoke` 
method for each record.

Thank you~
Xintong Song


On Fri, Jul 3, 2020 at 11:50 AM wangl...@geekplus.com.cn 
 wrote:

In one flink operator, i want to initialize multiple flink metrics according to 
message content. 
As the code below.

public void invoke(ObjectNode node, Context context) throws Exception {

String tableName = node.get("metadata").get("topic").asText();
Meter meter = getRuntimeContext().getMetricGroup().meter(tableName, new 
MeterView(10));
meter.markEvent();
log.info("### counter: " + meter.toString() + "\t" +  meter.getCount());


But in this way every invoke call will initialize a new metrics and the count 
will be from zero again.
How can i reuse the metric initialized before?

Thanks,
Lei


wangl...@geekplus.com.cn 



Re: Re: How to dynamically initialize flink metrics in invoke method and then reuse it?

2020-07-03 Thread wangl...@geekplus.com.cn
Hi Xintong, 

Yes, initializing the metric in the `open` method works, but it doesn't solve 
my problem. 
I want to initialize the metric with a name that is extracted from the record 
content. Only in the `invoke` method i can do it.

Actually my scenario is as follows.
The record is MySQL binlog info.  I want to monitor the qps by tableName. The 
tableName is different for every record. 

Thanks,
Lei




wangl...@geekplus.com.cn 

Sender: Xintong Song
Send Time: 2020-07-03 13:14
Receiver: wangl...@geekplus.com.cn
cc: user
Subject: Re: How to dynamically initialize flink metrics in invoke method and 
then reuse it?
Hi Lei,

I think you should initialize the metric in the `open` method. Then you can 
save the initialized metric as a class field, and update it in the `invoke` 
method for each record.

Thank you~
Xintong Song


On Fri, Jul 3, 2020 at 11:50 AM wangl...@geekplus.com.cn 
 wrote:

In one flink operator, i want to initialize multiple flink metrics according to 
message content. 
As the code below.

public void invoke(ObjectNode node, Context context) throws Exception {

String tableName = node.get("metadata").get("topic").asText();
Meter meter = getRuntimeContext().getMetricGroup().meter(tableName, new 
MeterView(10));
meter.markEvent();
log.info("### counter: " + meter.toString() + "\t" +  meter.getCount());


But in this way every invoke call will initialize a new metrics and the count 
will be from zero again.
How can i reuse the metric initialized before?

Thanks,
Lei


wangl...@geekplus.com.cn 



Re: How to dynamically initialize flink metrics in invoke method and then reuse it?

2020-07-02 Thread Xintong Song
Hi Lei,

I think you should initialize the metric in the `open` method. Then you can
save the initialized metric as a class field, and update it in the `invoke`
method for each record.

Thank you~

Xintong Song



On Fri, Jul 3, 2020 at 11:50 AM wangl...@geekplus.com.cn <
wangl...@geekplus.com.cn> wrote:

>
> In one flink operator, i want to initialize multiple flink metrics
> according to message content.
> As the code below.
>
> public void invoke(ObjectNode node, Context context) throws Exception {
>
> String tableName = node.get("metadata").get("topic").asText();
> Meter meter = getRuntimeContext().getMetricGroup().meter(tableName,
> new MeterView(10));
> meter.markEvent();
> log.info("### counter: " + meter.toString() + "\t" +
> meter.getCount());
>
>
> But in this way every invoke call will initialize a new metrics and the
> count will be from zero again.
> How can i reuse the metric initialized before?
>
> Thanks,
> Lei
> --
> wangl...@geekplus.com.cn
>
>


How to dynamically initialize flink metrics in invoke method and then reuse it?

2020-07-02 Thread wangl...@geekplus.com.cn

In one flink operator, i want to initialize multiple flink metrics according to 
message content. 
As the code below.

public void invoke(ObjectNode node, Context context) throws Exception {

String tableName = node.get("metadata").get("topic").asText();
Meter meter = getRuntimeContext().getMetricGroup().meter(tableName, new 
MeterView(10));
meter.markEvent();
log.info("### counter: " + meter.toString() + "\t" +  meter.getCount());


But in this way every invoke call will initialize a new metrics and the count 
will be from zero again.
How can i reuse the metric initialized before?

Thanks,
Lei


wangl...@geekplus.com.cn 



Re: Flink Metrics in kubernetes

2020-05-13 Thread Averell
Hi Gary,

Sorry for the false alarm. It's caused by a bug in my deployment - no
metrics were added into the registry.
Sorry for wasting your time.

Thanks and best regards,
Averell 



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Flink Metrics in kubernetes

2020-05-12 Thread Averell
Hi Gary,

Thanks for the help.
Here below is the output from jstack. It seems not being blocked. 



In my JobManager log, there's this WARN, I am not sure whether it's relevant
at all.


Attached is the full jstack dump: k8xDump.txt.

Thanks and regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Flink Metrics in kubernetes

2020-05-12 Thread Gary Yao
Hi Averell,

If you are seeing the log message from [1] and Scheduled#report() is
not called, the thread in the "Flink-MetricRegistry" thread pool might
be blocked. You can use the jstack utility to see on which task the
thread pool is blocked.

Best,
Gary

[1] 
https://github.com/apache/flink/blob/e346215edcf2252cc60c5cef507ea77ce2ac9aca/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java#L141
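
A quick way to check this from a shell in the affected container (a sketch; how the JVM PID is located is an assumption, any approach works):

# find the Flink JVM, then look at the metric registry thread pool in its thread dump
jps -l
jstack <pid> | grep -A 30 "Flink-MetricRegistry"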

On Tue, May 12, 2020 at 4:32 PM Averell  wrote:
>
> Hi,
>
> I'm trying to config Flink running in Kubernetes native to push some metrics
> to NewRelic (using a custom ScheduledDropwizardReporter).
>
> From the logs, I could see that an instance of ScheduledDropwizardReporter
> has already been created successfully (the overridden  getReporter() method
> <https://github.com/apache/flink/blob/e346215edcf2252cc60c5cef507ea77ce2ac9aca/flink-metrics/flink-metrics-dropwizard/src/main/java/org/apache/flink/dropwizard/ScheduledDropwizardReporter.java#L234>
> was called).
> An instance of  MetricRegistryImpl
> <https://github.com/apache/flink/blob/e346215edcf2252cc60c5cef507ea77ce2ac9aca/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java#L141>
> also created successfully (this log was shown: /Periodically reporting
> metrics in intervals of 30 SECONDS for reporter my_newrelic_reporter/)
>
> However, the  report() method
> <https://github.com/apache/flink/blob/e346215edcf2252cc60c5cef507ea77ce2ac9aca/flink-metrics/flink-metrics-core/src/main/java/org/apache/flink/metrics/reporter/Scheduled.java#L30>
> was not called.
>
> When running on my laptop, there's no issue at all.
> Are there any special things that I need to care for when running in
> Kubernetes?
>
> Thanks a lot.
>
> Regards,
> Averell
>
>
>
>
>
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Flink Metrics in kubernetes

2020-05-12 Thread Averell
Hi,

I'm trying to config Flink running in Kubernetes native to push some metrics
to NewRelic (using a custom ScheduledDropwizardReporter).

From the logs, I could see that an instance of ScheduledDropwizardReporter
has already been created successfully (the overridden  getReporter() method
<https://github.com/apache/flink/blob/e346215edcf2252cc60c5cef507ea77ce2ac9aca/flink-metrics/flink-metrics-dropwizard/src/main/java/org/apache/flink/dropwizard/ScheduledDropwizardReporter.java#L234>
  
was called).
An instance of  MetricRegistryImpl
<https://github.com/apache/flink/blob/e346215edcf2252cc60c5cef507ea77ce2ac9aca/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java#L141>
  
also created successfully (this log was shown: /Periodically reporting
metrics in intervals of 30 SECONDS for reporter my_newrelic_reporter/)

However, the  report() method
<https://github.com/apache/flink/blob/e346215edcf2252cc60c5cef507ea77ce2ac9aca/flink-metrics/flink-metrics-core/src/main/java/org/apache/flink/metrics/reporter/Scheduled.java#L30>
  
was not called.

When running on my laptop, there's no issue at all.
Are there any special things that I need to care for when running in
Kubernetes?

Thanks a lot.

Regards,
Averell





--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Issue monitoring Flink metrics with the Prometheus pushgateway

2020-05-11 Thread 李佳宸
Thanks a lot~~~ But for me there really is no problem when RandomJobNameSuffix is true, which is strange.
Also, with the prometheus reporter I find far fewer metrics than with the pushgateway. Have you run into that as well?

972684638  wrote on Tue, May 12, 2020 at 10:22 AM:

> I'm not sure whether this counts as a bug, but I have indeed hit the problem you describe, and it took a fair amount of digging before I got it resolved.
>
> It has nothing to do with metrics.reporter.promgateway.randomJobNameSuffix. I suggest reading the official pushgateway documentation carefully and understanding the difference between the GET and POST push modes.
>
> Then find the org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter#report method in the flink-metrics-prometheus module, change the push method it uses, repackage, and you are done. Glad I could help.
> For the detailed investigation, see my article:
> https://daijiguo.blog.csdn.net/article/details/105453643
>
>
>
>
>
>
> ---- Original message ----
> From: "李佳宸"  Sent: Tuesday, May 12, 2020, 8:57 AM
> To: "user-zh"
> Subject: Issue monitoring Flink metrics with the Prometheus pushgateway
>
>
>
> Hello!
>
> While monitoring Flink with prometheus I ran into a problem that may or may not be a bug; details below.
>
> Version information
> Flink 1.9.1
> Prometheus 2.18
> pushgateway 1.2.0
>
> Problem:
> After setting
> metrics.reporter.promgateway.randomJobNameSuffix to false, some metrics are no longer pushed to the pushgateway correctly. Concretely, some metrics (mostly JobManager-related ones such as
> flink_jobmanager_Status_JVM_CPU_Load
> ) do not persist in the pushgateway; refreshing repeatedly shows them disappearing and then reappearing. Some metrics are lost outright, such as
> flink_jobmanager_job_fullRestarts.
>
> With metrics.reporter.promgateway.randomJobNameSuffix set to true, everything works as expected.
>
> Here is my relevant configuration:
> metrics.reporter.promgateway.class:
> org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
> metrics.reporter.promgateway.host: localhost
> metrics.reporter.promgateway.port: 9091
> metrics.reporter.promgateway.jobName: cluster1
> metrics.reporter.promgateway.randomJobNameSuffix: *false*
> metrics.reporter.promgateway.deleteOnShutdown: *false*
>
> Hoping someone can clear this up for me. Thanks.


Re: Issue monitoring Flink metrics with the Prometheus pushgateway

2020-05-11 Thread 972684638
I'm not sure whether this counts as a bug, but I have indeed hit the problem you describe, and it took a fair amount of digging before I got it resolved.
It has nothing to do with metrics.reporter.promgateway.randomJobNameSuffix. I suggest reading the official pushgateway documentation carefully and understanding the difference between the GET and POST push modes.
Then find the org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter#report method in the flink-metrics-prometheus module, change the push method it uses, repackage, and you are done. Glad I could help.
For the detailed investigation, see my article:
https://daijiguo.blog.csdn.net/article/details/105453643






---- Original message ----
From: "李佳宸"
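
For readers hitting the same issue, the change being described amounts to switching the reporter from the gateway's replace-style push to the additive one. A rough Java sketch against the Prometheus simpleclient PushGateway API follows (whether this matches the exact Flink 1.9.1 source layout is an assumption, so verify against the release you rebuild):

// Sketch of a rebuilt report() in PrometheusPushGatewayReporter (field names assumed from upstream)
@Override
public void report() {
    try {
        // push() issues a PUT and replaces every series stored under this job name on the gateway;
        // pushAdd() issues a POST and only adds or updates the series being pushed.
        pushGateway.pushAdd(CollectorRegistry.defaultRegistry, jobName);
    } catch (Exception e) {
        log.warn("Failed to push metrics to PushGateway with jobName {}.", jobName, e);
    }
}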

Issue monitoring Flink metrics with the Prometheus pushgateway

2020-05-11 Thread 李佳宸
Hello!

While monitoring Flink with prometheus I ran into a problem that may or may not be a bug; details below.

Version information
Flink 1.9.1
Prometheus 2.18
pushgateway 1.2.0

Problem:
After setting
metrics.reporter.promgateway.randomJobNameSuffix to false, some metrics are no longer pushed to the pushgateway correctly. Concretely, some metrics (mostly JobManager-related ones such as
flink_jobmanager_Status_JVM_CPU_Load
) do not persist in the pushgateway; refreshing repeatedly shows them disappearing and then reappearing. Some metrics are lost outright, such as
flink_jobmanager_job_fullRestarts.

With metrics.reporter.promgateway.randomJobNameSuffix set to true, everything works as expected.

Here is my relevant configuration:
metrics.reporter.promgateway.class:
org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: localhost
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: cluster1
metrics.reporter.promgateway.randomJobNameSuffix: *false*
metrics.reporter.promgateway.deleteOnShutdown: *false*

Hoping someone can clear this up for me. Thanks.


Re: Flink Metrics - PrometheusReporter

2020-01-22 Thread Sidney Feiner
Ok, I configured the PrometheusReporter's ports to be a range and now every 
TaskManager has its own port where I can see its metrics. Thank you very much!
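
For reference, a flink-conf.yaml sketch of that port-range setup (the exact range is an assumption; each JobManager or TaskManager process on a host binds the next free port in the range):

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260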


Sidney Feiner / Data Platform Developer
M: +972.528197720 / Skype: sidney.feiner.startapp




From: Chesnay Schepler 
Sent: Wednesday, January 22, 2020 6:07 PM
To: Sidney Feiner ; flink-u...@apache.org 

Subject: Re: Flink Metrics - PrometheusReporter

Metrics are exposed via reporters by each process separately, whereas the WebUI 
aggregates metrics.

As such you have to configure Prometheus to also scrape the TaskExecutors.

On 22/01/2020 16:58, Sidney Feiner wrote:
Hey,
I've been trying to use the PrometheusReporter and when I used it locally on my 
computer, I would access the port I configured and see all the metrics I've 
created.
In production, we use High Availability mode and when I try to access the 
JobManager's metrics in the port I've configured on the PrometheusReporter, I 
see some very basic metrics - default Flink metrics, but I can't see any of my 
custom metrics.

Weird thing is I can see those metrics through Flink's UI in the Metrics tab:

Does anybody have a clue why my custom metrics are configured but not being 
reported in high availability but are reported when I run the job locally 
though IntelliJ?

Thanks 



Sidney Feiner / Data Platform Developer
M: +972.528197720 / Skype: sidney.feiner.startapp





Re: Flink Metrics - PrometheusReporter

2020-01-22 Thread Chesnay Schepler
Metrics are exposed via reporters by each process separately, whereas 
the WebUI aggregates metrics.


As such you have to configure Prometheus to also scrape the TaskExecutors.
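
A minimal prometheus.yml sketch of such a scrape configuration (the hostnames and ports are assumptions; list every JobManager and TaskManager endpoint the reporter actually exposes):

scrape_configs:
  - job_name: 'flink'
    static_configs:
      - targets: ['jobmanager-host:9249', 'taskmanager-host-1:9249', 'taskmanager-host-2:9249']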

On 22/01/2020 16:58, Sidney Feiner wrote:

Hey,
I've been trying to use the PrometheusReporter and when I used it 
locally on my computer, I would access the port I configured and see 
all the metrics I've created.
In production, we use High Availability mode and when I try to access 
the JobManager's metrics in the port I've configured on the 
PrometheusReporter, I see some very basic metrics - default Flink 
metrics, but I can't see any of my custom metrics.


Weird thing is I can see those metrics through Flink's UI in the 
Metrics tab:



Does anybody have a clue why my custom metrics are configured but not 
being reported in high availability but are reported when I run the 
job locally though IntelliJ?


Thanks 



Sidney Feiner / Data Platform Developer
M: +972.528197720 / Skype: sidney.feiner.startapp





Flink Metrics - PrometheusReporter

2020-01-22 Thread Sidney Feiner
Hey,
I've been trying to use the PrometheusReporter and when I used it locally on my 
computer, I would access the port I configured and see all the metrics I've 
created.
In production, we use High Availability mode and when I try to access the 
JobManager's metrics in the port I've configured on the PrometheusReporter, I 
see some very basic metrics - default Flink metrics, but I can't see any of my 
custom metrics.

Weird thing is I can see those metrics through Flink's UI in the Metrics tab:

Does anybody have a clue why my custom metrics are configured but not being 
reported in high availability but are reported when I run the job locally 
though IntelliJ?

Thanks 




Sidney Feiner / Data Platform Developer
M: +972.528197720 / Skype: sidney.feiner.startapp




Re: Does Flink Metrics provide information about each records inserted into the database

2020-01-18 Thread Flavio Pompermaier
What about using an accumulator? Does it work for your needs?
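
A minimal Java sketch of the accumulator idea (the sink class, record type and accumulator name are illustrative assumptions, not from this thread):

import org.apache.flink.api.common.accumulators.IntCounter;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class CountingDbSink extends RichSinkFunction<String> {

    // Aggregated across all parallel subtasks; visible on the job result and in the web UI / REST API.
    private final IntCounter insertedRows = new IntCounter();

    @Override
    public void open(Configuration parameters) {
        getRuntimeContext().addAccumulator("rows-inserted", insertedRows);
    }

    @Override
    public void invoke(String record, Context context) {
        // ... write the record to the database here ...
        insertedRows.add(1);
    }
}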

On Sat, Jan 18, 2020, 10:03 Soheil Pourbafrani  wrote:

> Hi,
>
> I'm using Flink to insert some processed records into the database. I need
> to have some aggregated information about records inserted into the
> database so far. For example, for a specific column value, I need to know
> how many records have been inserted. Can I use the Flink Matrics to provide
> this information?
>
> Thanks
>


Does Flink Metrics provide information about each records inserted into the database

2020-01-18 Thread Soheil Pourbafrani
Hi,

I'm using Flink to insert some processed records into the database. I need
to have some aggregated information about records inserted into the
database so far. For example, for a specific column value, I need to know
how many records have been inserted. Can I use the Flink Matrics to provide
this information?

Thanks


Re: Re: Re: Using influxdb as the flink metrics reporter

2020-01-06 Thread 张江
OK, thanks a lot.
On 2020-01-06 01:38:22, "Yun Tang"  wrote:
>Hi 张江,
>
>This "invalid boolean" is usually related to stray spaces between tags and fields, which breaks
>InfluxDB's parsing. What is your original error message? Please do not redact your operator
>name and task name. Also, is the space after task_id= a copy-paste artifact, or is it really in the original?
>
>Finally, these are only warnings; they will not prevent your other metrics from being written and do not affect overall use.
>
>Best,
>Yun Tang
>
>
>From: 张江 
>Sent: Saturday, January 4, 2020 19:14
>To: user-zh ; myas...@live.com 
>Subject: Re: Using influxdb as the flink metrics reporter
>
>
>Hi,
>
>
>The error reported here is "invalid boolean", not something caused by a NaN/infinity value, so I am not sure what the reason is.
>
>
>Also, I am using Flink 1.9.1 and InfluxDB 1.7.9.
>
>
>Best regards,
>
>张江
>Email: zjkingdom2...@163.com
>
>On 2020-01-04 00:56, Yun Tang wrote:
>Hi 张江,
>
>
> *   The retention policy has to be created on the InfluxDB side beforehand; the InfluxDBReporter will not create a missing retention 
> policy by itself.
> *   Some Kafka metrics trigger cast exceptions when used with the InfluxDB reporter; see 
> [1]. On Flink 1.9 these exceptions can be ignored.
>
>[1] https://issues.apache.org/jira/browse/FLINK-12147
>
>Best,
>Yun Tang
>________
>From: 张江 
>Sent: Friday, January 3, 2020 21:22
>To: user-zh@flink.apache.org 
>Subject: Using influxdb as the flink metrics reporter
>
>Hi all,
>
>
>Following the flink metrics reporter setup described in the official docs, I chose influxdb and configured the following:
>metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
>metrics.reporter.influxdb.host: localhost
>metrics.reporter.influxdb.port: 8086
>metrics.reporter.influxdb.db: flink
>metrics.reporter.influxdb.username: flink-metrics
>metrics.reporter.influxdb.password: qwerty
>metrics.reporter.influxdb.retentionPolicy: one_hour
>However, after starting the Flink job (on YARN, per-job mode) and InfluxDB, it keeps logging errors:
>error  [500] - "retention policy not found: one_hour" {"log_id": 
>"OK6nejJI000", "service": "httpd"} [httpd] 10.90.*.* - flinkuser 
>[03/Jan/2020:19:35:58 +0800] "POST /write?db=flink&rp=one_hour&precision=n&consistency=one HTTP/1.1" 500 49 "-" 
>"okhttp/3.11.0" 3637af63-2e1d-11ea-802a-000c2947e206 165
>
>
>I am using Flink 1.9.1 and InfluxDB 1.7.9.
>
>
>Also, even when I do not set retentionPolicy, it still errors with:
>org.apache.flink.metrics.influxdb.shaded.org.influxdb.InfluxDBException$UnableToParseException:
> partial write: unable to parse 
>"taskmanager_job_task_operator_sync-time-avg,host=master,job_id=03136f4c1a78e9930262b455ef0657e2,job_name=Flink-app,operator_id=cbc357ccb763df2852fee8c4fc7d55f2,operator_name=XXX,task_attempt_num=0,task_id=
> 
>cbc357ccb763df2852fee8c4fc7d55f2,task_name=XX,tm_id=container_1577507646998_0054_01_02
> value=? 157805124760500": invalid boolean
>
>
>Could anyone advise how to resolve these issues?
>Thanks.
>
>
>Best regards,
>
>
>


Re: Re: Using influxdb as the flink metrics reporter

2020-01-05 Thread Yun Tang
Hi 张江,

This "invalid boolean" is usually related to stray spaces between tags and fields, which breaks
InfluxDB's parsing. What is your original error message? Please do not redact your operator name and task
name. Also, is the space after task_id= a copy-paste artifact, or is it really in the original?

Finally, these are only warnings; they will not prevent your other metrics from being written and do not affect overall use.

Best,
Yun Tang


From: 张江 
Sent: Saturday, January 4, 2020 19:14
To: user-zh ; myas...@live.com 
Subject: Re: Using influxdb as the flink metrics reporter


Hi,


The error reported here is "invalid boolean", not something caused by a NaN/infinity value, so I am not sure what the reason is.


Also, I am using Flink 1.9.1 and InfluxDB 1.7.9.


Best regards,

张江
Email: zjkingdom2...@163.com

On 2020-01-04 00:56, Yun Tang wrote:
Hi 张江,


 *   The retention policy has to be created on the InfluxDB side beforehand; the InfluxDBReporter will not create a missing retention 
policy by itself.
 *   Some Kafka metrics trigger cast exceptions when used with the InfluxDB reporter; see 
[1]. On Flink 1.9 these exceptions can be ignored.

[1] https://issues.apache.org/jira/browse/FLINK-12147

Best,
Yun Tang

From: 张江 
Sent: Friday, January 3, 2020 21:22
To: user-zh@flink.apache.org 
Subject: Using influxdb as the flink metrics reporter

Hi all,


Following the flink metrics reporter setup described in the official docs, I chose influxdb and configured the following:
metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
metrics.reporter.influxdb.host: localhost
metrics.reporter.influxdb.port: 8086
metrics.reporter.influxdb.db: flink
metrics.reporter.influxdb.username: flink-metrics
metrics.reporter.influxdb.password: qwerty
metrics.reporter.influxdb.retentionPolicy: one_hour
However, after starting the Flink job (on YARN, per-job mode) and InfluxDB, it keeps logging errors:
error  [500] - "retention policy not found: one_hour" {"log_id": "OK6nejJI000", 
"service": "httpd"} [httpd] 10.90.*.* - flinkuser [03/Jan/2020:19:35:58 +0800] 
"POST /write?db=flink&rp=one_hour&precision=n&consistency=one HTTP/1.1" 500 49 
"-" "okhttp/3.11.0" 3637af63-2e1d-11ea-802a-000c2947e206 165


I am using Flink 1.9.1 and InfluxDB 1.7.9.


Also, even when I do not set retentionPolicy, it still errors with:
org.apache.flink.metrics.influxdb.shaded.org.influxdb.InfluxDBException$UnableToParseException:
 partial write: unable to parse 
"taskmanager_job_task_operator_sync-time-avg,host=master,job_id=03136f4c1a78e9930262b455ef0657e2,job_name=Flink-app,operator_id=cbc357ccb763df2852fee8c4fc7d55f2,operator_name=XXX,task_attempt_num=0,task_id=
 
cbc357ccb763df2852fee8c4fc7d55f2,task_name=XX,tm_id=container_1577507646998_0054_01_02
 value=? 157805124760500": invalid boolean


Could anyone advise how to resolve these issues?
Thanks.


Best regards,





Re: Using influxdb as the flink metrics reporter

2020-01-04 Thread 张江
Hi,


The error reported here is "invalid boolean", not something caused by a NaN/infinity value, so I am not sure what the reason is.


Also, I am using Flink 1.9.1 and InfluxDB 1.7.9.


Best regards,


张江
Email: zjkingdom2...@163.com

On 2020-01-04 00:56, Yun Tang wrote:
Hi 张江,


 *   The retention policy has to be created on the InfluxDB side beforehand; the InfluxDBReporter will not create a missing retention 
policy by itself.
 *   Some Kafka metrics trigger cast exceptions when used with the InfluxDB reporter; see 
[1]. On Flink 1.9 these exceptions can be ignored.

[1] https://issues.apache.org/jira/browse/FLINK-12147

Best,
Yun Tang

From: 张江 
Sent: Friday, January 3, 2020 21:22
To: user-zh@flink.apache.org 
Subject: Using influxdb as the flink metrics reporter

Hi all,


Following the flink metrics reporter setup described in the official docs, I chose influxdb and configured the following:
metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
metrics.reporter.influxdb.host: localhost
metrics.reporter.influxdb.port: 8086
metrics.reporter.influxdb.db: flink
metrics.reporter.influxdb.username: flink-metrics
metrics.reporter.influxdb.password: qwerty
metrics.reporter.influxdb.retentionPolicy: one_hour
However, after starting the Flink job (on YARN, per-job mode) and InfluxDB, it keeps logging errors:
error  [500] - "retention policy not found: one_hour" {"log_id": "OK6nejJI000", 
"service": "httpd"} [httpd] 10.90.*.* - flinkuser [03/Jan/2020:19:35:58 +0800] 
"POST /write?db=flink&rp=one_hour&precision=n&consistency=one HTTP/1.1" 500 49 
"-" "okhttp/3.11.0" 3637af63-2e1d-11ea-802a-000c2947e206 165


I am using Flink 1.9.1 and InfluxDB 1.7.9.


Also, even when I do not set retentionPolicy, it still errors with:
org.apache.flink.metrics.influxdb.shaded.org.influxdb.InfluxDBException$UnableToParseException:
 partial write: unable to parse 
"taskmanager_job_task_operator_sync-time-avg,host=master,job_id=03136f4c1a78e9930262b455ef0657e2,job_name=Flink-app,operator_id=cbc357ccb763df2852fee8c4fc7d55f2,operator_name=XXX,task_attempt_num=0,task_id=
 
cbc357ccb763df2852fee8c4fc7d55f2,task_name=XX,tm_id=container_1577507646998_0054_01_02
 value=? 157805124760500": invalid boolean


Could anyone advise how to resolve these issues?
Thanks.


Best regards,





Re: Using influxdb as the flink metrics reporter

2020-01-03 Thread Yun Tang
Hi 张江,


  *   The retention policy has to be created on the InfluxDB side beforehand; the InfluxDBReporter will not create a missing retention 
policy by itself.
  *   Some Kafka metrics trigger cast exceptions when used with the InfluxDB reporter; see 
[1]. On Flink 1.9 these exceptions can be ignored.

[1] https://issues.apache.org/jira/browse/FLINK-12147

Best,
Yun Tang
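
A minimal InfluxQL sketch for creating that retention policy up front (the duration and replication factor are assumptions; adjust them to your retention needs):

CREATE RETENTION POLICY "one_hour" ON "flink" DURATION 1h REPLICATION 1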

From: 张江 
Sent: Friday, January 3, 2020 21:22
To: user-zh@flink.apache.org 
Subject: Using influxdb as the flink metrics reporter

Hi all,


Following the flink metrics reporter setup described in the official docs, I chose influxdb and configured the following:
metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
metrics.reporter.influxdb.host: localhost
metrics.reporter.influxdb.port: 8086
metrics.reporter.influxdb.db: flink
metrics.reporter.influxdb.username: flink-metrics
metrics.reporter.influxdb.password: qwerty
metrics.reporter.influxdb.retentionPolicy: one_hour
However, after starting the Flink job (on YARN, per-job mode) and InfluxDB, it keeps logging errors:
error  [500] - "retention policy not found: one_hour" {"log_id": "OK6nejJI000", 
"service": "httpd"} [httpd] 10.90.*.* - flinkuser [03/Jan/2020:19:35:58 +0800] 
"POST /write?db=flink&rp=one_hour&precision=n&consistency=one HTTP/1.1" 500 49 
"-" "okhttp/3.11.0" 3637af63-2e1d-11ea-802a-000c2947e206 165


I am using Flink 1.9.1 and InfluxDB 1.7.9.


Also, even when I do not set retentionPolicy, it still errors with:
org.apache.flink.metrics.influxdb.shaded.org.influxdb.InfluxDBException$UnableToParseException:
 partial write: unable to parse 
"taskmanager_job_task_operator_sync-time-avg,host=master,job_id=03136f4c1a78e9930262b455ef0657e2,job_name=Flink-app,operator_id=cbc357ccb763df2852fee8c4fc7d55f2,operator_name=XXX,task_attempt_num=0,task_id=
 
cbc357ccb763df2852fee8c4fc7d55f2,task_name=XX,tm_id=container_1577507646998_0054_01_02
 value=? 157805124760500": invalid boolean


Could anyone advise how to resolve these issues?
Thanks.


Best regards,





Using influxdb as the flink metrics reporter

2020-01-03 Thread 张江
Hi all,


Following the flink metrics reporter setup described in the official docs, I chose influxdb and configured the following:
metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
metrics.reporter.influxdb.host: localhost
metrics.reporter.influxdb.port: 8086
metrics.reporter.influxdb.db: flink
metrics.reporter.influxdb.username: flink-metrics
metrics.reporter.influxdb.password: qwerty
metrics.reporter.influxdb.retentionPolicy: one_hour
However, after starting the Flink job (on YARN, per-job mode) and InfluxDB, it keeps logging errors:
error  [500] - "retention policy not found: one_hour" {"log_id": "OK6nejJI000", 
"service": "httpd"} [httpd] 10.90.*.* - flinkuser [03/Jan/2020:19:35:58 +0800] 
"POST /write?db=flink&rp=one_hour&precision=n&consistency=one HTTP/1.1" 500 49 
"-" "okhttp/3.11.0" 3637af63-2e1d-11ea-802a-000c2947e206 165


I am using Flink 1.9.1 and InfluxDB 1.7.9.


Also, even when I do not set retentionPolicy, it still errors with:
org.apache.flink.metrics.influxdb.shaded.org.influxdb.InfluxDBException$UnableToParseException:
 partial write: unable to parse 
"taskmanager_job_task_operator_sync-time-avg,host=master,job_id=03136f4c1a78e9930262b455ef0657e2,job_name=Flink-app,operator_id=cbc357ccb763df2852fee8c4fc7d55f2,operator_name=XXX,task_attempt_num=0,task_id=
 
cbc357ccb763df2852fee8c4fc7d55f2,task_name=XX,tm_id=container_1577507646998_0054_01_02
 value=? 157805124760500": invalid boolean


Could anyone advise how to resolve these issues?
Thanks.


Best regards,





Re: Apache Flink - Flink Metrics collection using Prometheus on EMR from streaming mode

2019-12-25 Thread M Singh
 Thanks Vino and Rafi for your references.
Regarding push gateway recommendations for batch - I am following this 
reference (https://prometheus.io/docs/practices/pushing/).
The scenario that I have is that we start Flink Apps on EMR whenever we need 
them. Sometimes the task manager gets killed and then restarted on another 
node.  In order to keep up with registering new task/job managers and 
de-registering the stopped/removed ones, I wanted to see if there is any 
service discovery integration with Flink apps.  
Thanks again for your help and let me know if you have any additional pointers.
On Wednesday, December 25, 2019, 03:39:31 AM EST, Rafi Aroch 
 wrote:  
 
 Hi,
Take a look here: https://github.com/eastcirclek/flink-service-discovery
I used it successfully quite a while ago, so things might have changed since.
Thanks, Rafi 
On Wed, Dec 25, 2019, 05:54 vino yang  wrote:

Hi Mans,
IMO, the mechanism of metrics reporter does not depend on any deployment mode.
>> is there any Prometheus configuration or service discovery option available 
>>that will dynamically pick up the metrics from the Filnk job and task 
>>managers running in cluster ?
Can you share more information about your scene?
>> I believe for a batch job I can configure flink config to use Prometheus 
>>gateway configuration but I think this is not recommended for a streaming job.
What does this mean? Why the Prometheus gateway configuration for Flink batch 
job is not recommended for a streaming job?
Best,Vino
M Singh  wrote on Tue, Dec 24, 2019 at 4:02 PM:

Hi:
I wanted to find out what's the best way of collecting Flink metrics using 
Prometheus in a streaming application on EMR/Hadoop.
Since the Flink streaming jobs could be running on any node - is there any 
Prometheus configuration or service discovery option available that will 
dynamically pick up the metrics from the Filnk job and task managers running in 
cluster ?  
I believe for a batch job I can configure flink config to use Prometheus 
gateway configuration but I think this is not recommended for a streaming job.
Please let me know if you have any advice.
Thanks
Mans

  

Re: Apache Flink - Flink Metrics collection using Prometheus on EMR from streaming mode

2019-12-25 Thread Rafi Aroch
Hi,

Take a look here: https://github.com/eastcirclek/flink-service-discovery

I used it successfully quite a while ago, so things might have changed
since.

Thanks,
Rafi

On Wed, Dec 25, 2019, 05:54 vino yang  wrote:

> Hi Mans,
>
> IMO, the mechanism of metrics reporter does not depend on any deployment
> mode.
>
> >> is there any Prometheus configuration or service discovery option
> available that will dynamically pick up the metrics from the Filnk job and
> task managers running in cluster ?
>
> Can you share more information about your scene?
>
> >> I believe for a batch job I can configure flink config to use
> Prometheus gateway configuration but I think this is not recommended for a
> streaming job.
>
> What does this mean? Why the Prometheus gateway configuration for Flink
> batch job is not recommended for a streaming job?
>
> Best,
> Vino
>
M Singh  wrote on Tue, Dec 24, 2019 at 4:02 PM:
>
>> Hi:
>>
>> I wanted to find out what's the best way of collecting Flink metrics
>> using Prometheus in a streaming application on EMR/Hadoop.
>>
>> Since the Flink streaming jobs could be running on any node - is there
>> any Prometheus configuration or service discovery option available that
>> will dynamically pick up the metrics from the Filnk job and task managers
>> running in cluster ?
>>
>> I believe for a batch job I can configure flink config to use Prometheus
>> gateway configuration but I think this is not recommended for a streaming
>> job.
>>
>> Please let me know if you have any advice.
>>
>> Thanks
>>
>> Mans
>>
>


Re: Apache Flink - Flink Metrics collection using Prometheus on EMR from streaming mode

2019-12-24 Thread vino yang
Hi Mans,

IMO, the mechanism of metrics reporter does not depend on any deployment
mode.

>> is there any Prometheus configuration or service discovery option
available that will dynamically pick up the metrics from the Filnk job and
task managers running in cluster ?

Can you share more information about your scene?

>> I believe for a batch job I can configure flink config to use Prometheus
gateway configuration but I think this is not recommended for a streaming
job.

What does this mean? Why the Prometheus gateway configuration for Flink
batch job is not recommended for a streaming job?

Best,
Vino

M Singh  wrote on Tue, Dec 24, 2019 at 4:02 PM:

> Hi:
>
> I wanted to find out what's the best way of collecting Flink metrics using
> Prometheus in a streaming application on EMR/Hadoop.
>
> Since the Flink streaming jobs could be running on any node - is there any
> Prometheus configuration or service discovery option available that will
> dynamically pick up the metrics from the Filnk job and task managers
> running in cluster ?
>
> I believe for a batch job I can configure flink config to use Prometheus
> gateway configuration but I think this is not recommended for a streaming
> job.
>
> Please let me know if you have any advice.
>
> Thanks
>
> Mans
>


Apache Flink - Flink Metrics collection using Prometheus on EMR from streaming mode

2019-12-24 Thread M Singh
Hi:
I wanted to find out what's the best way of collecting Flink metrics using 
Prometheus in a streaming application on EMR/Hadoop.
Since the Flink streaming jobs could be running on any node - is there any 
Prometheus configuration or service discovery option available that will 
dynamically pick up the metrics from the Filnk job and task managers running in 
cluster ?  
I believe for a batch job I can configure flink config to use Prometheus 
gateway configuration but I think this is not recommended for a streaming job.
Please let me know if you have any advice.
Thanks
Mans

Re: Apache Flink - Flink Metrics - How to distinguish b/w metrics for two job manager on the same host

2019-12-19 Thread M Singh
 Thanks Vino and Biao for your help.  Mans
On Thursday, December 19, 2019, 02:25:40 AM EST, Biao Liu 
 wrote:  
 
 Hi Mans,
That's indeed a problem. We have a plan to fix it. I think it could be included 
in 1.11. You could follow this issue [1] to check the progress. 
[1] https://issues.apache.org/jira/browse/FLINK-9543

Thanks,Biao /'bɪ.aʊ/


On Thu, 19 Dec 2019 at 14:51, vino yang  wrote:

Hi Mans,
IMO, one job manager represents one Flink cluster and one Flink cluster has a 
suite of Flink configuration e.g. metrics reporter.
Some metrics reporters support tag feature, you can specify it to distinguish 
different Flink cluster.[1]
[1]: 
https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#datadog-orgapacheflinkmetricsdatadogdatadoghttpreporter
Best,Vino
M Singh  wrote on Thu, Dec 19, 2019 at 2:54 AM:

Hi:
I am using AWS EMR with Flink application and two of the job managers are 
running on the same host.  I am looking at the metrics documentation (Apache 
Flink 1.9 Documentation: Metrics) and see the following: 


   
   - metrics.scope.jm
      - Default: <host>.jobmanager
      - Applied to all metrics that were scoped to a job manager.

...
List of all Variables

   - JobManager: <host>
   - TaskManager: <host>, <tm_id>
   - Job: <job_id>, <job_name>
   - Task: <task_id>, <task_name>, <task_attempt_id>, <task_attempt_num>, <subtask_index>
   - Operator: <operator_id>, <operator_name>, <subtask_index>


My question is: is there a way to distinguish between the two job managers? I see only 
the <host> variable for JobManager and, since the two are running on the same 
host, the value is the same. Is there any other variable that I can use to 
distinguish the two?

For taskmanager I have taskmanager id but am not sure about the job manager.
Thanks
Mans


  

Re: Apache Flink - Flink Metrics - How to distinguish b/w metrics for two job manager on the same host

2019-12-18 Thread Biao Liu
Hi Mans,

That's indeed a problem. We have a plan to fix it. I think it could be
included in 1.11. You could follow this issue [1] to check the progress.

[1] https://issues.apache.org/jira/browse/FLINK-9543

Thanks,
Biao /'bɪ.aʊ/



On Thu, 19 Dec 2019 at 14:51, vino yang  wrote:

> Hi Mans,
>
> IMO, one job manager represents one Flink cluster and one Flink cluster
> has a suite of Flink configuration e.g. metrics reporter.
>
> Some metrics reporters support tag feature, you can specify it to
> distinguish different Flink cluster.[1]
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#datadog-orgapacheflinkmetricsdatadogdatadoghttpreporter
>
> Best,
> Vino
>
M Singh  wrote on Thu, Dec 19, 2019 at 2:54 AM:
>
>> Hi:
>>
>> I am using AWS EMR with Flink application and two of the job managers are
>> running on the same host.  I am looking at the metrics documentation (Apache
>> Flink 1.9 Documentation: Metrics
>> )
>> and see the following:
>>
>> Apache Flink 1.9 Documentation: Metrics
>>
>>
>> 
>>
>>- metrics.scope.jm
>>   - Default: <host>.jobmanager
>>   - Applied to all metrics that were scoped to a job manager.
>>
>> ...
>> List of all Variables
>>
>>- JobManager: <host>
>>- TaskManager: <host>, <tm_id>
>>- Job: <job_id>, <job_name>
>>- Task: <task_id>, <task_name>, <task_attempt_id>, <task_attempt_num>, <subtask_index>
>>- Operator: <operator_id>, <operator_name>, <subtask_index>
>>
>>
>>
>> My question is: is there a way to distinguish between the two job managers? I
>> see only the <host> variable for JobManager and, since the two are running
>> on the same host, the value is the same. Is there any other variable that
>> I can use to distinguish the two?
>>
>> For taskmanager I have taskmanager id but am not sure about the job
>> manager.
>>
>> Thanks
>>
>> Mans
>>
>>


Re: Apache Flink - Flink Metrics - How to distinguish b/w metrics for two job manager on the same host

2019-12-18 Thread vino yang
Hi Mans,

IMO, one job manager represents one Flink cluster and one Flink cluster has
a suite of Flink configuration e.g. metrics reporter.

Some metrics reporters support tag feature, you can specify it to
distinguish different Flink cluster.[1]

[1]:
https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#datadog-orgapacheflinkmetricsdatadogdatadoghttpreporter

Best,
Vino
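
For the DataDog reporter referenced in [1], the tags are just another reporter option; a per-cluster flink-conf.yaml sketch might look like this (the API key placeholder and tag values are assumptions for illustration):

metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: <your-api-key>
metrics.reporter.dghttp.tags: cluster:emr-cluster-a,env:prod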

M Singh  wrote on Thu, Dec 19, 2019 at 2:54 AM:

> Hi:
>
> I am using AWS EMR with Flink application and two of the job managers are
> running on the same host.  I am looking at the metrics documentation (Apache
> Flink 1.9 Documentation: Metrics
> )
> and see the following:
>
> Apache Flink 1.9 Documentation: Metrics
>
>
> 
>
>- metrics.scope.jm
>   - Default: <host>.jobmanager
>   - Applied to all metrics that were scoped to a job manager.
>
> ...
> List of all Variables
>
>- JobManager: <host>
>- TaskManager: <host>, <tm_id>
>- Job: <job_id>, <job_name>
>- Task: <task_id>, <task_name>, <task_attempt_id>, <task_attempt_num>, <subtask_index>
>- Operator: <operator_id>, <operator_name>, <subtask_index>
>
>
>
> My question is: is there a way to distinguish between the two job managers? I see
> only the <host> variable for JobManager and, since the two are running on
> the same host, the value is the same. Is there any other variable that I
> can use to distinguish the two?
>
> For taskmanager I have taskmanager id but am not sure about the job
> manager.
>
> Thanks
>
> Mans
>
>

