Unsubscribe
On Wed, Aug 30, 2023 at 19:14 allanqinjy wrote:
> hi,
> I'd like to ask a question: when reporting metrics to Prometheus, the job
> name gets a randomly generated suffix; in the source code it comes from new
> AbstractID(). Is there a way to get the application id of the job being
> reported here?
hi,
You can try reading the _APP_ID JVM environment variable:
System.getenv(YarnConfigKeys.ENV_APP_ID);
https://github.com/apache/flink/blob/6c9bb3716a3a92f3b5326558c6238432c669556d/flink-yarn/src/main/java/org/apache/flink/yarn/YarnConfigKeys.java#L28
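For example, from inside a rich function (a minimal sketch; it assumes a
YARN deployment, where this variable is set, and the metric-group tagging is
just one illustrative use):

    // YarnConfigKeys.ENV_APP_ID is "_APP_ID"; on YARN it resolves to
    // something like "application_1693382000000_0042", otherwise null.
    String appId = System.getenv(YarnConfigKeys.ENV_APP_ID);
    if (appId != null) {
        // Expose it as a key/value group; the Prometheus reporter
        // turns key/value groups into labels.
        getRuntimeContext().getMetricGroup()
                .addGroup("application_id", appId)
                .counter("records");
    }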
Best,
Feng
Hi Sigalit,
First of all, have you read the docs page on metrics [1], and in particular
the Prometheus section on metric reporters [2]?
Apart from that, there is also a (somewhat older) blog post about
integrating Flink with Prometheus, including a link to a repo with example
code [3].
Hope this helps.
Some more background on MetricGroups:
Internally there are (mostly) 3 types of metric groups:
On the one hand we have the ComponentMetricGroups (like
TaskManagerMetricGroup) that describe a high-level Flink entity and
just add a constant expression to the logical scope (like taskmanager or
task).
Upon further inspection, it seems the user scope is not universal (i.e. it
comes through the connectors and not through UDFs like a rich map function),
but the question still stands whether the process makes sense.
On Jun 1, 2021, at 10:38 AM, Mason Chen wrote:
Makes sense. We are primarily concerned with removing the metric labels from
the names as the user metrics get too long, i.e. the groups from `addGroup`
are concatenated into the metric name.
Do you think there would be any issues with removing the group information
from the metric name and relying on the labels instead?
The uniqueness of metrics and the naming of the Prometheus reporter are
somewhat related but also somewhat orthogonal.
Prometheus works similar to JMX in that the metric name (e.g.,
taskmanager.job.task.operator.numRecordsIn) is more or less a _class_ of
metrics, with tags/labels allowing you to select specific instances.
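For illustration (a hedged sketch; the exact metric name and label set
depend on your version and scope configuration), a scraped sample might
look like:

    flink_taskmanager_job_task_operator_numRecordsIn{task_name="MySink",subtask_index="0"} 42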
Hi Mason,
The idea is that a metric is not uniquely identified by its name alone but
instead by its path. The groups in which it is defined specify this path
(similar to directories). That's why it is valid to specify two metrics
with the same name if they reside in different groups.
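For example (a minimal sketch; the group and counter names are made up):

    // Both counters are named "errors", but they live in different groups,
    // so their paths differ: <scope>.kafka.errors vs. <scope>.http.errors.
    Counter kafkaErrors = getRuntimeContext().getMetricGroup()
            .addGroup("kafka").counter("errors");
    Counter httpErrors = getRuntimeContext().getMetricGroup()
            .addGroup("http").counter("errors");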
This is currently not possible. See also FLINK-8358
On 4/9/2021 4:47 AM, Claude M wrote:
Hello,
I've setup Flink as an Application Cluster in Kubernetes. Now I'm
looking into monitoring the Flink cluster in Datadog. This is what is
configured in the flink-conf.yaml to emit metrics:
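For reference, a typical Datadog reporter setup looks roughly like this (a
hedged sketch, not the actual config from this cluster; the apikey and tags
values are placeholders):

    metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
    metrics.reporter.dghttp.apikey: <your-datadog-api-key>
    metrics.reporter.dghttp.tags: env:staging,cluster:my-cluster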
Hi,
1)
Do you want to output those metrics as Flink metrics? Or output those
"metrics"/counters as values to some external system (like Kafka)? The
problem discussed in [1], was that the metrics (Counters) were not fitting
in memory, so David suggested to hold them on Flink's state and treat the
Hi Gary,
Sorry for the false alarm. It's caused by a bug in my deployment - no
metrics were added into the registry.
Sorry for wasting your time.
Thanks and best regards,
Averell
Hi Gary,
Thanks for the help.
Below is the output from jstack. It does not seem to be blocked.
In my JobManager log, there's this WARN, I am not sure whether it's relevant
at all.
Attached is the full jstack dump k8xDump.txt
Hi Averell,
If you are seeing the log message from [1] and Scheduled#report() is
not called, the thread in the "Flink-MetricRegistry" thread pool might
be blocked. You can use the jstack utility to see on which task the
thread pool is blocked.
Best,
Gary
[1]
From: Chesnay Schepler
Sent: Wednesday, January 22, 2020 6:07 PM
To: Sidney Feiner ; flink-u...@apache.org
Subject: Re: Flink Metrics - PrometheusReporter
Metrics are exposed via reporters by each process separately, whereas
the WebUI aggregates metrics.
As such you have to configure Prometheus to also scrape the TaskExecutors.
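For example (a hedged sketch; the reporter name "prom" and the port range
are arbitrary), giving each process its own port in flink-conf.yaml lets
Prometheus scrape the JobManager and every TaskExecutor:

    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: 9250-9260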
On 22/01/2020 16:58, Sidney Feiner wrote:
Hey,
I've been trying to use the PrometheusReporter and when I used in
Hi Flavio,
Below is my explanation of your question, based on anecdotal evidence:
As you may know, the Flink distribution package is already Scala-version
specific and bundles some jar artifacts.
A user's Flink job is supposed to be compiled against some of those jars
(with Maven's `provided` scope).
Sorry,
I just discovered that those jars are actually in the opt folder within
the Flink dist. However, the second point still holds: why is there a single
influxdb jar inside Flink's opt folder while on Maven there are two versions
(one for Scala 2.11 and one for 2.12)?
Best,
Flavio
On Thu, Oct 10, 2019
Hi Biao!
> Do you mean "distinguish metrics from different JobManager running on
same host"?
Exactly.
>Will give you a feedback if there is a conclusion.
Thanks!
On Thu, 15 Aug 2019 at 06:40, Biao Liu wrote:
Hi Vasily,
> Is there any way to distinguish logs from different JobManager running on
same host?
Do you mean "distinguish metrics from different JobManager running on same
host"?
I guess there is no other variable you could use for now.
But I think it's reasonable to support this requirement.
Taking the first part of the hostname keeps the behavior consistent with
HDFS; see the issue from back then, where the author specifically explained
why it was done this way.
https://issues.apache.org/jira/browse/FLINK-1170?focusedCommentId=14175285&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14175285
Thank you~
Xintong Song
On Wed, May 15, 2019 at 9:11 PM Yun Tang wrote:
Hi 嘉诚,
I'm not sure which Flink version you are using, but the logic of showing only
the first part of the host name has always been there, since in most cases
the first part is enough to identify a host. See [1] and [2] for the
implementation.
Inspired by your report, I created a JIRA ticket [3] to track this; the
proposed fix is a metrics option so that the full hostname can be shown in
metrics for scenarios like yours.
Best,
Yun Tang
[1]
Hi Brian,
You can implement a new org.apache.flink.metrics.reporter.MetricReporter as
you like and register it with Flink in the Flink configuration, e.g.:
metrics.reporters: my_reporter
metrics.reporter.my_reporter.class: xxx
metrics.reporter.my_reporter.config1: yyy
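A minimal skeleton of such a reporter (class and config names are
illustrative; the interface methods are those of
org.apache.flink.metrics.reporter.MetricReporter):

    import org.apache.flink.metrics.Metric;
    import org.apache.flink.metrics.MetricConfig;
    import org.apache.flink.metrics.MetricGroup;
    import org.apache.flink.metrics.reporter.MetricReporter;

    public class MyReporter implements MetricReporter {
        @Override
        public void open(MetricConfig config) {
            // Read reporter options, e.g. the "config1" key from above.
            String config1 = config.getString("config1", "default");
        }

        @Override
        public void close() {
            // Release any resources held by the reporter.
        }

        @Override
        public void notifyOfAddedMetric(Metric metric, String name, MetricGroup group) {
            // Start tracking the metric, e.g. store it for reporting.
        }

        @Override
        public void notifyOfRemovedMetric(Metric metric, String name, MetricGroup group) {
            // Stop tracking the metric.
        }
    }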
Aha! This is almost certainly it. I remember thinking something like this
might be a problem. I'll need to change the deployment a bit to add this
(not straightforward to edit the YAML in my case), but thanks!
On Sun, Mar 24, 2019 at 10:01 AM dawid wrote:
Padarn Wilson-2 wrote
> I am running Flink 1.7.2 on Kubernetes in a setup with task manager and job
> manager separate.
>
> I'm having trouble seeing the metrics from my Flink job in the UI
> dashboard. Actually I'm using the Datadog reporter to expose most of my
> metrics, but latency tracking
Thanks David. I cannot see the metrics there, so let me play around a bit
more and make sure they are enabled correctly.
On Sat, Mar 23, 2019 at 9:19 PM David Anderson wrote:
> I have done this (actually I do it in my flink-conf.yaml), but I am not
seeing any metrics at all in the Flink UI,
> let alone the latency tracking. The latency tracking itself does not seem
to be exported to datadog (should it be?)
The latency metrics are job metrics, and are not shown in the task metrics
in the web UI.
Because latency tracking is expensive, it is turned off by default. You
turn it on by setting the interval; that looks something like this:
env.getConfig().setLatencyTrackingInterval(1000);
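The same can also be set in flink-conf.yaml (assuming a Flink version that
has this key):

    metrics.latency.interval: 1000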
The full set of configuration options is described in the docs:
If you're working with 1.7/master you're probably running into
https://issues.apache.org/jira/browse/FLINK-11127 .
On 17.12.2018 18:12, eric hoffmann wrote:
Hi,
In a Kubernetes deployment, I'm not able to display metrics in the dashboard. I
try to expose and fix the
Ah ok, the onTimer() and processElement() methods are all protected by
synchronized blocks on the same lock. So that shouldn’t be a problem.
On 22. May 2017, at 15:08, Chesnay Schepler wrote:
Yes, that could cause the observed issue.
The default implementations are not thread-safe; if you do concurrent
writes they may be lost/overwritten.
You will have to either guard accesses to that metric with a
synchronized block or implement your own thread-safe counter.
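A minimal sketch of the latter (nothing here beyond the
org.apache.flink.metrics.Counter interface and an AtomicLong):

    import java.util.concurrent.atomic.AtomicLong;
    import org.apache.flink.metrics.Counter;

    public class ThreadSafeCounter implements Counter {
        private final AtomicLong count = new AtomicLong();

        @Override public void inc() { count.incrementAndGet(); }
        @Override public void inc(long n) { count.addAndGet(n); }
        @Override public void dec() { count.decrementAndGet(); }
        @Override public void dec(long n) { count.addAndGet(-n); }
        @Override public long getCount() { return count.get(); }
    }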
On 22.05.2017 14:17,
@Chesnay With timers it will happen that onTimer() is called from a different
thread than the thread that is calling processElement(). If metrics updates
happen in both, would that be a problem?
On 19. May 2017, at 11:57, Chesnay Schepler wrote:
2. isn't quite accurate actually; metrics on the TaskManager are not
persisted across restarts.
On 19.05.2017 11:21, Chesnay Schepler wrote:
1. This shouldn't happen. Do you access the counter from different threads?
2. Metrics in general are not persisted across restarts, and there is no
way to configure flink to do so at the moment.
3. Counters are sent as gauges since, as far as I know, StatsD counters
are not allowed to be decremented.
Hi there,
I am using Graphite and querying it in Grafana is super easy. You just
select fields and they come up automatically for you to select from
depending on what your metric structure in Graphite looks like. You can also
use wildcards.
The only thing I had to do because I am also using
Hi Jamie,
Thanks for sharing your thoughts. I'll try and integrate with Graphite to
see if this gets resolved.
Regards,
Anchit
Hi Anchit,
That last bit is very interesting - the fact that it works fine with
subtasks <= 30. It could be that either Influx or Grafana are not able to
keep up with the data being produced. I would guess that the culprit is
Grafana if looking at any particular subtask index works fine and only the
aggregated view misbehaves.
I've set the metric reporting frequency to InfluxDB as 10s. In the
screenshot, I'm using a Grafana query interval of 1s. I've tried 10s and more
too; the graph shape changes a bit, but the incorrect negative values are
still plotted (it makes no difference).
Something to add: if there are 30 or fewer subtasks, the plot looks correct.
Hmm. I can't recreate that behavior here. I have seen some issues like
this if you're grouping by a time interval different from the metrics
reporting interval you're using, though. How often are you reporting
metrics to Influx? Are you using the same interval in your Grafana
queries?
Hi Jamie,
Thank you so much for your response.
The below query:
SELECT derivative(sum("count"), 1s) FROM "numRecordsIn" WHERE "task_name" =
'Sink: Unnamed' AND $timeFilter GROUP BY time(1s)
behaves the same as with the use of the templating variable in the 'All'
case, i.e. it shows plots with the same incorrect negative values.
Ahh.. I haven't used templating all that much, but this also works for your
subtask variable so that you don't have to enumerate all the possible
values:
Template Variable Type: query
query: SHOW TAG VALUES FROM numRecordsIn WITH KEY = "subtask_index"
On Tue, Nov 1, 2016 at 2:51 PM, Jamie
Another note. In the example the template variable type is "custom" and
the values have to be enumerated manually. So in your case you would have
to configure all the possible values of "subtask" to be 0-49.
On Tue, Nov 1, 2016 at 2:43 PM, Jamie Grier wrote:
This works well for me. This will aggregate the data across all sub-task
instances:
SELECT derivative(sum("count"), 1s) FROM "numRecordsIn" WHERE "task_name" =
'Sink: Unnamed' AND $timeFilter GROUP BY time(1s)
You can also plot each sub-task instance separately on the same graph by
doing:
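A hedged sketch of that variant (it assumes the same subtask_index tag as in
the templating example above):

    SELECT derivative(sum("count"), 1s) FROM "numRecordsIn" WHERE "task_name" =
    'Sink: Unnamed' AND $timeFilter GROUP BY time(1s), "subtask_index"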
Hello,
we could also offer a small utility method that creates 3 Flink meters,
each reporting one rate of a DW meter, as sketched below.
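Such a utility could look roughly like this (a sketch; all names are
illustrative, and it assumes the Flink Meter interface plus a Dropwizard
com.codahale.metrics.Meter on the classpath):

    import org.apache.flink.metrics.Meter;
    import org.apache.flink.metrics.MetricGroup;

    public class DropwizardRates {
        /** Registers three Flink meters, one per Dropwizard rate window. */
        public static void registerAllRates(MetricGroup group, String name,
                                            com.codahale.metrics.Meter dw) {
            group.meter(name + "_1m", wrap(dw, dw::getOneMinuteRate));
            group.meter(name + "_5m", wrap(dw, dw::getFiveMinuteRate));
            group.meter(name + "_15m", wrap(dw, dw::getFifteenMinuteRate));
        }

        private static Meter wrap(com.codahale.metrics.Meter dw,
                                  java.util.function.DoubleSupplier rate) {
            return new Meter() {
                @Override public void markEvent() { dw.mark(); }
                @Override public void markEvent(long n) { dw.mark(n); }
                @Override public double getRate() { return rate.getAsDouble(); }
                @Override public long getCount() { return dw.getCount(); }
            };
        }
    }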
Timers weren't added yet since, as Till said, no one requested them yet
and we haven't found a proper internal use-case for them.
Regards,
Chesnay
On 17.10.2016 09:52, Till
Hi Govind,
I think the DropwizardMeterWrapper implementation is just a reference
implementation where it was decided to report the minute rate. You can
define your own meter class which allows to configure the rate interval
accordingly.
Concerning Timers, I think nobody requested this metric so far.