Re: Tag flink metrics to job name

2021-02-19 Thread Chesnay Schepler

Hmm... in a roundabout way this could be possible, I suppose.

For a given job, search through your metrics for some job-scoped metric
(like numRestarts on the JM, or any task metric for TMs). From that you
should be able to infer which JM/TM processes belong to that job, based
on the TM ID / host information in the metric.
Conversely, when you see high CPU usage in one of the metrics for a
JM/TM, search for a job metric reported by that same process.
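
A rough sketch of that lookup, for illustration only. It assumes the metrics have already been fetched as (metric name, labels) pairs, and that job-scoped metrics carry `job_name`, `host` and `tm_id` labels while process-level metrics carry only `host` and `tm_id`. The label names, sample values and helper functions here are assumptions made for the example; check them against what your reporter actually emits.

```python
from collections import defaultdict

# Hypothetical samples: (metric_name, labels). Label names are assumed.
SAMPLES = [
    ("flink_taskmanager_job_task_numRecordsIn",
     {"job_name": "jobA", "host": "node1", "tm_id": "container_01"}),
    ("flink_taskmanager_job_task_numRecordsIn",
     {"job_name": "jobB", "host": "node1", "tm_id": "container_02"}),
    ("flink_taskmanager_Status_JVM_CPU_Load",
     {"host": "node1", "tm_id": "container_01"}),
    ("flink_taskmanager_Status_JVM_CPU_Load",
     {"host": "node1", "tm_id": "container_02"}),
]


def build_index(samples):
    """Map job -> TM processes and TM process -> jobs, using job-scoped metrics."""
    job_to_tms = defaultdict(set)
    tm_to_jobs = defaultdict(set)
    for _name, labels in samples:
        job = labels.get("job_name")
        tm_id = labels.get("tm_id")
        if job and tm_id:
            process = (labels.get("host"), tm_id)
            job_to_tms[job].add(process)
            tm_to_jobs[process].add(job)
    return job_to_tms, tm_to_jobs


def process_metrics_for_job(samples, job_name):
    """Return process-level samples (no job_name label) from TMs running the job."""
    job_to_tms, _ = build_index(samples)
    processes = job_to_tms.get(job_name, set())
    return [
        (name, labels)
        for name, labels in samples
        if "job_name" not in labels
        and (labels.get("host"), labels.get("tm_id")) in processes
    ]
```

`process_metrics_for_job(SAMPLES, "jobA")` covers the first direction (job to process metrics); the second mapping returned by `build_index` covers the reverse direction (a busy process back to the jobs running on it). If a TM runs tasks of several jobs, the process maps to all of them, so the attribution is only unambiguous when each container serves a single job.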


On 2/19/2021 9:14 AM, bat man wrote:
Is there a way to see, for a specific job, the CPU or memory usage of
its YARN containers when multiple jobs are running on the same cluster?
Also, the issue I am trying to resolve is high memory usage in one of
the containers; I want to isolate the issue to one job and then
investigate further.


Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:18 PM, Chesnay Schepler wrote:


No, Job-/TaskManager metrics cannot be tagged with the job name.
The reason is that this only makes sense for application clusters
(as opposed to session clusters), but we don't differentiate between
the two when it comes to metrics.

On 2/19/2021 3:59 AM, bat man wrote:

I meant the Flink job name. I’m using the reporter below:

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter

Is there any way to tag the TaskManager and JobManager metrics with job names?

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler wrote:

When you say "job_name", are you referring to the Prometheus concept of
jobs, or the Flink one?

Which of the Flink Prometheus reporters are you using?

On 2/17/2021 7:37 PM, bat man wrote:
> Hello there,
>
> I am pushing Flink metrics to Prometheus and then using Grafana for
> visualization. There are metrics like
>
> - flink_taskmanager_Status_JVM_CPU_Load,
>   flink_taskmanager_Status_JVM_CPU_Load,
>   flink_taskmanager_Status_JVM_CPU_Time
>
> etc., which do not carry a job_name; they are tied to an instance.
> When running multiple jobs in the same YARN cluster, different jobs can
> have YARN containers on the same instance. In that case it is very
> difficult to find out which instance has high CPU load, memory usage,
> etc.
>
> Is there a way to tag these metrics with job_name so that they can be
> visualized per job?
>
> Thanks,
> Hemant








Re: Tag flink metrics to job name

2021-02-19 Thread bat man
Is there a way to see, for a specific job, the CPU or memory usage of
its YARN containers when multiple jobs are running on the same cluster?
Also, the issue I am trying to resolve is high memory usage in one of
the containers; I want to isolate the issue to one job and then
investigate further.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:18 PM, Chesnay Schepler 
wrote:

> No, Job-/TaskManager metrics cannot be tagged with the job name.
> The reason is that this only makes sense for application clusters (as
> opposed to session clusters), but we don't differentiate between the two when it
> comes to metrics.
>
> On 2/19/2021 3:59 AM, bat man wrote:
>
> I meant the Flink job name. I’m using the reporter below:
>
> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>
> Is there any way to tag the TaskManager and JobManager metrics with job names?
>
> Thanks,
> Hemant
>
> On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler 
> wrote:
>
>> When you say "job_name", are you referring to the Prometheus concept of
>> jobs, or the Flink one?
>>
>> Which of the Flink Prometheus reporters are you using?
>>
>> On 2/17/2021 7:37 PM, bat man wrote:
>> > Hello there,
>> >
>> > I am pushing Flink metrics to Prometheus and then using Grafana for
>> > visualization. There are metrics like
>> >
>> > - flink_taskmanager_Status_JVM_CPU_Load,
>> >   flink_taskmanager_Status_JVM_CPU_Load,
>> >   flink_taskmanager_Status_JVM_CPU_Time
>> >
>> > etc., which do not carry a job_name; they are tied to an instance.
>> > When running multiple jobs in the same YARN cluster, different jobs can
>> > have YARN containers on the same instance. In that case it is very
>> > difficult to find out which instance has high CPU load, memory usage,
>> > etc.
>> >
>> > Is there a way to tag these metrics with job_name so that they can be
>> > visualized per job?
>> >
>> > Thanks,
>> > Hemant
>>
>>
>>
>


Re: Tag flink metrics to job name

2021-02-18 Thread Chesnay Schepler

No, Job-/TaskManager metrics cannot be tagged with the job name.
The reason is that this only makes sense for application clusters 
(as opposed to session clusters), but we don't differentiate between the
two when it comes to metrics.


On 2/19/2021 3:59 AM, bat man wrote:

I meant the Flink job name. I’m using the reporter below:

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter

Is there any way to tag the TaskManager and JobManager metrics with job names?

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler wrote:

When you say "job_name", are you referring to the Prometheus concept of
jobs, or the Flink one?

Which of the Flink Prometheus reporters are you using?

On 2/17/2021 7:37 PM, bat man wrote:
> Hello there,
>
> I am pushing Flink metrics to Prometheus and then using Grafana for
> visualization. There are metrics like
>
> - flink_taskmanager_Status_JVM_CPU_Load,
>   flink_taskmanager_Status_JVM_CPU_Load,
>   flink_taskmanager_Status_JVM_CPU_Time
>
> etc., which do not carry a job_name; they are tied to an instance.
> When running multiple jobs in the same YARN cluster, different jobs can
> have YARN containers on the same instance. In that case it is very
> difficult to find out which instance has high CPU load, memory usage,
> etc.
>
> Is there a way to tag these metrics with job_name so that they can be
> visualized per job?
>
> Thanks,
> Hemant






Re: Tag flink metrics to job name

2021-02-18 Thread bat man
I meant the Flink job name. I’m using the reporter below:

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter

Is there any way to tag the TaskManager and JobManager metrics with job names?

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler 
wrote:

> When you say "job_name", are you referring to the Prometheus concept of
> jobs, or the Flink one?
>
> Which of the Flink Prometheus reporters are you using?
>
> On 2/17/2021 7:37 PM, bat man wrote:
> > Hello there,
> >
> > I am pushing Flink metrics to Prometheus and then using Grafana for
> > visualization. There are metrics like
> >
> > - flink_taskmanager_Status_JVM_CPU_Load,
> >   flink_taskmanager_Status_JVM_CPU_Load,
> >   flink_taskmanager_Status_JVM_CPU_Time
> >
> > etc., which do not carry a job_name; they are tied to an instance.
> > When running multiple jobs in the same YARN cluster, different jobs can
> > have YARN containers on the same instance. In that case it is very
> > difficult to find out which instance has high CPU load, memory usage,
> > etc.
> >
> > Is there a way to tag these metrics with job_name so that they can be
> > visualized per job?
> >
> > Thanks,
> > Hemant
>
>
>


Re: Tag flink metrics to job name

2021-02-18 Thread Chesnay Schepler
When you say "job_name", are you referring to the Prometheus concept of
jobs, or the Flink one?

Which of the Flink Prometheus reporters are you using?

On 2/17/2021 7:37 PM, bat man wrote:

Hello there,

I am pushing Flink metrics to Prometheus and then using Grafana for
visualization. There are metrics like
- flink_taskmanager_Status_JVM_CPU_Load,
  flink_taskmanager_Status_JVM_CPU_Load,
  flink_taskmanager_Status_JVM_CPU_Time
etc., which do not carry a job_name; they are tied to an instance.
When running multiple jobs in the same YARN cluster, different jobs can
have YARN containers on the same instance. In that case it is very
difficult to find out which instance has high CPU load, memory usage,
etc.

Is there a way to tag these metrics with job_name so that they can be
visualized per job?


Thanks,
Hemant





Tag flink metrics to job name

2021-02-17 Thread bat man
Hello there,

I am pushing Flink metrics to Prometheus and then using Grafana for
visualization. There are metrics like
- flink_taskmanager_Status_JVM_CPU_Load,
  flink_taskmanager_Status_JVM_CPU_Load,
  flink_taskmanager_Status_JVM_CPU_Time
etc., which do not carry a job_name; they are tied to an instance.
When running multiple jobs in the same YARN cluster, different jobs can
have YARN containers on the same instance. In that case it is very
difficult to find out which instance has high CPU load, memory usage,
etc.

Is there a way to tag these metrics with job_name so that they can be
visualized per job?

Thanks,
Hemant