Re: Problems with PrometheusReporter

2022-04-22 Thread Peter Schrott
Hi Chesnay,

I had a look at my logs: there are no WARNINGS regarding metrics or
metric registration when starting this job.

I ran the example jobs
- ./examples/table/ChangelogSocketExample.jar (table streaming)
- ./examples/streaming/StateMachineExample.jar (streaming)

When running those jobs, the metrics on the taskmanagers are available. I
will continue debugging my job, which uses the Flink Table API.

Thanks, Peter



Re: Problems with PrometheusReporter

2022-04-21 Thread Chesnay Schepler
Please check the logs for warnings. It could be that a metric registered 
by a job is throwing exceptions.
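
A minimal sketch of such a log check, assuming the log layout matches the lines quoted in this thread (date, time, level, thread in brackets, logger, " - ", message). The function name, the keyword filter and the set of levels are illustrative choices, not part of Flink:

```python
import re

# Assumed layout, taken from the log lines quoted in this thread:
#   2022-04-20 13:46:39,597 INFO  [main] o.a.f.r.metrics.MetricRegistryImpl:? - message
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]*)\]\s+"
    r"(?P<logger>\S+)\s+-\s+(?P<message>.*)$"
)


def find_metric_warnings(lines, levels=("WARN", "ERROR"), keyword="metric"):
    """Return log lines at the given levels whose logger or message mentions keyword."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        match = LOG_LINE.match(line)
        if match is None:
            continue  # continuation lines, e.g. stack traces, are skipped
        if match["level"] not in levels:
            continue
        haystack = (match["logger"] + " " + match["message"]).lower()
        if keyword in haystack:
            hits.append(line)
    return hits
```

It could be run over an open log file, for example `find_metric_warnings(open("taskmanager.log"))`, where the file path is whatever your setup uses.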







Re: Problems with PrometheusReporter

2022-04-20 Thread Peter Schrott
Hi huweihua,

Just to confirm: did you try with 1.15? None of the RCs are working for me.

This port is definitely free, as it was already used on the same hosts with
Flink 1.14.4. And as I said, when no job is running on the taskmanager, it
does report metrics on that port. I only get the "empty response" when a
job is running on the taskmanager I am querying. Did you also run a job,
and could you access metrics like flink_taskmanager_job_*?

The logs only tell me that everything is working fine:
2022-04-20 13:46:39,597 INFO  [main] o.a.f.r.metrics.MetricRegistryImpl:? -
Reporting metrics for reporter prom of type
org.apache.flink.metrics.prometheus.PrometheusReporter.
and
2022-04-20 12:12:26,394 INFO  [main] o.a.f.m.p.PrometheusReporter:? -
Started PrometheusReporter HTTP server on port 

Best & thanks,
Peter




Re: Problems with PrometheusReporter

2022-04-20 Thread huweihua
Hi Peter,
I have not been able to reproduce this problem.

From your description, it is possible that the specified port is already
in use by another process, so the PrometheusReporter failed to start.
You can confirm this in taskmanager.log, or check whether the port on the
host is being listened on by the TaskManager process.
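
A minimal sketch of the port check, assuming plain TCP on IPv4. It only tells you whether something accepts connections on the port, not which process owns it; for that you would still need an OS tool such as `ss -ltnp` on Linux. The function name is illustrative:

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0
```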





Problems with PrometheusReporter

2022-04-20 Thread Peter Schrott
Hi Flink-Users,

After upgrading to Flink 1.15 (rc3) (coming from 1.14) I noticed that there
is a problem with the metrics exposed through the PrometheusReporter.

It is configured as follows in flink-conf.yaml:
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 

My cluster is running in standalone mode with 2 taskmanagers and 2
jobmanagers.

More specifically:

On the taskmanager that runs a job I get "curl: (52) Empty reply from
server" when I call curl localhost:. I was looking for the metrics in the
namespace flink_taskmanager_job_*, which are, obviously, only exposed on
the taskmanager running a job.

On the other taskmanager, which runs no job, I get a response with a couple
of metrics in the namespace flink_taskmanager_Status, as expected.
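
For comparing the two taskmanagers, here is a minimal sketch that fetches the reporter endpoint and lists the metric names under a given prefix. It assumes the endpoint serves the Prometheus text exposition format, where comment lines start with "#" and sample lines look like `name{labels} value` or `name value`. The function names are illustrative:

```python
import urllib.request


def fetch_metrics_text(url, timeout=5.0):
    """Fetch the raw text served by the reporter endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(exposition_text):
    """Collect the distinct metric names from Prometheus text exposition output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first "{" (labels) or at the first whitespace.
        names.add(line.split("{", 1)[0].split(None, 1)[0])
    return names


def names_with_prefix(exposition_text, prefix):
    """Return the sorted metric names that start with prefix."""
    return sorted(n for n in metric_names(exposition_text) if n.startswith(prefix))
```

With the port from your own configuration, `names_with_prefix(fetch_metrics_text("http://localhost:<port>/"), "flink_taskmanager_job_")` should be non-empty on a taskmanager that runs a job and serves metrics correctly.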

When I also configure the JMXReporterFactory, I find the desired and all
other metrics via VisualVM on the taskmanager running the job. Also, in the
Flink web UI, under "Jobs -> Overview -> Metrics", I can select and
visualize metrics like flink_taskmanager_job_task_busyTimeMsPerSecond.

Does anyone have an idea what's going on here, or can maybe even confirm my
findings?

Best & thanks,
Peter