Hi Gordon,

We have Kafka 0.10.1.1 running and use the flink-connector-kafka-0.10
driver.

There are a bunch of flink_taskmanager_job_task_operator_* metrics,
including some about the committed offset for each partition. It seems I
have 4 different records_lag_max series with different attempt_id, though:
3 with -Inf and 1 with a value. I will need a better understanding of
Prometheus to extract this properly.
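
To illustrate what "extracting this properly" could mean, here is a minimal
sketch in Python, assuming the goal is to ignore the -Inf series and keep the
one finite value. The function name, the attempt_id labels, and the lag value
are made up for illustration; they are not taken from our setup or from any
Flink or Prometheus API.

```python
import math


def effective_lag(samples):
    """Return the largest finite lag among (labels, value) samples.

    Returns None when every sample is -Inf, +Inf, or NaN, i.e. when no
    attempt has reported a usable lag yet.
    """
    finite = [value for _labels, value in samples if math.isfinite(value)]
    return max(finite) if finite else None


# Hypothetical samples mirroring the situation above: four series for
# records_lag_max, three reporting -Inf and one reporting a real value.
samples = [
    ({"attempt_id": "a1"}, float("-inf")),
    ({"attempt_id": "a2"}, float("-inf")),
    ({"attempt_id": "a3"}, float("-inf")),
    ({"attempt_id": "a4"}, 1250.0),
]

print(effective_lag(samples))
```

The same filtering would presumably have to be expressed in the Prometheus
query itself for the Grafana panel; this only shows the intended logic.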

I was also checking our Grafana and the metric we were using was
"flink_taskmanager_job_task_operator_KafkaConsumer_records_lag_max",
actually. "flink_taskmanager_job_task_operator_records_lag_max" seems to be
new (with the attempt thingy).

On the "KafkaConsumer" front, though, it only has the "commited_offset" for
each partition.

On Wed, Jun 13, 2018 at 5:41 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:

> Hi,
>
> Which Kafka version are you using?
>
> AFAIK, the only recent changes to Kafka connector metrics in the 1.4.x
> series would be FLINK-8419 [1].
> The ‘records_lag_max’ metric is a Kafka-shipped metric simply forwarded
> from the internally used Kafka client, so nothing should have been affected.
>
> Do you see other metrics under the pattern of 
> ‘flink_taskmanager_job_task_operator_*’?
> All Kafka-shipped metrics should still follow this pattern.
> If not, could you find the ‘records_lag_max’ metric (or any other
> Kafka-shipped metrics [2]) under the user scope ‘KafkaConsumer’?
>
> The above should provide more insight into what may be wrong here.
>
> - Gordon
>
> [1] https://issues.apache.org/jira/browse/FLINK-8419
> [2] https://docs.confluent.io/current/kafka/monitoring.html#fetch-metrics
>
> On 12 June 2018 at 11:47:51 PM, Julio Biason (julio.bia...@azion.com)
> wrote:
>
> Hey guys,
>
> I just updated our Flink install from 1.4.0 to 1.4.2, but our Prometheus
> monitoring is not getting the current Kafka lag.
>
> After updating to 1.4.2 and making the symlink between
> opt/flink-metrics-prometheus-1.4.2.jar to lib/, I got the metrics back on
> Prometheus, but the most important one, 
> flink_taskmanager_job_task_operator_records_lag_max
> is now returning -Inf.
>
> Did I miss something?
>
> --
> *Julio Biason*, Software Engineer
> *AZION*  |  Deliver. Accelerate. Protect.
> Office: +55 51 3083 8101  |  Mobile: +55 51 *99907 0554*
>
>


-- 
*Julio Biason*, Software Engineer
*AZION*  |  Deliver. Accelerate. Protect.
Office: +55 51 3083 8101  |  Mobile: +55 51 *99907 0554*
