Hi Lec,

You can take a look at the metrics exposed by Paimon's Flink
integration; there are quite a lot of them. Paimon itself is just a
lake format, so the format alone does not emit any metrics, but the
Flink Source we implemented reports a large number of them.

See doc: https://paimon.apache.org/docs/master/maintenance/metrics/
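
For the platform-side check, here is a rough sketch (not part of Paimon or
Flink) of comparing per-vertex output rates instead of reading only the
source vertex. It assumes the platform can already sample a cumulative
"records out" counter per vertex; the vertex names, counter values, and
the helper `vertex_output_rates` are all made up for illustration.

```python
def vertex_output_rates(prev, curr, interval_s):
    """Return records/s emitted per vertex between two cumulative samples.

    prev / curr map a vertex name to its cumulative output-record count
    (hypothetical shape; adapt to however your platform collects metrics).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for name, count in curr.items():
        # A vertex missing from the earlier sample is treated as starting at 0.
        delta = count - prev.get(name, 0)
        # Clamp at 0 so a counter reset (e.g. after a restart) is not negative.
        rates[name] = max(delta, 0) / interval_s
    return rates


# Illustrative numbers only: two samples taken 10 seconds apart.
prev = {"monitor": 100, "reader": 1_000_000}
curr = {"monitor": 105, "reader": 1_250_000}

rates = vertex_output_rates(prev, curr, 10.0)
busiest = max(rates, key=rates.get)
print(rates, busiest)
```

With numbers like these, the monitor vertex shows a tiny rate (it emits only
split events) while the reader vertex carries the real row throughput, so
the reader is the one worth charting.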

Best,
Jingsong

On Thu, Jan 29, 2026 at 10:17 PM lec ssmi <[email protected]> wrote:
>
> Hi Paimon community,
>
> I have a question regarding the runtime execution model and throughput 
> semantics when reading Paimon tables in streaming mode with consumer-id.
>
> From my understanding and observations, when consumer-id is specified, the 
> execution graph generated by Flink is different from some other common 
> sources (e.g. Kafka). Instead of a single source operator that directly emits 
> records, the graph usually contains:
>
> - A monitor-like source (often with parallelism = 1), which tracks snapshot 
> changes and produces snapshot/split events
> - One or more downstream read operators, which receive those splits and 
> perform the actual file reading, emitting the real RowData records
>
> In this setup, the “source” node in the execution graph mainly emits metadata 
> events (snapshot IDs / splits), while the real data throughput is produced by 
> the downstream read operators.
>
> This leads to a practical issue for platform-level monitoring tools. In many 
> Flink platforms, source throughput (records/s, bytes/s) is commonly measured 
> by observing the source vertex metrics. That approach works well for sources 
> like Kafka, where the source operator itself emits user records. However, in 
> the Paimon + consumer-id case, monitoring only the source vertex seems 
> misleading, because it does not reflect the actual data ingestion rate.
>
> So my questions are:
>
> 1. Is this monitor + reader split in the execution graph an intentional and 
> stable design for Paimon streaming reads with consumer-id?
> 2. From the Paimon/Flink semantics perspective, which operator should be 
> considered the “ingress point” for measuring real data throughput?
> 3. Is there any recommended or documented way for external monitoring systems 
> to correctly identify the operator that represents actual data ingestion when 
> reading from Paimon?
>
> The motivation here is to build a connector-agnostic source rate detection 
> mechanism, and understanding the intended semantics on the Paimon side would 
> be very helpful.
>
> Thanks in advance for your insights, and thanks for the great work on Paimon.
>
> Best regards.
