Hi Lec,

You can take a look at the metrics exposed by the Paimon Flink connector. Paimon itself is just a lake format, so it has no metrics of its own, but the Flink source we implemented reports a large number of them.
See doc: https://paimon.apache.org/docs/master/maintenance/metrics/

Best,
Jingsong

On Thu, Jan 29, 2026 at 10:17 PM lec ssmi <[email protected]> wrote:
>
> Hi Paimon community,
>
> I have a question regarding the runtime execution model and throughput
> semantics when reading Paimon tables in streaming mode with consumer-id.
>
> From my understanding and observations, when consumer-id is specified, the
> execution graph generated by Flink is different from some other common
> sources (e.g. Kafka). Instead of a single source operator that directly emits
> records, the graph usually contains:
>
> - A monitor-like source (often with parallelism = 1), which tracks snapshot
>   changes and produces snapshot/split events
> - One or more downstream read operators, which receive those splits and
>   perform the actual file reading, emitting the real RowData records
>
> In this setup, the “source” node in the execution graph mainly emits metadata
> events (snapshot IDs / splits), while the real data throughput is produced by
> the downstream read operators.
>
> This leads to a practical issue for platform-level monitoring tools. In many
> Flink platforms, source throughput (records/s, bytes/s) is commonly measured
> by observing the source vertex metrics. That approach works well for sources
> like Kafka, where the source operator itself emits user records. However, in
> the Paimon + consumer-id case, monitoring only the source vertex seems
> misleading, because it does not reflect the actual data ingestion rate.
>
> So my questions are:
>
> 1. Is this monitor + reader split in the execution graph an intentional and
>    stable design for Paimon streaming reads with consumer-id?
> 2. From the Paimon/Flink semantics perspective, which operator should be
>    considered the “ingress point” for measuring real data throughput?
> 3. Is there any recommended or documented way for external monitoring systems
>    to correctly identify the operator that represents actual data ingestion when
>    reading from Paimon?
>
> The motivation here is to build a connector-agnostic source rate detection
> mechanism, and understanding the intended semantics on the Paimon side would
> be very helpful.
>
> Thanks in advance for your insights, and thanks for the great work on Paimon.
>
> Best regards.
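
For illustration only, the monitor + reader situation described in the quoted question could be handled by a throughput-based heuristic along the lines of the sketch below. This is not a Paimon or Flink API and not a documented recommendation: the `Vertex` class, its fields, the `ingress_vertices` function, the vertex names, and the `fanout_threshold` value are all hypothetical, and the per-vertex record counts are assumed to have been collected already by the monitoring platform.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical view of one job-graph vertex as seen by a monitoring tool."""
    name: str
    records_out: int  # records emitted during the sampling window (assumed input)
    inputs: list = field(default_factory=list)  # names of upstream vertices


def ingress_vertices(vertices, fanout_threshold=10.0):
    """Return, for each topological source, the vertex to treat as data ingress.

    Heuristic (an assumption, not connector semantics): starting from a vertex
    with no inputs, step to its only child as long as that child reads solely
    from it and emits at least `fanout_threshold` times more records, which is
    the pattern a split-emitting monitor followed by a file reader would show.
    """
    by_name = {v.name: v for v in vertices}
    children = {v.name: [] for v in vertices}
    for v in vertices:
        for upstream in v.inputs:
            children[upstream].append(v.name)

    result = []
    for v in vertices:
        if v.inputs:
            continue  # only start from topological sources
        current = v
        while len(children[current.name]) == 1:
            child = by_name[children[current.name][0]]
            if child.inputs != [current.name]:
                break  # child merges several streams; stop here
            if child.records_out < fanout_threshold * max(current.records_out, 1):
                break  # no large fan-out, so current already emits the data
            current = child
        result.append(current.name)
    return result


# Source that emits user records itself (Kafka-like shape).
kafka_like = [Vertex("source", 1000), Vertex("map", 1000, ["source"])]

# Source that emits only a few split events, followed by a reader.
monitor_like = [
    Vertex("monitor", 5),
    Vertex("reader", 50000, ["monitor"]),
    Vertex("sink", 50000, ["reader"]),
]
```

With these sample graphs the heuristic picks `source` in the first case and `reader` in the second. It would misclassify a job in which an ordinary source is followed by an operator that legitimately expands each record tenfold or more, so any threshold would need tuning per platform.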
