Hi Mao,

Why not use `sink.writer-coordinator.enabled`?
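For reference, a minimal sketch of turning that option on from a Flink job. The option key is the one named above; whether it is available depends on your Paimon version, so please check the docs. Catalog, database, and table names below are placeholders:

    // Sketch: enabling Paimon's writer coordinator on an existing table.
    // The option can also be set in the table's WITH clause at creation time.
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class EnableWriterCoordinator {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());
            // my_catalog.my_db.my_pk_table is a placeholder identifier.
            tEnv.executeSql(
                    "ALTER TABLE my_catalog.my_db.my_pk_table "
                            + "SET ('sink.writer-coordinator.enabled' = 'true')");
        }
    }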
Best,
Jingsong

On Tue, Jan 13, 2026 at 7:51 PM Mao Liu <[email protected]> wrote:
>
> Greetings Paimon community and devs,
>
> I'd like to share some findings from recent performance testing of Paimon
> in Flink on highly partitioned tables with large data volumes.
>
> For PK tables with thousands of partitions and fixed bucket mode, where all
> partitions receive streaming writes, we have observed that:
> - manifest files are fetched from the cloud filesystem excessively, on the
>   order of hundreds of thousands of times or more
> - manifest fetching can be computationally intensive due to decompression
>   and deserialization, leaving Flink TaskManager CPU utilization pinned at
>   100% until all manifest fetches complete
> - this causes very long run times for dedicated compaction jobs, unstable
>   streaming write performance, and very high API request costs to cloud
>   filesystems for all the manifest file retrievals
>
> After a lot of logging and debugging, the bottleneck appears to be
> FileSystemWriteRestore.restoreFiles, which re-fetches the manifests for
> each (partition, bucket) combination and then filters them down to only
> the data files relevant to its own (partition, bucket) slice.
>
> We have been testing a patch to FileSystemWriteRestore that pre-fetches
> and caches the entire manifest, avoiding duplicated API requests and
> reducing the computational burden of repeated
> decompression/deserialization.
>
> Draft PR for discussion: https://github.com/apache/paimon/pull/7031
> GitHub issue with further details:
> https://github.com/apache/paimon/issues/7030
>
> I'd like to get feedback from Paimon maintainers/devs on:
> - whether this is an acceptable approach, or suggestions for alternative
>   implementation approaches
> - any caveats or issues this might cause (e.g. any risk that could lead
>   to data loss?)
>
> Many thanks,
> Mao
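As an illustration of the caching idea described in the quoted message, here is a minimal sketch. This is not the actual PR 7031 patch; ManifestEntry, restoreFiles, and readAllManifests below are hypothetical stand-ins for Paimon's internals:

    // Sketch: deserialize the manifest list once per snapshot and reuse it
    // for every (partition, bucket) slice, instead of re-fetching and
    // re-deserializing the manifests for each combination.
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;
    import java.util.stream.Collectors;

    public class CachedManifestRestore {
        // One shared cache per restore, keyed by snapshot id.
        private final Map<Long, List<ManifestEntry>> cache = new ConcurrentHashMap<>();

        public List<ManifestEntry> restoreFiles(
                long snapshotId, String partition, int bucket,
                Supplier<List<ManifestEntry>> readAllManifests) {
            // Fetch + decompress + deserialize the full manifest list once;
            // subsequent (partition, bucket) slices filter the cached result.
            List<ManifestEntry> all =
                    cache.computeIfAbsent(snapshotId, id -> readAllManifests.get());
            return all.stream()
                    .filter(e -> e.partition().equals(partition) && e.bucket() == bucket)
                    .collect(Collectors.toList());
        }

        // Minimal stand-in for Paimon's manifest entry type.
        interface ManifestEntry {
            String partition();
            int bucket();
        }
    }

The tradeoff is heap usage: the cached entries for a large snapshot have to fit in TaskManager memory, which is presumably one of the caveats worth discussing on the PR.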
