Hi Mao,

Why not use `sink.writer-coordinator.enabled`?
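For reference, a minimal sketch of turning that option on from a Flink job. The option key is the one named above; whether it is available depends on your Paimon version, so please check the docs. Catalog, database, and table names below are placeholders:

    // Sketch: enabling Paimon's writer coordinator on an existing table.
    // The option can also be set in the table's WITH clause at creation time.
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class EnableWriterCoordinator {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());
            // my_catalog.my_db.my_pk_table is a placeholder identifier.
            tEnv.executeSql(
                    "ALTER TABLE my_catalog.my_db.my_pk_table "
                            + "SET ('sink.writer-coordinator.enabled' = 'true')");
        }
    }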
Best,
Jingsong

On Tue, Jan 13, 2026 at 7:51 PM Mao Liu <[email protected]> wrote:
>
> Greetings Paimon community and devs,
>
> I'd like to share some findings from recent performance testing of Paimon
> in Flink on highly partitioned tables with large data volumes.
>
> For PK tables with thousands of partitions and fixed bucket mode, where all
> partitions receive streaming writes, we have observed that:
> - manifest files are fetched from the cloud filesystem excessively, on the
>   order of hundreds of thousands of times or more
> - manifest fetching can be computationally intensive due to decompression
>   and deserialization, leaving Flink TaskManager CPU utilization pinned at
>   100% until all manifest fetches complete
> - this causes very long run times for dedicated compaction jobs, unstable
>   streaming write performance, and very high API request costs to cloud
>   filesystems for all the manifest file retrievals
>
> After a lot of logging and debugging, the bottleneck appears to be
> FileSystemWriteRestore.restoreFiles, which re-fetches the manifests for
> each (partition, bucket) combination and then filters them down to only
> the data files relevant to its own (partition, bucket) slice.
>
> We have been testing a patch to FileSystemWriteRestore that pre-fetches
> and caches the entire manifest, avoiding duplicated API requests and
> reducing the computational burden of repeated
> decompression/deserialization.
>
> Draft PR for discussion: https://github.com/apache/paimon/pull/7031
> GitHub issue with further details:
> https://github.com/apache/paimon/issues/7030
>
> I'd like to get feedback from Paimon maintainers/devs on:
> - whether this is an acceptable approach, or suggestions for alternative
>   implementation approaches
> - any caveats or issues this might cause (e.g. any risk that could lead
>   to data loss?)
>
> Many thanks,
> Mao
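As an illustration of the caching idea described in the quoted message, here is a minimal sketch. This is not the actual PR 7031 patch; ManifestEntry, restoreFiles, and readAllManifests below are hypothetical stand-ins for Paimon's internals:

    // Sketch: deserialize the manifest list once per snapshot and reuse it
    // for every (partition, bucket) slice, instead of re-fetching and
    // re-deserializing the manifests for each combination.
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;
    import java.util.stream.Collectors;

    public class CachedManifestRestore {
        // One shared cache per restore, keyed by snapshot id.
        private final Map<Long, List<ManifestEntry>> cache = new ConcurrentHashMap<>();

        public List<ManifestEntry> restoreFiles(
                long snapshotId, String partition, int bucket,
                Supplier<List<ManifestEntry>> readAllManifests) {
            // Fetch + decompress + deserialize the full manifest list once;
            // subsequent (partition, bucket) slices filter the cached result.
            List<ManifestEntry> all =
                    cache.computeIfAbsent(snapshotId, id -> readAllManifests.get());
            return all.stream()
                    .filter(e -> e.partition().equals(partition) && e.bucket() == bucket)
                    .collect(Collectors.toList());
        }

        // Minimal stand-in for Paimon's manifest entry type.
        interface ManifestEntry {
            String partition();
            int bucket();
        }
    }

The tradeoff is heap usage: the cached entries for a large snapshot have to fit in TaskManager memory, which is presumably one of the caveats worth discussing on the PR.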
