Feasibility of per-partition instead of per-table bucket count

Mike Dias via user Sun, 15 Feb 2026 21:20:04 -0800

Hi Paimon maintainers,

I'm looking to implement a change that would allow different partitions
within a PK fixed-bucket table to have different bucket counts, primarily
so that highly skewed partitions can get more buckets and small ones fewer.


We would use dynamic buckets to handle skew, but we need multiple writers
writing to the same active partitions in both streaming and batch. That
doesn't seem easy to support with dynamic buckets, since the writers would
have to coordinate changes to the bucket index file.

On the fixed-buckets side, though, it seems we are in a good spot to
implement per-partition bucketing, and this rescale doc
<https://paimon.apache.org/docs/1.3/maintenance/rescale-bucket/> suggests
we can already do that for partitions that aren't receiving writes.
Unfortunately, our partitions are not time-based, and most of them are
always receiving writes...

Hence, we would need to adapt the current code to allow writers to look up
the bucket counts from the manifest partition rather than relying on the
global table bucket count.
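
To make the lookup concrete, here is a minimal sketch of the resolution
order I have in mind: use the partition's own bucket count if one is
recorded, otherwise fall back to the table-level count. Everything here is
hypothetical (BucketCountResolver, the partition tuple, the plain
hash-modulo assignment); none of it is a Paimon API, and Python is used only
to keep the sketch short.

```python
from typing import Dict, Optional, Tuple

# Hypothetical partition identifier: a tuple of partition column values.
Partition = Tuple[str, ...]


class BucketCountResolver:
    """Resolves the bucket count for a partition, falling back to the table default."""

    def __init__(
        self,
        table_default: int,
        per_partition: Optional[Dict[Partition, int]] = None,
    ) -> None:
        if table_default <= 0:
            raise ValueError("table_default must be positive")
        self._default = table_default
        # Copy so later changes to the caller's dict do not leak into this writer.
        self._per_partition = dict(per_partition or {})

    def bucket_count(self, partition: Partition) -> int:
        # A partition-specific count wins; otherwise use the table-level count.
        return self._per_partition.get(partition, self._default)

    def bucket_for(self, partition: Partition, key_hash: int) -> int:
        # Assumption: plain hash-modulo stands in for the real bucket assigner.
        # In Python, % with a positive modulus never returns a negative value.
        return key_hash % self.bucket_count(partition)


resolver = BucketCountResolver(table_default=4, per_partition={("hot",): 16})
print(resolver.bucket_count(("hot",)))   # partition-specific count
print(resolver.bucket_count(("cold",)))  # falls back to the table default
```

The same key hash can land in different buckets depending on the partition,
which is why every writer for a partition has to agree on that partition's
count.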

That brings me to the following questions:

   1. *Can we actually do this?* Are there architectural reasons why
   bucket counts must be uniform across all partitions? Are there assumptions
   elsewhere in the codebase that depend on a single global bucket count?
   2. *Concurrent writers:* If multiple writers are active, they each
   independently load the partition bucket mapping at initialization, which
   creates a risk of inconsistency if a rescale operation completes between
   when different writers load their mappings. This is not too different from
   the existing behavior, but with a global bucket count, it is much easier to
   safeguard against it. Do you have ideas on how we could mitigate this issue
   or warn users against this pitfall?
   3. *Read path:* On the read side, does the scan/split logic already
   handle partitions with heterogeneous bucket counts, or would changes be
   needed there as well?
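
For question 2, one possible mitigation is a commit-time check: each writer
records the bucket count it used per partition, and the commit is rejected
if that no longer matches the current count. The sketch below only
illustrates that comparison. The names (check_bucket_counts,
StaleBucketCountError) and the idea that the committer can see the current
per-partition counts are assumptions, not existing Paimon behavior.

```python
from typing import Dict, Tuple

# Hypothetical partition identifier: a tuple of partition column values.
Partition = Tuple[str, ...]


class StaleBucketCountError(Exception):
    """Raised when a writer used a bucket count that is no longer current."""


def check_bucket_counts(
    used_by_writer: Dict[Partition, int],
    current: Dict[Partition, int],
    table_default: int,
) -> None:
    """Reject a commit whose files were bucketed with an outdated count.

    used_by_writer: bucket count the writer loaded at initialization,
        for each partition it wrote to.
    current: per-partition counts visible at commit time (assumed available).
    table_default: count for partitions without a specific entry.
    """
    for partition, used in used_by_writer.items():
        actual = current.get(partition, table_default)
        if used != actual:
            raise StaleBucketCountError(
                f"partition {partition!r} was written with {used} buckets "
                f"but now has {actual}"
            )


# A writer that loaded 8 buckets for ("hot",) before a rescale to 16 is rejected.
try:
    check_bucket_counts({("hot",): 8}, {("hot",): 16}, table_default=4)
except StaleBucketCountError as err:
    print(err)
```

This fails the stale writer instead of letting it commit files into the
wrong bucket layout; whether a failed commit is acceptable for long-running
streaming jobs is an open question.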


Any guidance on gotchas or prior art in this area would be greatly
appreciated. Happy to share the full diff or open a draft PR if that would
be easier to review.

--
Thanks,
Mike Dias
