Hi Paimon maintainers,

I'm looking to implement a change that would allow different partitions within a PK fixed-bucket table to have different bucket counts, primarily to support highly skewed partitions with more/fewer buckets.
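To make the idea concrete, here is a minimal sketch of a per-partition bucket lookup with a fallback to the table-level bucket count. Everything in it is an assumption for illustration: `PartitionBucketResolver`, `bucket_for`, and the `region=...` partition names are hypothetical, and the CRC32 hash is only a stand-in, not Paimon's actual bucket function or API.

```python
import zlib
from typing import Dict, Optional


def bucket_for(key: bytes, num_buckets: int) -> int:
    """Map a key to a bucket using a stable hash (CRC32 is a stand-in, not Paimon's hash)."""
    return zlib.crc32(key) % num_buckets


class PartitionBucketResolver:
    """Hypothetical helper: per-partition bucket counts with a table-level fallback."""

    def __init__(self, default_buckets: int, overrides: Optional[Dict[str, int]] = None):
        if default_buckets <= 0:
            raise ValueError("default_buckets must be positive")
        self.default_buckets = default_buckets
        # Copy so that each writer keeps the mapping it saw at initialization.
        self.overrides = dict(overrides or {})

    def num_buckets(self, partition: str) -> int:
        # Partitions without an override use the global table bucket count.
        return self.overrides.get(partition, self.default_buckets)

    def assign(self, partition: str, key: bytes) -> int:
        return bucket_for(key, self.num_buckets(partition))


# Two writers that loaded the mapping at different times (hypothetical values):
# writer_a initialized before a rescale of "region=us", writer_b after it.
writer_a = PartitionBucketResolver(4, {"region=us": 4})
writer_b = PartitionBucketResolver(4, {"region=us": 16})

keys = [f"user-{i}".encode() for i in range(100)]
# Keys for which the two writers disagree on the target bucket.
diverging = [k for k in keys
             if writer_a.assign("region=us", k) != writer_b.assign("region=us", k)]
```

In this toy setup, `diverging` is non-empty for the rescaled partition, while a partition with no override (for example `region=eu`) gets the same bucket from both writers. That is the inconsistency a stale mapping would cause if nothing validates the bucket count at commit time.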
We would use dynamic buckets to handle skew, but we really need multiple writers writing to the same active partitions in both streaming and batch, which doesn't seem to be something we could easily support with dynamic buckets without coordinating changes to the bucket index file...

On the fixed-buckets side, though, it seems we are in a good spot to implement per-partition bucketing, and this rescale doc <https://paimon.apache.org/docs/1.3/maintenance/rescale-bucket/> suggests we can already do that for partitions that aren't receiving writes. Unfortunately, our partitions are not time-based, and most of them are always receiving writes... Hence, we would need to adapt the current code to allow writers to look up the bucket counts from the manifest partition rather than relying on the global table bucket count.

That brings me to the following questions:

1. *Can we actually do this?* Are there architectural reasons why bucket counts must be uniform across all partitions? Are there assumptions elsewhere in the codebase that depend on a single global bucket count?

2. *Concurrent writers:* If multiple writers are active, they each independently load the partition bucket mapping at initialization, which creates a risk of inconsistency if a rescale operation completes between when different writers load their mappings. This is not too different from the existing behavior, but with a global bucket count, it is much easier to safeguard against it. Do you have ideas on how we could mitigate this issue or warn users against this pitfall?

3. *Read path:* On the read side, does the scan/split logic already handle partitions with heterogeneous bucket counts, or would changes be needed there as well?

Any guidance on gotchas or prior art in this area would be greatly appreciated. Happy to share the full diff or open a draft PR if that would be easier to review.

--
Thanks,
Mike Dias
