Thanks, Steven, for starting this discussion.
As I suggested in the previous thread, this can be a joint effort
beneficial for various projects.
I would also like to hear opinions from @Jingsong Li, who is maintaining
Flink Table Store.
Best,
Jark
On Tue, 31 Jan 2023 at 08:46, Steven Wu wrote:
Hi,
We had a proposal to add a streaming shuffling stage in the Flink Iceberg
sink to improve data clustering and tame the small files problem [1].
Here are a couple of common use cases.
* Event-time-partitioned tables, where we can hit the small files problem due to
skewed and long-tail