I would suggest taking a look at CheckpointRollingPolicy.
You can extend it, override the default behaviors, and pass your custom policy to the FileSink via withRollingPolicy().
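Something along these lines (an untested sketch against the Flink 1.14-era FileSink API; the class name SizeAwareCheckpointPolicy and the 128 MB threshold are just examples). One caveat: for bulk formats like Parquet, shouldRollOnCheckpoint() is final and always returns true, so a policy like this can only make files roll *earlier* on size, not suppress the roll at each checkpoint:

```java
import java.io.IOException;

import org.apache.flink.streaming.api.functions.sink.filesystem.PartFileInfo;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.CheckpointRollingPolicy;

// Example policy: in addition to the mandatory roll on checkpoint,
// also roll whenever the in-progress part file grows past a size threshold.
public class SizeAwareCheckpointPolicy<IN, BucketID>
        extends CheckpointRollingPolicy<IN, BucketID> {

    private final long maxPartSizeBytes;

    public SizeAwareCheckpointPolicy(long maxPartSizeBytes) {
        this.maxPartSizeBytes = maxPartSizeBytes;
    }

    @Override
    public boolean shouldRollOnEvent(PartFileInfo<BucketID> partFileState, IN element)
            throws IOException {
        // Roll early if the current part file exceeds the size threshold.
        return partFileState.getSize() >= maxPartSizeBytes;
    }

    @Override
    public boolean shouldRollOnProcessingTime(PartFileInfo<BucketID> partFileState, long currentTime) {
        return false; // no time-based rolling in this sketch
    }
}
```

Then wire it into the sink, e.g.:

    FileSink.forBulkFormat(outputPath, parquetWriterFactory)
            .withRollingPolicy(new SizeAwareCheckpointPolicy<>(128L * 1024 * 1024))
            .build();

Given the roll-on-checkpoint constraint, growing the files mostly comes down to a longer checkpoint interval or compacting the small files downstream.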

HTH.

Thanks
Deepak

On Mon, Dec 27, 2021 at 8:13 PM Mathieu D <matd...@gmail.com> wrote:

> Hello,
>
> We’re trying to use a Parquet file sink to output files in s3.
>
> When running in Streaming mode, it seems that parquet files are flushed
> and rolled at each checkpoint. The result is a crazy high number of very
> small parquet files which completely defeats the purpose of that format.
>
>
> Is there a way to build larger output parquet files? Or is it only at the
> price of having a very large checkpointing interval?
>
> Thanks for your insights.
>
> Mathieu
>


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
