I would suggest taking a look at CheckpointRollingPolicy. Bulk formats like Parquet must roll their part files on every checkpoint (a bulk writer cannot be resumed after the file is closed), which is why you see one file per checkpoint. You can extend CheckpointRollingPolicy and override the default behaviors in your FileSink to control any additional rolling.
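A minimal sketch of what that could look like (untested; the class name and the 128 MB size threshold are illustrative assumptions, not values from the thread). It extends CheckpointRollingPolicy for a FileSink with String bucket IDs and implements the two abstract methods so the sink does not roll on events or processing time, only by size and at checkpoints:

```java
import java.io.IOException;

import org.apache.flink.streaming.api.functions.sink.filesystem.PartFileInfo;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.CheckpointRollingPolicy;

// Hypothetical policy: suppress event/time-based rolling so part files
// can grow larger between the checkpoint-forced rolls.
public class LargeFileRollingPolicy<IN> extends CheckpointRollingPolicy<IN, String> {

    // Illustrative threshold, not from the thread.
    private static final long MAX_PART_SIZE_BYTES = 128L * 1024 * 1024;

    @Override
    public boolean shouldRollOnEvent(PartFileInfo<String> partFileState, IN element)
            throws IOException {
        // Roll only once the in-progress part file exceeds the size threshold.
        return partFileState.getSize() >= MAX_PART_SIZE_BYTES;
    }

    @Override
    public boolean shouldRollOnProcessingTime(PartFileInfo<String> partFileState, long currentTime) {
        // Do not roll on processing time in this sketch.
        return false;
    }
}
```

You would then pass it to the sink builder, e.g. `FileSink.forBulkFormat(outputPath, writerFactory).withRollingPolicy(new LargeFileRollingPolicy<>()).build()`. Note that rolling on checkpoint itself still happens for bulk formats, so to get genuinely large files you would also need a longer checkpoint interval.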
HTH.

Thanks
Deepak

On Mon, Dec 27, 2021 at 8:13 PM Mathieu D <matd...@gmail.com> wrote:

> Hello,
>
> We're trying to use a Parquet file sink to output files in s3.
>
> When running in Streaming mode, it seems that parquet files are flushed
> and rolled at each checkpoint. The result is a crazy high number of very
> small parquet files, which completely defeats the purpose of that format.
>
> Is there a way to build larger output parquet files? Or is it only at the
> price of having a very large checkpointing interval?
>
> Thanks for your insights.
>
> Mathieu

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net