partitionBy creating lots of small files

Nikhil Goyal Sat, 04 Jun 2022 09:44:46 -0700

Hi all,

Is there a way to use dataframe.write.partitionBy("col") and control the number
of output files without doing a full repartition? The problem is that the data
is skewed: some partitions have a lot of data while others have very little.
Calling .repartition() is costly because it shuffles the whole dataset. We want
to control the size of the output files. Is that even possible?
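To illustrate why the file count grows, here is a rough plain-Python model of the behavior as it is commonly described. Each write task produces at least one file for every distinct partition value it holds. The maxRecordsPerFile write option (spark.sql.files.maxRecordsPerFile, where 0 means unlimited) can split large ones further. This is not Spark code: count_output_files is a hypothetical helper, and the row counts are made up for illustration.

```python
from collections import Counter
from math import ceil


def count_output_files(tasks, max_records_per_file=0):
    """Hypothetical model of how many files df.write.partitionBy(col) writes.

    tasks: list of lists; each inner list holds the partition-column value
    of every row handled by one write task.
    max_records_per_file: 0 means no limit (assumed to mirror the default
    of spark.sql.files.maxRecordsPerFile).
    """
    total = 0
    for rows in tasks:
        # One file per distinct partition value seen by this task ...
        for n in Counter(rows).values():
            if max_records_per_file > 0:
                # ... split into more files if the record cap is exceeded.
                total += ceil(n / max_records_per_file)
            else:
                total += 1
    return total


# Skewed toy data: value "a" has 1000 rows, value "b" has 10.
rows = ["a"] * 1000 + ["b"] * 10

# Rows spread round-robin over 4 tasks: every task sees both values.
round_robin = [rows[i::4] for i in range(4)]

# After repartitioning on the column, each value lives in a single task.
by_value = [[v] * n for v, n in Counter(rows).items()]

print(count_output_files(round_robin))                          # 8
print(count_output_files(by_value))                             # 2
print(count_output_files(by_value, max_records_per_file=300))   # 5
```

The model suggests why repartitioning on the partition column (for example df.repartition("col")) is the usual suggestion despite the shuffle: it collapses each value into one task and therefore one file. Adding maxRecordsPerFile then caps the size of files for the heavy values.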


Thanks