Hi, I'm trying to control the size and/or number of Spark output files.

Here is my code. I expect to get 5 files, but I get dozens of small files.
Why?

dataset
  .repartition(5)
  .sort("long_repeated_string_in_this_column") // should be better compressed with snappy
  .write
  .parquet(outputPath)
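
A possible explanation, not confirmed in this thread: sort() is a global sort, so it shuffles the data again into a number of partitions controlled by spark.sql.shuffle.partitions (200 by default, possibly fewer with adaptive execution), and the earlier repartition(5) no longer decides the file count. Below is a minimal sketch of one alternative, assuming that sorting inside each of the 5 partitions is enough for the compression benefit. The helper name writeFiveSortedFiles is hypothetical; dataset and outputPath are the names from the snippet above.

```scala
import org.apache.spark.sql.{Dataset, Row}

// Hypothetical helper, for illustration only.
def writeFiveSortedFiles(dataset: Dataset[Row], outputPath: String): Unit =
  dataset
    .repartition(5)                                              // 5 partitions, so 5 output files
    .sortWithinPartitions("long_repeated_string_in_this_column") // local sort, no further shuffle
    .write
    .option("compression", "snappy")                             // explicit, though snappy is the Parquet default
    .parquet(outputPath)
```

The trade-off is that the output is only sorted within each file, not across files. If a global order matters, sorting first and then calling coalesce(5) before the write might be another option to try.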
