Hi, I'm trying to control the size and/or number of output files that Spark writes. Here is my code. I expect to get 5 files, but I get dozens of small files instead. Why?
dataset
  .repartition(5)
  .sort("long_repeated_string_in_this_column") // should be better compressed with snappy
  .write
  .parquet(outputPath)