Hi there,

We've spent several hours trying to split our input data into several Parquet files (or several folders, i.e. /datasink/output-parquets/<key>/foobar.parquet) based on a low-cardinality key. This works very well when using saveAsHadoopFile, but we can't achieve anything similar with Parquet files.
The only working solution so far is to persist the RDD and then loop over it N times to write N files (rough sketch after my signature). That does not look acceptable... Do you guys have any suggestions for such an operation?

--
*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
(+33)6.59.16.64.22
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
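For reference, here is roughly what that loop looks like. This is only a minimal sketch, assuming Spark 1.4+ with the DataFrame writer; the Record case class, field names, paths, and the local master are illustrative placeholders, not our real schema or cluster setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SplitByKey {
  // Hypothetical record type standing in for our real schema.
  case class Record(key: String, payload: String)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("split-by-key").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Placeholder input; in reality this RDD comes from our ingestion job.
    val records = sc.parallelize(Seq(Record("a", "x"), Record("b", "y")))

    records.persist()                                   // avoid recomputing the RDD on every pass
    val keys = records.map(_.key).distinct().collect()  // low cardinality, so collecting is cheap

    // One full pass per key: filter the cached RDD and write one Parquet folder per key.
    keys.foreach { k =>
      records.filter(_.key == k)
             .toDF()
             .write
             .parquet(s"/datasink/output-parquets/$k")
    }

    records.unpersist()
    sc.stop()
  }
}
```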