Hi, I have a DataFrame of records with dates, and I'd like to write each overlapping 12-month window to a separate output.
Currently, I have a loop equivalent to:

```scala
for ((windowStart, windowEnd) <- windows) {
  val windowData = allData.filter(getFilterCriteria(windowStart, windowEnd))
  windowData.write.format(...).save(...)
}
```

This works fine, but has the drawback that since Spark doesn't parallelize the writes, there is a fairly high cost proportional to the number of windows. Is there a way around this?

In MapReduce, I'd probably multiply the data in a Mapper with a window ID and then maybe use something like MultipleOutputs <https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html>. But I'm a bit worried about trying this in Spark because of the data explosion and RAM use.

What's the best approach? Thanks!

- Everett
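P.S. To make the MapReduce-style idea concrete, here's a minimal non-Spark sketch of the window-ID duplication I have in mind, assuming monthly-sliding 12-month windows keyed by their start month (the helper name `windowIds` is just my own for illustration):

```scala
import java.time.YearMonth

// Assumption: windows are 12 months long and slide by one month, identified
// by their start month. A record dated in month m then belongs to every
// window starting in the 12 months from m-11 through m, so duplicating each
// record with its window IDs is a 12x blow-up.
def windowIds(recordMonth: YearMonth): Seq[YearMonth] =
  (0 until 12).map(recordMonth.minusMonths(_))
```

In Spark the duplicated rows could presumably feed a single partitioned write (e.g. `explodedDf.write.partitionBy("windowStart")...`), so all windows go out in one job instead of one job per window, which is exactly the explosion I'm worried about RAM-wise.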