You can use spark.executor.memory to set the amount of memory allocated to each
executor, which is where these intermediate results are held.
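For illustration, here is a minimal sketch of setting that property when
constructing the SparkContext (the application name and the 4g value are
placeholder examples, not recommendations):

    // Sketch: configure executor memory programmatically.
    // "example-app" and "4g" are illustrative placeholders.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("example-app")
      .set("spark.executor.memory", "4g") // memory per executor JVM

    val sc = new SparkContext(conf)

The same setting can also be passed on the command line with
spark-submit --executor-memory 4g.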
You may want to look at the "Understanding Memory Management in Spark" section
of this post:
https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applica
Is there any way to let Spark know ahead of time what size of RDD to expect as
the result of a flatMap() operation?
And would that help in terms of performance?
For instance, if I have an RDD of 1 million rows and I know that my flatMap()
will produce 100 million rows, is there a way to indicate that to Spark ahead
of time?
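To make the scenario concrete, a hypothetical sketch of the kind of expansion
I mean (the row count and the factor of 100 are illustrative only):

    // Sketch of the scenario: each input row expands to 100 output rows.
    // Spark does not know this expansion factor before the flatMap runs.
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("flatmap-expansion"))

    // ~1 million input rows
    val input = sc.parallelize(1 to 1000000)

    // flatMap turns each row into 100 rows, i.e. ~100 million rows in total
    val expanded = input.flatMap(row => (1 to 100).map(i => (row, i)))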