Re: flatmap() and spark performance

2015-09-28 Thread Hemant Bhanawat
You can use spark.executor.memory to specify the memory of the executors, which will hold these intermediate results. You may want to look at the section "Understanding Memory Management in Spark" at this link: https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applica
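For reference, a minimal sketch of setting that property at submit time (the 4g value, the class name, and the jar path are all placeholders, not from the thread):

```shell
# Raise the per-executor heap so larger intermediate results fit in memory.
# The value, main class, and jar below are illustrative only.
spark-submit \
  --conf spark.executor.memory=4g \
  --class com.example.MyApp \
  my-app.jar
```

The same property can also be set in spark-defaults.conf or on a SparkConf before the context is created; it cannot be changed for an already-running application.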

flatmap() and spark performance

2015-09-28 Thread jeff saremi
Is there any way to let Spark know ahead of time what size of RDD to expect as a result of a flatMap() operation? And would that help in terms of performance? For instance, if I have an RDD of 1 million rows and I know that my flatMap() will produce 100 million rows, is there a way to indicate that?