Hi,
I am using Spark 0.8.1 with Java code, running on a single node with 8 CPUs and 16 GB of RAM.
It looks like setting the Spark parallelism with
System.setProperty("spark.default.parallelism", "24") before creating my Spark
context, as described in
http://spark.incubator.apache.org/docs/latest/tuning.html#level-of-parallelism ,
has no effect on the default number of partitions that Spark uses in its APIs,
like saveAsTextFile(). (Note that System.setProperty takes a String value, so
the count has to be quoted.)
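For reference, this is roughly how I create the context; a minimal sketch, where the class and app names are placeholders and "local[8]" stands in for my actual master URL:

    import org.apache.spark.api.java.JavaSparkContext;

    public class ParallelismTest {
        public static void main(String[] args) {
            // Set before the SparkContext is created, per the tuning guide.
            // System.setProperty takes a String value, hence "24" not 24.
            System.setProperty("spark.default.parallelism", "24");
            JavaSparkContext sc = new JavaSparkContext("local[8]", "ParallelismTest");
        }
    }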
For example, with spark.default.parallelism set to 24, I expected 24 tasks to
be invoked upon calling saveAsTextFile(), but that is not the case: I am
seeing only 1 task get invoked.
If instead I pass an explicit slice count of 2 to parallelize(), as in
dataSetRDD = SparkDriver.getSparkContext().parallelize(mydata, 2);
and then invoke
dataSetRDD.saveAsTextFile(JavaRddFilePath);
I see 2 tasks get invoked, even though spark.default.parallelism was set to 24.
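Here is a minimal standalone version of what I believe reproduces both cases; the data, output paths, and the assumption that the 1-task case comes from calling parallelize() without a slice count are mine, not confirmed:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SaveAsTextFileTest {
        public static void main(String[] args) {
            System.setProperty("spark.default.parallelism", "24");
            JavaSparkContext sc = new JavaSparkContext("local[8]", "SaveAsTextFileTest");
            List<Integer> mydata = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

            // With an explicit slice count, 2 tasks run and 2 part files appear.
            JavaRDD<Integer> twoSlices = sc.parallelize(mydata, 2);
            twoSlices.saveAsTextFile("/tmp/out-two-slices");

            // Without a slice count, I expected spark.default.parallelism (24)
            // to apply, but saveAsTextFile() runs only 1 task.
            JavaRDD<Integer> defaultSlices = sc.parallelize(mydata);
            defaultSlices.saveAsTextFile("/tmp/out-default");
        }
    }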
Can someone explain the above behavior?
Thanks,
Hussam