Hi,
        I recently started using Spark on YARN and ran into something puzzling while tuning my program.

When the SparkContext is initialized as sc and is about to read a text file from HDFS, 
the textFile(path, defaultMinPartitions) method is called.
I traced the second parameter down through the Spark source code and finally found 
this:
   conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))  
in  CoarseGrainedSchedulerBackend.scala
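
If I am reading the source right, that line sits inside defaultParallelism() in 
CoarseGrainedSchedulerBackend.scala, roughly like this (the comment is mine):

   override def defaultParallelism(): Int = {
     // when spark.default.parallelism is unset, fall back to max(totalCoreCount, 2)
     conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
   }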

I do not set the property "spark.default.parallelism" anywhere, so getInt should 
fall back to its default argument, i.e. the larger of totalCoreCount and 2.
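
That is how I understand the fallback; here is a toy illustration of my own (not 
Spark code) that I tried in the spark-shell:

   import org.apache.spark.SparkConf

   val conf = new SparkConf(false)  // loadDefaults = false, so the key is surely unset
   // the key is missing, so getInt returns the supplied default: math.max(12, 2) = 12
   val p = conf.getInt("spark.default.parallelism", math.max(12, 2))
   println(p)  // 12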

When I submit the application with spark-submit and pass 
--num-executors 2 --executor-cores 6, I expect totalCoreCount to be 
2 * 6 = 12, and therefore defaultMinPartitions to be 12.
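
The full submit command looks something like this (the class and jar names here 
are placeholders for my real ones):

   spark-submit --master yarn-cluster \
       --num-executors 2 \
       --executor-cores 6 \
       --class com.example.MyApp \
       myapp.jar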

But when I print the value of defaultMinPartitions in my program, I still get 2 
back. How does this happen, or where did I make a mistake?
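
For reference, the check in my program is nothing more than this (sc is the 
SparkContext from above; I print defaultParallelism as well, to compare):

   // both values are read straight off the SparkContext
   println("defaultParallelism   = " + sc.defaultParallelism)
   println("defaultMinPartitions = " + sc.defaultMinPartitions)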
