After the upgrade spark-shell still behaved properly. But a Scala program
that defined its own SparkContext and did not set
spark.default.parallelism was suddenly stuck with a parallelism of 2. I
"fixed it" by setting the desired spark.default.parallelism system
property for now, instead of relying on the default.
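
For reference, a minimal sketch of that workaround, assuming Spark 0.9
standalone (the master URL, app name, and the value 16 are placeholders):

    import org.apache.spark.SparkContext

    object ParallelismWorkaround {
      def main(args: Array[String]): Unit = {
        // Set the property before the SparkContext is created so the
        // scheduler backend picks it up; 16 is only an example value.
        System.setProperty("spark.default.parallelism", "16")
        val sc = new SparkContext("spark://master:7077", "parallelism-demo")

        // Shuffles and parallelize calls that don't specify a partition
        // count now default to 16 instead of max(totalCores, 2).
        val counts = sc.parallelize(1 to 1000)
          .map(n => (n % 10, 1))
          .reduceByKey(_ + _)
        println(counts.partitions.length) // prints 16
        sc.stop()
      }
    }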


On Sun, Feb 2, 2014 at 7:48 PM, Aaron Davidson <[email protected]> wrote:

> Sorry, I meant to say we will use the maximum of (the total number of
> cores in the cluster) and (2) if spark.default.parallelism is not set. So
> this should not be causing your problem unless your cluster thinks it has
> fewer than 2 cores.
>
>
> On Sun, Feb 2, 2014 at 4:46 PM, Aaron Davidson <[email protected]> wrote:
>
>> Could you give an example where the default parallelism is set to 2
>> where it didn't use to be?
>>
>> Here is the relevant section for the spark standalone mode:
>> CoarseGrainedSchedulerBackend.scala#L211<https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L211>.
>> If spark.default.parallelism is set, it will override anything else. If it
>> is not set, we will use the total number of cores in the cluster and 2,
>> which is the same logic that has been used since 
>> spark-0.7<https://github.com/apache/incubator-spark/blob/branch-0.7/core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala#L156>
>> .
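>>
>> For reference, the logic at that line is roughly the following
>> (paraphrased, not a verbatim copy of the file):
>>
>>     // Fall back to max(total cores, 2) when the property is unset.
>>     override def defaultParallelism(): Int =
>>       conf.getInt("spark.default.parallelism",
>>         math.max(totalCoreCount.get(), 2))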
>>
>> The simplest possibility is that you're setting spark.default.parallelism
>> yourself; otherwise a bug may have been introduced somewhere that keeps it
>> from defaulting correctly anymore.
>>
>>
>> On Sat, Feb 1, 2014 at 12:30 AM, Koert Kuipers <[email protected]> wrote:
>>
>>> i just managed to upgrade my 0.9-SNAPSHOT from the last scala 2.9.x
>>> version to the latest.
>>>
>>>
>>> everything seems good except that my default parallelism is now set to 2
>>> for jobs instead of some smart number based on the number of cores (i think
>>> that is what it used to do). is this change on purpose?
>>>
>>> i am running spark standalone.
>>>
>>> thx, koert
>>>
>>
>>
>
