Just set it to 2-4 times the total number of CPU cores across your nodes.
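
For example, on a hypothetical cluster with 40 cores in total (10 executors x 4
cores), a minimal Spark 1.x sketch would be the following (the app name and the
exact value are made up, tune per job):

    import org.apache.spark.{SparkConf, SparkContext}

    // 2-4x of 40 cores -> roughly 80-160 partitions; pick something in that range.
    val conf = new SparkConf()
      .setAppName("RepartitionJob")              // hypothetical app name
      .set("spark.default.parallelism", "120")   // assumed value for a 40-core cluster
    val sc = new SparkContext(conf)

The same setting can also be passed on the command line with
--conf spark.default.parallelism=120 when calling spark-submit.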

2015-06-05 18:04 GMT+08:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>:

> I did not change spark.default.parallelism.
> What is the recommended value for it?
>
> On Fri, Jun 5, 2015 at 3:31 PM, 李铖 <lidali...@gmail.com> wrote:
>
>> Have you changed the value of 'spark.default.parallelism'? Try setting it
>> to a bigger number.
>>
>> 2015-06-05 17:56 GMT+08:00 Evo Eftimov <evo.efti...@isecc.com>:
>>
>>> It may be that your system runs out of resources (i.e. 174 is the ceiling)
>>> due to the following:
>>>
>>>
>>>
>>> 1. RDD Partition = (Spark) Task
>>> 2. RDD Partition != (Spark) Executor
>>> 3. (Spark) Task != (Spark) Executor
>>> 4. (Spark) Task = JVM Thread
>>> 5. (Spark) Executor = JVM instance
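>>>
>>> In code terms, here is a rough sketch of checking these numbers against
>>> each other (assuming an existing SparkContext sc, an RDD rdd, and
>>> YARN-style spark.executor.instances / spark.executor.cores settings):
>>>
>>>     // Each RDD partition becomes exactly one task in the stage that
>>>     // computes this RDD.
>>>     val numTasks = rdd.partitions.length
>>>
>>>     // Total concurrent task slots = executors (JVM instances) x cores per
>>>     // executor (JVM threads), read from this job's configuration.
>>>     val numExecutors     = sc.getConf.getInt("spark.executor.instances", 1)
>>>     val coresPerExecutor = sc.getConf.getInt("spark.executor.cores", 1)
>>>     println(s"$numTasks tasks over ${numExecutors * coresPerExecutor} task slots")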
>>>
>>>
>>>
>>> *From:* ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com]
>>> *Sent:* Friday, June 5, 2015 10:48 AM
>>> *To:* user
>>> *Subject:* How to increase the number of tasks
>>>
>>>
>>>
>>> I have a stage that spawns 174 tasks when I run repartition on Avro
>>> data.
>>>
>>> Tasks read between 512/317/316/214/173 MB of data. Even if I increase the
>>> number of executors or the number of partitions (when calling repartition),
>>> the number of tasks launched remains fixed at 174.
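>>>
>>> As a minimal sketch of what I'm doing (the path is a placeholder and
>>> sc.textFile stands in for however the Avro data is actually loaded):
>>>
>>>     val raw = sc.textFile("hdfs:///path/to/input")  // placeholder for the actual Avro load
>>>     val wider = raw.repartition(600)                // assumed target; the read stage still shows 174 tasks
>>>     println(wider.partitions.length)                // post-shuffle partition count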
>>>
>>>
>>>
>>> 1) I want to speed up this stage. How do I do it?
>>>
>>> 2) Some tasks finish in 20 minutes, some in 15, and some in under 10. Why
>>> do they behave this way?
>>>
>>> Since this is a repartition stage, it should not depend on the nature of
>>> the data.
>>>
>>>
>>>
>>> It's taking more than 30 minutes, and I want to speed it up by throwing
>>> more executors at it.
>>>
>>>
>>>
>>> Please suggest
>>>
>>>
>>>
>>> Deepak
>>>
>>>
>>>
>>
>>
>
>
> --
> Deepak
>
>
