Just multiply the node's CPU core count by 2-4.

2015-06-05 18:04 GMT+08:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>:
> I did not change spark.default.parallelism.
> What is the recommended value for it?
>
> On Fri, Jun 5, 2015 at 3:31 PM, 李铖 <lidali...@gmail.com> wrote:
>
>> Did you change the value of 'spark.default.parallelism'? Set it to a
>> bigger number.
>>
>> 2015-06-05 17:56 GMT+08:00 Evo Eftimov <evo.efti...@isecc.com>:
>>
>>> It may be that your system runs out of resources (i.e. 174 is the
>>> ceiling) due to the following:
>>>
>>> 1. RDD Partition = (Spark) Task
>>> 2. RDD Partition != (Spark) Executor
>>> 3. (Spark) Task != (Spark) Executor
>>> 4. (Spark) Task = JVM Thread
>>> 5. (Spark) Executor = JVM instance
>>>
>>> From: ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com]
>>> Sent: Friday, June 5, 2015 10:48 AM
>>> To: user
>>> Subject: How to increase the number of tasks
>>>
>>> I have a stage that spawns 174 tasks when I run repartition on Avro data.
>>> Tasks read between 512/317/316/214/173 MB of data. Even if I increase the
>>> number of executors or the number of partitions (when calling
>>> repartition), the number of tasks launched remains fixed at 174.
>>>
>>> 1) I want to speed up this stage. How do I do it?
>>> 2) A few tasks finish in 20 minutes, a few in 15, and a few in less than
>>> 10. Why is this the behavior? Since this is a repartition stage, it
>>> should not depend on the nature of the data.
>>>
>>> It's taking more than 30 minutes, and I want to speed it up by throwing
>>> more executors at it.
>>>
>>> Please suggest.
>>>
>>> Deepak
>>
>
> --
> Deepak
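
A minimal Scala sketch of the two knobs discussed above, assuming a plain RDD job. The app name, input path, and the value 800 are illustrative placeholders, not values from the thread:

import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    // Rule of thumb from the thread: parallelism ~= 2-4 x the total CPU cores
    // available to the executors. 800 is only a placeholder value.
    val conf = new SparkConf()
      .setAppName("RepartitionSketch")
      .set("spark.default.parallelism", "800")
    val sc = new SparkContext(conf)

    // repartition(n) forces a full shuffle into exactly n partitions, so the
    // stage after the shuffle launches n tasks; the stage that reads the
    // input is still driven by the number of input splits (the 174 above).
    val input = sc.textFile("hdfs:///path/to/input") // stand-in for the Avro source
    val repartitioned = input.repartition(800)

    println("partitions after repartition: " + repartitioned.partitions.length)
    sc.stop()
  }
}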