Did you try changing the value of 'spark.default.parallelism' to a bigger number?
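For example, something like this when building the SparkContext (a minimal
sketch; "RepartitionJob" and 400 are just placeholders, and the setting only
applies to shuffles that aren't given an explicit partition count):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("RepartitionJob")              // hypothetical app name
      .set("spark.default.parallelism", "400")   // placeholder; tune to ~2-3x total executor cores
    val sc = new SparkContext(conf)

The same can be passed on the command line via
spark-submit --conf spark.default.parallelism=400.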
2015-06-05 17:56 GMT+08:00 Evo Eftimov <evo.efti...@isecc.com>:

> It may be that your system runs out of resources (i.e. 174 is the ceiling)
> due to the following:
>
> 1. RDD Partition = (Spark) Task
> 2. RDD Partition != (Spark) Executor
> 3. (Spark) Task != (Spark) Executor
> 4. (Spark) Task = JVM Thread
> 5. (Spark) Executor = JVM instance
>
> *From:* ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com]
> *Sent:* Friday, June 5, 2015 10:48 AM
> *To:* user
> *Subject:* How to increase the number of tasks
>
> I have a stage that spawns 174 tasks when I run repartition on Avro data.
> Tasks read between 512/317/316/214/173 MB of data. Even if I increase the
> number of executors or the number of partitions (when calling repartition),
> the number of tasks launched stays fixed at 174.
>
> 1) I want to speed this stage up. How do I do it?
> 2) A few tasks finish in 20 minutes, a few in 15, and a few in under 10.
> Why this behavior? Since this is a repartition stage, it should not depend
> on the nature of the data.
>
> It's taking more than 30 minutes, and I want to speed it up by throwing
> more executors at it.
>
> Please suggest.
>
> Deepak
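One more note on the repartition call itself: the read stage gets one task
per input split of the source files, so the argument to repartition(n) only
controls the partition count of the stage *after* the shuffle. A quick sketch
in spark-shell (the path and counts are placeholders; textFile stands in for
the Avro read):

    val lines = sc.textFile("hdfs:///path/to/input")  // read-stage tasks = number of input splits
    lines.partitions.length
    val wider = lines.repartition(400)                // the stage after the shuffle runs 400 tasks
    wider.partitions.length

For a splittable format you can also ask for more read tasks up front,
e.g. sc.textFile(path, 400).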