Hi All,

I am currently on Spark 1.6 and I was doing a SQL join on two tables that
are over 100 million rows each, and I noticed that it was spawning 30,000+
tasks (this is the progress meter that we are seeing).  We tried coalesce,
repartition, and the shuffle partitions setting to bring the number of
tasks down, because we were getting timeouts due to the number of tasks
being spawned, but none of those operations seemed to reduce the task
count.  Roughly what we tried is sketched below.
The workaround we came up with was to set the number of executors to 50
(--num-executors=50), and with that it looks like about 200 tasks run
actively at a time, but the total number of tasks remained the same.  Was
wondering if anyone knows what is going on?  Is there an optimal number of
executors?  I was under the impression that the default dynamic allocation
would pick the optimal number of executors for us and that this situation
wouldn't happen.  Is there something I am missing?
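For context, this is roughly how the executors end up configured (a sketch
only; the app name, cores, and memory values are illustrative, not our
exact settings; 50 executors x 4 cores would account for the 200 active
tasks we see):

  import org.apache.spark.SparkConf

  // Equivalent of passing --num-executors=50 to spark-submit
  val conf = new SparkConf()
    .setAppName("big-join")                          // placeholder app name
    .set("spark.executor.instances", "50")           // same as --num-executors=50
    .set("spark.executor.cores", "4")                // illustrative: 50 * 4 = 200 concurrent task slots
    .set("spark.executor.memory", "8g")              // illustrative value
    .set("spark.dynamicAllocation.enabled", "false") // explicit here for the sketch; unsure how --num-executors interacts with dynamic allocation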


