[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-21 Thread jcuquemelle
Github user jcuquemelle commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r176036965
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
--- End diff --

Done


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-20 Thread jcuquemelle
Github user jcuquemelle commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r175852840
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
+  1
+  
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
--- End diff --

done


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-20 Thread jcuquemelle
Github user jcuquemelle commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r175852906
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
+  1
+  
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
+executor allocation overhead, as some executor might not even do any work.
+This setting allows to set a divisor that will be used to reduce the number of
+executors w.r.t. full parallelism
--- End diff --

done


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-20 Thread jcuquemelle
Github user jcuquemelle commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r175852881
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
+  1
+  
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
+executor allocation overhead, as some executor might not even do any work.
+This setting allows to set a divisor that will be used to reduce the number of
+executors w.r.t. full parallelism
+Defaults to 1.0
--- End diff --

Done


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-20 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r175835570
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
--- End diff --

Sorry I didn't get back to this earlier; I think fullExecutorAllocationDivisor would be fine.


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-20 Thread jcuquemelle
Github user jcuquemelle commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r175831139
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
--- End diff --

How about something like fullAllocationDivisor?


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r174130728
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
+  1
+  
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
+executor allocation overhead, as some executor might not even do any work.
+This setting allows to set a divisor that will be used to reduce the number of
+executors w.r.t. full parallelism
--- End diff --

add period at end of parallelism


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r174126562
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
+  1
+  
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
+executor allocation overhead, as some executor might not even do any work.
+This setting allows to set a divisor that will be used to reduce the number of
+executors w.r.t. full parallelism
+Defaults to 1.0
--- End diff --

I think we should define that maxExecutors trumps this setting.

If I have 10000 tasks, divisor 2, I would expect 5000 executors, but if max executors is 1000, that is all I get.

We should add a test for this interaction as well.
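
As a rough sketch of the interaction described above (the helper and all names below are made up for illustration, not taken from the patch): the divisor scales down the executor count derived from the pending tasks, and spark.dynamicAllocation.maxExecutors is then applied as the hard cap.

    // Illustration only: divisor applied first, maxExecutors enforced last.
    def targetExecutors(
        pendingTasks: Int,
        taskSlotsPerExecutor: Int,
        divisor: Double,
        maxExecutors: Int): Int = {
      val fullParallelism = math.ceil(pendingTasks.toDouble / taskSlotsPerExecutor).toInt
      val scaledDown = math.ceil(fullParallelism / divisor).toInt
      math.min(scaledDown, maxExecutors)
    }

    // 10000 tasks, 1 slot per executor, divisor 2 -> 5000 executors,
    // but maxExecutors = 1000 still wins: targetExecutors(10000, 1, 2.0, 1000) == 1000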


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r174145101
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
--- End diff --

Naming configs is really hard, there are lots of different opinions on it, and in the end
someone is going to be confused; I need to think about this some more.
I see the reason to use Parallelism here rather than maxExecutors
(maxExecutorsDivisor could be confusing if people think it applies to the
maxExecutors config), but I also think parallelism would be confused with the
parallelism in spark.default.parallelism: it's not defining the number of tasks
but the number of executors to allocate based on the parallelism. Another one I
thought of is executorAllocationDivisor. I'll think about it some more and get
back.



---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r174125381
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
+  1
+  
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
--- End diff --

can waste.




---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-02-07 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r166655479
  
--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -114,9 +114,13 @@ private[spark] class ExecutorAllocationManager(
   // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
   // allocation is only supported for YARN and the default number of cores per executor in YARN is
   // 1, but it might need to be attained differently for different cluster managers
-  private val tasksPerExecutor =
+  private val taskSlotPerExecutor =
 conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)
 
+  private val tasksPerExecutorSlot = conf.getInt("spark.dynamicAllocation.tasksPerExecutorSlot", 1)
+
+  private val tasksPerExecutor = tasksPerExecutorSlot * taskSlotPerExecutor
--- End diff --

Since we aren't using the concept of slots, I think we should leave tasksPerExecutor
alone and put this functionality into maxNumExecutorsNeeded().
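
For illustration, a minimal sketch of what that could look like, assuming a listener
exposing pending/running task counts and a pre-read divisor config value; this is not
the code from the PR:

    // Sketch only: scale the target in maxNumExecutorsNeeded() and leave
    // tasksPerExecutor (cores per executor / cpus per task) untouched.
    private def maxNumExecutorsNeeded(): Int = {
      val totalTasks = listener.totalPendingTasks + listener.totalRunningTasks
      // Executors needed for one task per slot (the current behaviour) ...
      val fullParallelism = math.ceil(totalTasks.toDouble / tasksPerExecutor).toInt
      // ... then reduced by the configured divisor (>= 1.0).
      math.ceil(fullParallelism / divisor).toInt
    }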


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-02-07 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r166649526
  
--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -114,9 +114,13 @@ private[spark] class ExecutorAllocationManager(
   // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
   // allocation is only supported for YARN and the default number of cores per executor in YARN is
   // 1, but it might need to be attained differently for different cluster managers
-  private val tasksPerExecutor =
+  private val taskSlotPerExecutor =
 conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)
 
+  private val tasksPerExecutorSlot = conf.getInt("spark.dynamicAllocation.tasksPerExecutorSlot", 1)
--- End diff --

I think we should change the name of this config because Spark doesn't have the
concept of slots, and I think it could be confusing to users who might expect
exactly x tasks to be processed on each executor. I am thinking more along the
lines of spark.dynamicAllocation.maxExecutorsPerStageDivisor = max # of executors
based on the # of tasks required for that stage divided by this number.
I'm open to other config names here though.

I think we would also need to define its interaction with
spark.dynamicAllocation.maxExecutors, as well as how it works as the # of
running/to-be-run tasks changes.


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2017-12-04 Thread jcuquemelle
GitHub user jcuquemelle opened a pull request:

https://github.com/apache/spark/pull/19881

[SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

## What changes were proposed in this pull request?

Let's say an executor has spark.executor.cores / spark.task.cpus taskSlots.
The current dynamic allocation policy allocates enough executors
to have each taskSlot execute a single task, which wastes resources when
tasks are small relative to the executor allocation overhead. By adding
tasksPerExecutorSlot, it becomes possible to specify how many tasks
a single slot should ideally execute to mitigate the overhead of executor
allocation.
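
As a rough illustration of the proposal (the example values below are made up, not part
of the patch), the number of executors requested becomes roughly
numTasks / (taskSlotsPerExecutor * tasksPerExecutorSlot):

    import org.apache.spark.SparkConf

    // With 4 cores per executor, 1 cpu per task and tasksPerExecutorSlot = 3,
    // a 1200-task stage would target ~100 executors instead of the default 300.
    val conf = new SparkConf()
      .set("spark.executor.cores", "4")
      .set("spark.task.cpus", "1")
      .set("spark.dynamicAllocation.tasksPerExecutorSlot", "3")

    val taskSlotsPerExecutor =
      conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)  // 4
    val tasksPerExecutorSlot =
      conf.getInt("spark.dynamicAllocation.tasksPerExecutorSlot", 1)              // 3

    val numTasks = 1200
    val executorsRequested =
      math.ceil(numTasks.toDouble / (taskSlotsPerExecutor * tasksPerExecutorSlot)).toInt
    // executorsRequested == 100 (vs. 300 with the default of one task per slot)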

## How was this patch tested?
Unit tests and runs on various actual workloads on a YARN cluster.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jcuquemelle/spark AddTaskPerExecutorSlot

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19881.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19881


commit 895882feebc53f44a70278e0b475b2fb937d331a
Author: Julien Cuquemelle 
Date:   2017-11-30T16:28:06Z

[SPARK-22683][CORE] Allow tuning the number of dynamically allocated 
executors

let's say an executor has spark.executor.cores / spark.task.cpus taskSlots

The current dynamic allocation policy allocates enough executors
to have each taskSlot execute a single task, which wastes resources when
 tasks are small regarding executor allocation overhead. By adding the
tasksPerExecutorSlot, it is made possible to specify how many tasks
a single slot should ideally execute to mitigate the overhead of executor
allocation.

commit fce3b976d0b22c4d01ef4fdd5339835bc6d6fcb1
Author: Julien Cuquemelle 
Date:   2017-11-30T16:28:06Z

[SPARK-22683][DOC] document tasksPerExecutorSlot parameter




---
