[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user jcuquemelle commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r176036965

--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
--- End diff --

Done
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user jcuquemelle commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r175852840

--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
+ 1
+
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
--- End diff --

done
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user jcuquemelle commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r175852906

--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
+ 1
+
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
+executor allocation overhead, as some executor might not even do any work.
+This setting allows to set a divisor that will be used to reduce the number of
+executors w.r.t. full parallelism
--- End diff --

done
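A minimal sketch of the semantics described in the doc text above, using hypothetical names (pendingTasks, tasksPerExecutor and divisor stand in for the allocation manager's internals; this is not the patch code):

    // Sketch only: how a full-parallelism divisor could scale down the
    // executor request. All names here are illustrative.
    def executorsNeeded(pendingTasks: Int,
                        tasksPerExecutor: Int,
                        divisor: Double): Int = {
      // Full parallelism: enough executors for one task slot per pending task.
      val fullParallelism = math.ceil(pendingTasks.toDouble / tasksPerExecutor)
      // The divisor trades job latency for fewer, better-utilized executors.
      math.ceil(fullParallelism / divisor).toInt
    }

    executorsNeeded(10000, 4, 1.0)  // 2500 executors at full parallelism
    executorsNeeded(10000, 4, 3.0)  // 834 executors with divisor 3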
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user jcuquemelle commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r175852881

--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
+ 1
+
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
+executor allocation overhead, as some executor might not even do any work.
+This setting allows to set a divisor that will be used to reduce the number of
+executors w.r.t. full parallelism
+Defaults to 1.0
--- End diff --

Done
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r175835570

--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
--- End diff --

Sorry I didn't get back to this earlier; I think fullExecutorAllocationDivisor would be fine.
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user jcuquemelle commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r175831139

--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
--- End diff --

How about something like fullAllocationDivisor?
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r174130728

--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
+ 1
+
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
+executor allocation overhead, as some executor might not even do any work.
+This setting allows to set a divisor that will be used to reduce the number of
+executors w.r.t. full parallelism
--- End diff --

add period at end of parallelism
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r174126562

--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
+ 1
+
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
+executor allocation overhead, as some executor might not even do any work.
+This setting allows to set a divisor that will be used to reduce the number of
+executors w.r.t. full parallelism
+Defaults to 1.0
--- End diff --

I think we should define that maxExecutors trumps this setting. If I have 10000 tasks and divisor 2, I would expect 5000 executors, but if max executors is 1000, that is all I get. We should add a test for this interaction as well.
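A sketch of that interaction, assuming the divisor is applied before the existing maxExecutors clamp (hypothetical helper, not the patch code):

    def targetExecutors(pendingTasks: Int,
                        tasksPerExecutor: Int,
                        divisor: Double,
                        maxExecutors: Int): Int = {
      // Divide the full-parallelism request by the divisor, then clamp
      // to spark.dynamicAllocation.maxExecutors.
      val needed = math.ceil(pendingTasks.toDouble / tasksPerExecutor / divisor).toInt
      math.min(needed, maxExecutors)
    }

    // 10000 single-slot tasks with divisor 2 would ask for 5000 executors,
    // but maxExecutors = 1000 still wins:
    targetExecutors(10000, 1, 2.0, 1000)  // 1000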
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r174145101

--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
--- End diff --

Naming configs is really hard, there are lots of different opinions on it, and in the end someone is going to be confused, so I need to think about this some more. I see the reason to use "parallelism" here rather than maxExecutors (maxExecutorsDivisor could be confusing if people think it applies to the maxExecutors config), but I also think "parallelism" would be confused with the parallelism in spark.default.parallelism: it's not defining the number of tasks, but the number of executors to allocate based on the parallelism. Another one I thought of is executorAllocationDivisor. I'll think about it some more and get back.
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r174125381

--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
 Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
+ 1
+
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
--- End diff --

can waste.
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r166655479

--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -114,9 +114,13 @@ private[spark] class ExecutorAllocationManager(
   // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
   // allocation is only supported for YARN and the default number of cores per executor in YARN is
   // 1, but it might need to be attained differently for different cluster managers
-  private val tasksPerExecutor =
+  private val taskSlotPerExecutor =
     conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)
+  private val tasksPerExecutorSlot = conf.getInt("spark.dynamicAllocation.tasksPerExecutorSlot", 1)
+
+  private val tasksPerExecutor = tasksPerExecutorSlot * taskSlotPerExecutor
--- End diff --

Since we aren't using the concept of slots, I think we should leave tasksPerExecutor alone and put this functionality into maxNumExecutorsNeeded().
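A sketch of what folding the factor into maxNumExecutorsNeeded() might look like, with tasksPerExecutor keeping its original meaning (hypothetical standalone function; parameter names are assumed, not taken from the actual patch):

    // Hypothetical: the tasksPerExecutorSlot factor is applied only where
    // the executor count is derived, instead of redefining tasksPerExecutor.
    def maxNumExecutorsNeeded(totalPendingTasks: Int,
                              totalRunningTasks: Int,
                              tasksPerExecutor: Int,
                              tasksPerExecutorSlot: Int): Int = {
      val numRunningOrPendingTasks = totalPendingTasks + totalRunningTasks
      math.ceil(numRunningOrPendingTasks.toDouble /
        (tasksPerExecutor * tasksPerExecutorSlot)).toInt
    }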
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r166649526

--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -114,9 +114,13 @@ private[spark] class ExecutorAllocationManager(
   // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
   // allocation is only supported for YARN and the default number of cores per executor in YARN is
   // 1, but it might need to be attained differently for different cluster managers
-  private val tasksPerExecutor =
+  private val taskSlotPerExecutor =
     conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)
+  private val tasksPerExecutorSlot = conf.getInt("spark.dynamicAllocation.tasksPerExecutorSlot", 1)
--- End diff --

I think we should change the name of this config, because Spark doesn't have the concept of slots and it could confuse users who might expect exactly X tasks to be processed on each executor. I am thinking more along the lines of spark.dynamicAllocation.maxExecutorsPerStageDivisor: the max # of executors based on the # of tasks required for that stage, divided by this number. I'm open to other config names here, though. I think we would also need to define its interaction with spark.dynamicAllocation.maxExecutors, as well as how it works as the # of running/to-be-run tasks changes.
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
GitHub user jcuquemelle opened a pull request: https://github.com/apache/spark/pull/19881

[SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

## What changes were proposed in this pull request?

An executor has spark.executor.cores / spark.task.cpus task slots.

The current dynamic allocation policy allocates enough executors to have each task slot execute a single task, which wastes resources when tasks are small relative to the executor allocation overhead. Adding the tasksPerExecutorSlot parameter makes it possible to specify how many tasks a single slot should ideally execute, to mitigate the overhead of executor allocation.

## How was this patch tested?

Unit tests, and runs on various actual workloads on a YARN cluster.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jcuquemelle/spark AddTaskPerExecutorSlot

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19881.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19881

commit 895882feebc53f44a70278e0b475b2fb937d331a
Author: Julien Cuquemelle
Date: 2017-11-30T16:28:06Z

    [SPARK-22683][CORE] Allow tuning the number of dynamically allocated executors

    let's say an executor has spark.executor.cores / spark.task.cpus taskSlots

    The current dynamic allocation policy allocates enough executors to have each taskSlot execute a single task, which wastes resources when tasks are small regarding executor allocation overhead. By adding the tasksPerExecutorSlot, it is made possible to specify how many tasks a single slot should ideally execute to mitigate the overhead of executor allocation.

commit fce3b976d0b22c4d01ef4fdd5339835bc6d6fcb1
Author: Julien Cuquemelle
Date: 2017-11-30T16:28:06Z

    [SPARK-22683][DOC] document tasksPerExecutorSlot parameter
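As a usage illustration, a sketch of enabling the parameter under the name from this initial commit (the review above converges on a different, divisor-style name, so treat the config key as provisional):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: target roughly 4 tasks per task slot instead of one slot
    // per pending task. The key matches this PR's first commit and
    // changed during review.
    val conf = new SparkConf()
      .setAppName("small-tasks-job")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true") // needed by dynamic allocation
      .set("spark.dynamicAllocation.tasksPerExecutorSlot", "4")
    val sc = new SparkContext(conf)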