[GitHub] [spark] tgravescs commented on a change in pull request #29292: [SPARK-30322][DOCS] Add stage level scheduling docs

GitBox Wed, 29 Jul 2020 11:16:30 -0700


tgravescs commented on a change in pull request #29292:
URL: https://github.com/apache/spark/pull/29292#discussion_r462399890




##########
File path: docs/configuration.md
##########
@@ -3028,3 +3028,10 @@ There are configurations available to request resources 
for the driver: <code>sp
 Spark will use the configurations specified to first request containers with 
the corresponding resources from the cluster manager. Once it gets the 
container, Spark launches an Executor in that container which will discover 
what resources the container has and the addresses associated with each 
resource. The Executor will register with the Driver and report back the 
resources available to that Executor. The Spark scheduler can then schedule 
tasks to each Executor and assign specific resource addresses based on the 
resource requirements the user specified. The user can see the resources 
assigned to a task using the <code>TaskContext.get().resources</code> api. On 
the driver, the user can see the resources assigned with the SparkContext 
<code>resources</code> call. It's then up to the user to use the 
assigned addresses to do the processing they want or pass those into the ML/AI 
framework they are using.
 
 See your cluster manager specific page for requirements and details on each of 
- [YARN](running-on-yarn.html#resource-allocation-and-configuration-overview), 
[Kubernetes](running-on-kubernetes.html#resource-allocation-and-configuration-overview)
 and [Standalone 
Mode](spark-standalone.html#resource-allocation-and-configuration-overview). It 
is currently not available with Mesos or local mode. And please also note that 
local-cluster mode with multiple workers is not supported (see Standalone 
documentation).
+
+# Stage Level Scheduling Overview
+
+The stage level scheduling feature allows users to specify task and executor 
resource requirements at the stage level. This allows different stages to 
run with executors that have different resources. A prime example is an ETL 
stage that runs on executors with just CPUs, followed by an ML stage 
that needs GPUs. Stage level scheduling lets the user request different 
executors that have GPUs when the ML stage runs, rather than having to acquire 
executors with GPUs at the start of the application and leave them idle while the 
ETL stage is being run.
+This feature is only available for the RDD API in Scala, Java, and Python and requires 
dynamic allocation to be enabled. It is only available on YARN at this time. 
See the [YARN](running-on-yarn.html#stage-level-scheduling-overview) page for 
more implementation details.
+
+See the `RDD.withResources` and `ResourceProfileBuilder` APIs for using this 
feature. The current implementation acquires new executors for each 
ResourceProfile created and currently requires an exact match. Spark does not 
try to fit tasks that require one ResourceProfile into an executor that was 
created with a different one. Executors that are not in use will time out when 
idle, following the dynamic allocation logic. The default configuration for this feature 
is to only allow one ResourceProfile per stage. If the user associates more 
than one ResourceProfile with an RDD, Spark will throw an exception by default. See 
config <code>spark.scheduler.resource.profileMergeConflicts</code> to control 
that behavior. The current merge strategy Spark implements when 
<code>spark.scheduler.resource.profileMergeConflicts</code> is enabled is a 
simple max of each resource within the conflicting ResourceProfiles. Spark will 
create a new ResourceProfile with the max of each of the resources.

Review comment:
       I use ResourceProfile a lot. How about I do it for the first one and 
leave the rest?
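
For readers of the hunk above, the "simple max of each resource" merge can be sketched roughly as follows. This is only an illustration in plain Python and not Spark's implementation: the function name `merge_profiles_max`, the dictionary representation of a profile, and the resource names are all hypothetical, and treating a resource missing from one profile as 0 is an assumption.

```python
from typing import Dict


def merge_profiles_max(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Return a new profile holding, per resource, the larger of the two requests.

    Hypothetical helper for illustration only. Profiles are modelled as plain
    dicts of resource name -> requested amount, which is not how Spark stores them.
    """
    merged = dict(a)  # copy so neither input profile is modified
    for name, amount in b.items():
        # Assumption: a resource absent from one profile counts as 0 there,
        # so the other profile's (non-negative) request wins.
        merged[name] = max(merged.get(name, 0), amount)
    return merged


# Two conflicting profiles attached to RDDs that end up in the same stage.
etl_profile = {"cores": 4, "memory_mb": 8192}
ml_profile = {"cores": 2, "memory_mb": 16384, "gpu": 1}

merged = merge_profiles_max(etl_profile, ml_profile)
print(merged)  # {'cores': 4, 'memory_mb': 16384, 'gpu': 1}
```

The result is a new profile rather than a change to either input, which mirrors the statement in the doc text that Spark creates a new ResourceProfile with the max of each resource.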




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



