tgravescs commented on a change in pull request #30204:
URL: https://github.com/apache/spark/pull/30204#discussion_r523107046



##########
File path: docs/running-on-yarn.md
##########
@@ -644,6 +644,7 @@ YARN does not tell Spark the addresses of the resources allocated to each container
 # Stage Level Scheduling Overview
 
Stage level scheduling is supported on YARN when dynamic allocation is enabled. One thing to note that is specific to YARN is that each ResourceProfile requires a different container priority on YARN. The mapping is simple: the ResourceProfile id becomes the priority, and on YARN lower numbers are higher priority. This means that profiles created earlier will have a higher priority in YARN. Normally this won't matter because Spark finishes one stage before starting another; the only case where this might have an effect is in a job-server type scenario, so it's something to keep in mind.
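
For illustration, here is a minimal sketch of building two ResourceProfiles (the resource names and amounts are made up for the example). The profile built first gets the lower id and therefore the higher YARN container priority:

```scala
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

// Built first, so it gets a lower id and therefore a higher YARN container priority.
val gpuProfile = new ResourceProfileBuilder()
  .require(new ExecutorResourceRequests().cores(4).resource("gpu", 1))
  .require(new TaskResourceRequests().resource("gpu", 1))
  .build()

// Built later, so it gets a higher id and therefore a lower YARN container priority.
val cpuProfile = new ResourceProfileBuilder()
  .require(new ExecutorResourceRequests().cores(8))
  .require(new TaskResourceRequests().cpus(2))
  .build()

// On YARN, the ResourceProfile id is used as the container priority.
println(s"gpuProfile id = ${gpuProfile.id}, cpuProfile id = ${cpuProfile.id}")
```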
+Note there is a difference in the way custom resources are handled between the base default profile and custom ResourceProfiles. To allow the user to request YARN containers with extra resources without Spark scheduling on them, the user can specify resources via the <code>spark.yarn.executor.resource.</code> config. Those configs are only used in the base default profile, though, and do not get propagated into any other custom ResourceProfiles. This is because there would be no way to remove them if you wanted a stage not to have them. This results in your default profile getting the custom resources defined in <code>spark.yarn.executor.resource.</code> plus the Spark-defined GPU or FPGA resources. Spark converts GPU and FPGA resources into the YARN built-in types <code>yarn.io/gpu</code> and <code>yarn.io/fpga</code>, but does not know the mapping of any other resources. Any other Spark custom resources are not propagated to YARN for the default profile. So if you want Spark to schedule based off a custom resource and have it requested from YARN, you must specify it in both the YARN (<code>spark.yarn.{driver/executor}.resource.</code>) and Spark (<code>spark.{driver/executor}.resource.</code>) configs. Leave the Spark config off if you only want YARN containers with the extra resources but do not want Spark to schedule using them. Custom ResourceProfiles, on the other hand, do not currently have a way to specify YARN-only resources without Spark scheduling off of them. This means that for custom ResourceProfiles we propagate all the resources defined in the ResourceProfile to YARN. We still convert GPU and FPGA to the YARN built-in types as well. This requires that the name of any custom resources you specify matches what they are defined as in YARN.
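
As an illustrative sketch of the default-profile case described above (the resource name <code>acceleratorX</code> here is hypothetical and would have to match a resource type actually defined in your YARN cluster):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Ask YARN for containers carrying the custom resource. Only the base default
  // profile uses this; it is not propagated into custom ResourceProfiles.
  .set("spark.yarn.executor.resource.acceleratorX.amount", "2")
  // Also declare it to Spark so tasks are scheduled against it. Omit the next
  // two lines if you only want the resource on the YARN container without
  // Spark scheduling on it.
  .set("spark.executor.resource.acceleratorX.amount", "2")
  .set("spark.task.resource.acceleratorX.amount", "1")
```

For custom ResourceProfiles, any resource set via <code>ExecutorResourceRequests.resource(...)</code> is propagated to YARN directly, so the name used there must match the YARN resource name.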
 

Review comment:
       We currently do not. This falls under allowing other Spark confs to be specified in the profile, which was to be added after the initial development. It definitely makes sense though, as I know of a couple of companies that have things set up this way. I'm honestly not sure it can be done via the YARN API right now. I thought they added the ability to request that the application master container go to a separate queue from the other ones, but I'm not sure you can ask for containers from a different queue.
   Do you have an immediate use case for this?



