mridulm commented on a change in pull request #27583: [SPARK-29149][YARN]
Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r383662626
##########
File path:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
##########
@@ -227,6 +227,17 @@ private object ResourceRequestHelper extends Logging {
resourceInformation
}
+ def isYarnCustomResourcesNonEmpty(resource: Resource): Boolean = {
+ try {
+ // Use reflection as this uses APIs only available in Hadoop 3
Review comment:
Thanks for clarifying the behavior when YARN does support GPUs, etc. as a
resource.
I am probably missing something here; it would be great to understand the
behavior better when YARN does not.
Suppose I have a Spark application that depends on some library which requires
GPUs (for example) and sets corresponding resource profile expectations on the
RDDs it creates (I am trying to make a case where the app developer did not
explicitly configure the resource profiles, but is implicitly leveraging them
via some library).
Now, if this application is run on Hadoop 2.7 (or anything before 2.10, as
you mentioned), what will the behavior be?
If I understood it right:
1) We will make allocation requests to YARN without GPUs, since YARN does
not support them.
2) On the nodes received, we will run the discovery script on the assumption
that GPUs are available; YARN is simply oblivious to them. We will probably
use a node-label constraint to ensure GPU availability?
3) If GPUs are detected, we use them; otherwise the executor fails?
Is this right?
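To make sure we are talking about the same thing, here is a minimal sketch of
the flow in steps 1 to 3 as I understand it. Everything in it is an
assumption for illustration: `GpuFallbackSketch`, `ExecutorRequest`,
`YarnAsk`, `toYarnAsk` and `validateOnNode` are hypothetical names, not Spark
or YARN APIs, and the actual behavior in this PR may differ.

```scala
// Hypothetical model of steps 1-3 above; not Spark or YARN code.
object GpuFallbackSketch {
  // What the resource profile asks for per executor.
  final case class ExecutorRequest(cores: Int, memoryMb: Int, gpus: Int)

  // What would actually be sent to YARN in the container request.
  final case class YarnAsk(
      cores: Int,
      memoryMb: Int,
      customResources: Map[String, Long])

  // Step 1: leave GPUs out of the ask when YARN cannot express custom
  // resources. "yarn.io/gpu" is assumed to be the YARN-side resource name.
  def toYarnAsk(req: ExecutorRequest, yarnSupportsCustom: Boolean): YarnAsk = {
    val custom: Map[String, Long] =
      if (yarnSupportsCustom && req.gpus > 0) Map("yarn.io/gpu" -> req.gpus.toLong)
      else Map.empty[String, Long]
    YarnAsk(req.cores, req.memoryMb, custom)
  }

  // Steps 2 and 3: on the allocated node, compare what the discovery script
  // reported against the profile. Left models the executor failing.
  def validateOnNode(
      req: ExecutorRequest,
      discoveredGpuAddresses: Seq[String]): Either[String, Seq[String]] = {
    if (discoveredGpuAddresses.size >= req.gpus) {
      // Simply takes the first N addresses; nothing here coordinates with
      // other executors on the same host.
      Right(discoveredGpuAddresses.take(req.gpus))
    } else {
      Left(s"needed ${req.gpus} GPU(s), discovery found ${discoveredGpuAddresses.size}")
    }
  }
}
```

In this model, two executors landing on the same host would both pick the
same leading addresses, because YARN never accounted for the GPUs.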
If yes, how do we handle multi-tenancy on the executor host, or choose
which GPU(s) to use?
Is the assumption that, for workloads like this, the entire node is reserved
to prevent contention? I am not sure whether you have documented this
somewhere and I missed it!