tgravescs commented on a change in pull request #28053: [SPARK-29153][CORE]Add ability to merge resource profiles within a stage with Stage Level Scheduling
URL: https://github.com/apache/spark/pull/28053#discussion_r400240875
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
 ##########
 @@ -447,10 +449,28 @@ private[spark] class DAGScheduler(
       stageResourceProfiles: HashSet[ResourceProfile]): ResourceProfile = {
     logDebug(s"Merging stage rdd profiles: $stageResourceProfiles")
     val resourceProfile = if (stageResourceProfiles.size > 1) {
-      // add option later to actually merge profiles - SPARK-29153
-      throw new IllegalArgumentException("Multiple ResourceProfile's specified in the RDDs for " +
-        "this stage, please resolve the conflicting ResourceProfile's as Spark doesn't" +
-        "currently support merging them.")
+      if (shouldMergeResourceProfiles) {
+        var mergedProfile: ResourceProfile = stageResourceProfiles.head
+        for (profile <- stageResourceProfiles.drop(1)) {
+          mergedProfile = mergeResourceProfiles(mergedProfile, profile)
+        }
+        // compare merged profile with existing ones so we don't add it over and over again
+        // if the user runs the same operation multiple times
+        val resProfile = sc.resourceProfileManager.getEquivalentProfile(mergedProfile)
+        resProfile match {
+          case Some(existingRp) => existingRp
 
 Review comment:
  If a user configures the same resources for 2 separate stages, then yes, they could share the same profile. This is no different from the current behavior, so I don't understand your question about a race for resources. You can run 2 stages now and they will both require resources to run. The dynamic allocation manager handles this and asks for the appropriate number of executors. If you don't have enough executors to run them all in parallel, the scheduler simply schedules tasks as resources become available.
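
The merge-then-deduplicate flow in the diff can be sketched in plain Scala. This is a hypothetical simplification, not Spark's API: `SimpleProfile`, `ProfileRegistry`, `mergeTwo`, and `mergeStageProfiles` are illustrative names, and the "take the max of each resource" merge policy is an assumption. It shows why running the same operation twice ends up reusing one registered profile instead of registering a new one each time, which is also why two stages asking for the same resources can share a profile.

```scala
import scala.collection.mutable

// Hypothetical stand-in for ResourceProfile: resource name -> amount, plus an id.
final case class SimpleProfile(id: Int, resources: Map[String, Double]) {
  // "Equivalent" means same resource requests, ignoring the id.
  def sameResources(other: SimpleProfile): Boolean = resources == other.resources
}

// Hypothetical registry mimicking the idea behind getEquivalentProfile.
final class ProfileRegistry {
  private val profiles = mutable.ArrayBuffer.empty[SimpleProfile]
  private var nextId = 0

  // Return an already-registered profile with identical resources, if any.
  def getEquivalentProfile(rp: SimpleProfile): Option[SimpleProfile] =
    profiles.find(_.sameResources(rp))

  // Register a new profile under the next free id.
  def register(resources: Map[String, Double]): SimpleProfile = {
    val rp = SimpleProfile(nextId, resources)
    nextId += 1
    profiles += rp
    rp
  }
}

object MergeSketch {
  // Assumed merge policy: keep the larger amount of each resource.
  def mergeTwo(a: Map[String, Double], b: Map[String, Double]): Map[String, Double] =
    (a.keySet ++ b.keySet).iterator.map { k =>
      k -> math.max(a.getOrElse(k, 0.0), b.getOrElse(k, 0.0))
    }.toMap

  def mergeStageProfiles(stageProfiles: Seq[SimpleProfile], registry: ProfileRegistry): SimpleProfile = {
    require(stageProfiles.nonEmpty, "a stage needs at least one profile")
    if (stageProfiles.size == 1) stageProfiles.head
    else {
      // Same fold as the diff: start from the head, merge each remaining profile in.
      val merged = stageProfiles.map(_.resources).reduceLeft(mergeTwo)
      // Reuse an equivalent profile if one exists, otherwise register a new one.
      registry.getEquivalentProfile(SimpleProfile(-1, merged)) match {
        case Some(existing) => existing
        case None           => registry.register(merged)
      }
    }
  }
}
```

Calling `mergeStageProfiles` a second time with the same input profiles (in any order) returns the profile registered by the first call rather than a new one.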

----------------------------------------------------------------
This is an automated message from the Apache Git Service.