[GitHub] [spark] mridulm commented on a change in pull request #27313: [SPARK-29148][CORE] Add stage level scheduling dynamic allocation and scheduler backend changes

GitBox Sat, 08 Feb 2020 23:14:00 -0800

mridulm commented on a change in pull request #27313: [SPARK-29148][CORE] Add 
stage level scheduling dynamic allocation and scheduler backend changes
URL: https://github.com/apache/spark/pull/27313#discussion_r376760827


 ##########
 File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
 ##########
 @@ -423,21 +510,30 @@ private[spark] class ExecutorAllocationManager(
    */
   private def removeExecutors(executors: Seq[String]): Seq[String] = synchronized {
     val executorIdsToBeRemoved = new ArrayBuffer[String]
-
     logDebug(s"Request to remove executorIds: ${executors.mkString(", ")}")
-    val numExistingExecutors = executorMonitor.executorCount - executorMonitor.pendingRemovalCount
-
-    var newExecutorTotal = numExistingExecutors
+    val numExecutorsTotalPerRpId = mutable.Map[Int, Int]()
     executors.foreach { executorIdToBeRemoved =>
-      if (newExecutorTotal - 1 < minNumExecutors) {
-        logDebug(s"Not removing idle executor $executorIdToBeRemoved because there are only " +
-          s"$newExecutorTotal executor(s) left (minimum number of executor limit $minNumExecutors)")
-      } else if (newExecutorTotal - 1 < numExecutorsTarget) {
-        logDebug(s"Not removing idle executor $executorIdToBeRemoved because there are only " +
-          s"$newExecutorTotal executor(s) left (number of executor target $numExecutorsTarget)")
+      val rpId = getResourceProfileIdOfExecutor(executorIdToBeRemoved)
+      if (rpId == UNKNOWN_RESOURCE_PROFILE_ID) {
+        logWarning(s"Not removing executor $executorIdsToBeRemoved because couldn't find " +
+          "ResourceProfile for it!")
 
 Review comment:
   Essentially, wherever we introduce resource profiles, a Set[E] becomes 
a Map[Int, Set[E]] and an Int becomes a Map[Int, Int]. Under normal 
circumstances, I do not expect this to be a problem.
   But given that we don't have named resource profiles, I am not sure how this 
interacts with a loop. For example, in an ML loop:
   ```
   val inputRdd = prepare()
   while (condition) {
     val computeRdd = needGpuResources(inputRdd.map().foo.bar)
     ...
   }
   ```
   
   Here, if I understood correctly, each iteration will create a new resource 
profile. Depending on the number of iterations, we could end up with steadily 
increasing memory usage (the degenerate case being use in streaming).
   Note: this is not a regression. For the default profile, there is no usage 
increase (other than a negligible one).
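
   As a rough illustration of the concern, here is a minimal, Spark-free sketch. 
All names in it (`FakeResourceProfile`, `FakeAllocationState`, 
`ProfileGrowthSketch`) are hypothetical and are not Spark APIs. It assumes that 
each loop iteration builds a profile with a fresh id and that per-profile map 
entries are never evicted, which may not match what the PR actually does.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a resource profile. Every call to create()
// hands out a new id, mimicking unnamed profiles built inside a loop.
final class FakeResourceProfile(val id: Int, val gpus: Int)

object FakeResourceProfile {
  private var nextId = 0
  def create(gpus: Int): FakeResourceProfile = {
    val rp = new FakeResourceProfile(nextId, gpus)
    nextId += 1
    rp
  }
}

// Hypothetical per-profile bookkeeping: what used to be a single Int
// is now a Map[Int, Int] keyed by resource profile id.
final class FakeAllocationState {
  val numExecutorsTargetPerRpId: mutable.Map[Int, Int] = mutable.Map.empty

  def register(rp: FakeResourceProfile): Unit = {
    if (!numExecutorsTargetPerRpId.contains(rp.id)) {
      numExecutorsTargetPerRpId(rp.id) = 0
    }
  }
}

object ProfileGrowthSketch {
  // Returns how many profile ids are tracked after `iterations` loop passes.
  def trackedProfilesAfter(iterations: Int, reuse: Boolean): Int = {
    val state = new FakeAllocationState
    // Created at most once, and only if reuse is requested.
    lazy val shared = FakeResourceProfile.create(gpus = 1)
    (1 to iterations).foreach { _ =>
      val rp = if (reuse) shared else FakeResourceProfile.create(gpus = 1)
      state.register(rp)
    }
    state.numExecutorsTargetPerRpId.size
  }
}
```

   Under these assumptions the number of tracked entries grows linearly with the 
iteration count when a new profile is built each time, and stays at one when a 
single profile instance is hoisted out of the loop and reused.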
   
