tgravescs commented on code in PR #43746:
URL: https://github.com/apache/spark/pull/43746#discussion_r1391209682


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2087,6 +2087,17 @@ package object config {
       .doubleConf
       .createOptional
 
+  private[spark] val SCHEDULER_MIN_RESOURCES_TO_SURVIVE_RATIO =
+    ConfigBuilder("spark.scheduler.minResourcesToSurviveRatio")

Review Comment:
   I think this config should have excludeOnFailure in the name if it applies 
to that feature, which is implied by its description.  I also think this 
feature could be quite confusing to users, so it should be mentioned in that 
documentation.



##########
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala:
##########
@@ -717,6 +719,15 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
 
   def sufficientResourcesRegistered(): Boolean = true
 
 +  // When the executor failure tracker collects enough failures, if the 
current resources are
 +  // insufficient to keep the app running, it will fail the application 
directly; otherwise,
 +  // it survives this check round.
+  def insufficientResourcesRetained(): Boolean = {
+    totalRegisteredExecutors.get() < maxExecutors * minSurviveRatio

Review Comment:
   With dynamic allocation, maxExecutors is Int.MaxValue, so how does this 
really work with it?  I would basically say it doesn't.
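   A minimal sketch of the problem (simplified, hypothetical names; it mirrors 
the `totalRegisteredExecutors.get() < maxExecutors * minSurviveRatio` check 
above, assuming maxExecutors defaults to Int.MaxValue under dynamic allocation):

   ```scala
   object SurviveRatioSketch {
     // Simplified stand-in for insufficientResourcesRetained() in
     // CoarseGrainedSchedulerBackend: "insufficient" when registered
     // executors fall below maxExecutors * minSurviveRatio.
     def insufficientResourcesRetained(registered: Int,
                                       maxExecutors: Int,
                                       minSurviveRatio: Double): Boolean =
       registered < maxExecutors * minSurviveRatio

     def main(args: Array[String]): Unit = {
       // Static allocation: the check behaves as intended.
       assert(!insufficientResourcesRetained(registered = 8,
         maxExecutors = 10, minSurviveRatio = 0.5))
       assert(insufficientResourcesRetained(registered = 3,
         maxExecutors = 10, minSurviveRatio = 0.5))

       // Dynamic allocation: maxExecutors is Int.MaxValue, so even a tiny
       // ratio yields a threshold around 2.1e8 executors; no realistic
       // cluster count ever reaches it, and the check always reports
       // "insufficient".
       assert(insufficientResourcesRetained(registered = 10000,
         maxExecutors = Int.MaxValue, minSurviveRatio = 0.1))
     }
   }
   ```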



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

