tgravescs commented on code in PR #43746:
URL: https://github.com/apache/spark/pull/43746#discussion_r1391209682
##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2087,6 +2087,17 @@ package object config {
.doubleConf
.createOptional
+ private[spark] val SCHEDULER_MIN_RESOURCES_TO_SURVIVE_RATIO =
+ ConfigBuilder("spark.scheduler.minResourcesToSurviveRatio")
Review Comment:
I think this config should have `excludeOnFailure` in its name if it applies to that feature, as the description implies. I also think this feature could be quite confusing to users, so it should be mentioned in that feature's documentation.
##########
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala:
##########
@@ -717,6 +719,15 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
def sufficientResourcesRegistered(): Boolean = true
+  // When the executor failure tracker collects enough failures, if the current resources are
+  // insufficient to keep the app running, it will fail the application directly; otherwise,
+  // it survives this check round.
+  def insufficientResourcesRetained(): Boolean = {
+    totalRegisteredExecutors.get() < maxExecutors * minSurviveRatio
Review Comment:
With dynamic allocation, maxExecutors is Int.MaxValue, so how does this check really work in that case? I would basically say it doesn't.
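To make the concern concrete, here is a minimal sketch (in Java, with hypothetical names mirroring the proposed `insufficientResourcesRetained` check) of what happens when `maxExecutors` takes its dynamic-allocation default of `Int.MaxValue`: the threshold becomes roughly 2.1e8 executors times the ratio, so any realistic cluster is always judged "insufficient".

```java
// Hypothetical standalone sketch of the proposed check; names and the
// 0.1 ratio are illustrative, not taken from the actual Spark code.
public class SurviveRatioSketch {
    // Mirrors: totalRegisteredExecutors.get() < maxExecutors * minSurviveRatio
    static boolean insufficientResourcesRetained(int registered, int maxExecutors,
                                                 double minSurviveRatio) {
        return registered < maxExecutors * minSurviveRatio;
    }

    public static void main(String[] args) {
        double minSurviveRatio = 0.1; // illustrative value

        // With dynamic allocation, maxExecutors defaults to Int.MaxValue,
        // so the survival threshold is about 2.1e8 executors.
        double threshold = Integer.MAX_VALUE * minSurviveRatio;
        System.out.println("threshold = " + threshold);

        // Even a very large cluster trips the check, so the app would
        // always be treated as having insufficient resources.
        System.out.println(
            insufficientResourcesRetained(10_000, Integer.MAX_VALUE, minSurviveRatio));

        // With a bounded maxExecutors the check behaves as intended.
        System.out.println(
            insufficientResourcesRetained(500, 1000, minSurviveRatio));
    }
}
```

With a bounded `maxExecutors` of 1000 and a ratio of 0.1, 500 registered executors clear the 100-executor threshold; with `Integer.MAX_VALUE` the threshold is unreachable, which is the reviewer's point.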
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]