Ngone51 commented on code in PR #47793:
URL: https://github.com/apache/spark/pull/47793#discussion_r1735495020


##########
core/src/main/scala/org/apache/spark/scheduler/HealthTracker.scala:
##########
@@ -425,14 +425,16 @@ private[spark] object HealthTracker extends Logging {
   private val DEFAULT_TIMEOUT = "1h"
 
   /**
-   * Returns true if the excludeOnFailure is enabled, based on checking the 
configuration
-   * in the following order:
-   * 1. Is it specifically enabled or disabled?
+   * Returns true if the excludeOnFailure is enabled on the application level,
+   * based on checking the configuration in the following order:
+   * 1. Is application level exclusion specifically enabled or disabled?
+   * 2. Is overall exclusion feature enabled or disabled?
    * 2. Is it enabled via the legacy timeout conf?
    * 3. Default is off

Review Comment:
   ```suggestion
      * 3. Is it enabled via the legacy timeout conf?
      * 4. Default is off
   ```



##########
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:
##########
@@ -143,8 +143,18 @@ private[spark] class TaskSetManager(
   private var calculatedTasks = 0
 
   private[scheduler] val taskSetExcludelistHelperOpt: 
Option[TaskSetExcludelist] = {
-    healthTracker.map { _ =>
-      new TaskSetExcludelist(sched.sc.listenerBus, conf, stageId, 
taskSet.stageAttemptId, clock)
+    if (TaskSetExcludelist.isExcludeOnFailureEnabled(conf)) {
+      Some(new TaskSetExcludelist(sched.sc.listenerBus, conf, stageId,
+        taskSet.stageAttemptId, clock))
+    } else if (healthTracker.isDefined) {
+      // If we enabled exclusion at application level but not at taskset level 
exclusion, we create
+      // TaskSetExcludelist in dry run mode.

Review Comment:
   This seems to introduce the behaviour change. Before this PR,  
`TaskSetExcludelist` is created in non-dry-run mode when 
`EXCLUDE_ON_FAILURE_ENABLED` is on. After this PR, the non-dry-run mode 
creation requires another conf `EXCLUDE_ON_FAILURE_ENABLED_TASK_AND_STAGE` on 
implicitly. I think adding an explicit conf for "_whether creating 
TaskSetExcludelist in dry-run mode_" maybe a better solution.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to