mridulm commented on code in PR #37924:
URL: https://github.com/apache/spark/pull/37924#discussion_r974600104
##########
docs/configuration.md:
##########
@@ -2605,6 +2605,15 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>2.2.0</td>
</tr>
+<tr>
+ <td><code>spark.stage.attempt.ignoreOnDecommissionFetchFailure</code></td>
Review Comment:
`spark.stage.attempt.ignoreOnDecommissionFetchFailure` -> `spark.stage.ignoreOnDecommissionFetchFailure`
##########
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -1860,8 +1867,18 @@ private[spark] class DAGScheduler(
s"(attempt ${failedStage.latestInfo.attemptNumber}) running")
} else {
failedStage.failedAttemptIds.add(task.stageAttemptId)
+ val ignoreStageFailure = ignoreDecommissionFetchFailure &&
+ isExecutorDecommissioned(taskScheduler, bmAddress)
+ if (ignoreStageFailure) {
+ logInfo(s"Ignoring fetch failure from $task of $failedStage attempt " +
+ s"${task.stageAttemptId} when counting spark.stage.maxConsecutiveAttempts " +
+ s"as executor ${bmAddress.executorId} is decommissioned and " +
+ s"${config.STAGE_IGNORE_DECOMMISSION_FETCH_FAILURE.key}=true")
+ }
+
val shouldAbortStage =
- failedStage.failedAttemptIds.size >= maxConsecutiveStageAttempts ||
+ (!ignoreStageFailure &&
+ failedStage.failedAttemptIds.size >= maxConsecutiveStageAttempts) ||
disallowStageRetryForTest
Review Comment:
QQ: We are preventing the immediate failure from aborting the stage, but might we be effectively reducing the number of stage failures that can be tolerated? `failedStage.failedAttemptIds.add(task.stageAttemptId)` still runs for decommission-related failures, so those attempts keep counting toward `maxConsecutiveStageAttempts`.
For example, assuming `maxConsecutiveStageAttempts = 4`:
attempts 0, 1 and 2 fail due to decommission;
attempt 3 fails for other reasons -> job fails.
Is this the behavior we will now exhibit?
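To make the scenario concrete, here is a minimal, simplified Scala model of the abort condition in the diff above. It is not Spark code: `StageFailure` and `abortAttempt` are hypothetical names, and `disallowStageRetryForTest` and the rest of the DAGScheduler state are omitted. It assumes, as the diff shows, that the attempt id is added to `failedAttemptIds` before the ignore check runs.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the PR's abort check (not a Spark API).
object StageAbortModel {
  final case class StageFailure(attemptId: Int, fromDecommissionedExecutor: Boolean)

  /** Returns the attempt id at which the stage would be aborted, if any. */
  def abortAttempt(
      failures: Seq[StageFailure],
      maxConsecutiveStageAttempts: Int,
      ignoreDecommissionFetchFailure: Boolean): Option[Int] = {
    val failedAttemptIds = mutable.HashSet.empty[Int]
    var aborted: Option[Int] = None
    val it = failures.iterator
    while (aborted.isEmpty && it.hasNext) {
      val f = it.next()
      // As in the diff: every failed attempt is recorded, even ignored ones.
      failedAttemptIds.add(f.attemptId)
      val ignoreStageFailure =
        ignoreDecommissionFetchFailure && f.fromDecommissionedExecutor
      // Only the abort decision is skipped for decommission-related failures.
      val shouldAbortStage =
        !ignoreStageFailure && failedAttemptIds.size >= maxConsecutiveStageAttempts
      if (shouldAbortStage) aborted = Some(f.attemptId)
    }
    aborted
  }

  def main(args: Array[String]): Unit = {
    // Attempts 0-2 fail due to decommission, attempt 3 fails for another reason.
    val scenario = Seq(
      StageFailure(0, fromDecommissionedExecutor = true),
      StageFailure(1, fromDecommissionedExecutor = true),
      StageFailure(2, fromDecommissionedExecutor = true),
      StageFailure(3, fromDecommissionedExecutor = false))
    println(abortAttempt(scenario, maxConsecutiveStageAttempts = 4,
      ignoreDecommissionFetchFailure = true)) // prints Some(3)
  }
}
```

Under these assumptions the three ignored decommission failures still occupy slots in `failedAttemptIds`, so the first non-decommission failure brings the set to size 4 and aborts the stage, whereas the same single non-decommission failure on its own would not.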
--
This is an automated message from the Apache Git Service.