mingkangli-db commented on code in PR #48551:
URL: https://github.com/apache/spark/pull/48551#discussion_r1814234935
##########
core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala:
##########
@@ -50,19 +63,31 @@ private[spark] class SparkUncaughtExceptionHandler(val exitOnUncaughtException:
       // We may have been called from a shutdown hook. If so, we must not call System.exit().
       // (If we do, we will deadlock.)
       if (!ShutdownHookManager.inShutdown()) {
-        exception match {
-          case _: OutOfMemoryError =>
-            System.exit(SparkExitCode.OOM)
-          case e: SparkFatalException if e.throwable.isInstanceOf[OutOfMemoryError] =>
-            // SPARK-24294: This is defensive code, in case that SparkFatalException is
-            // misused and uncaught.
-            System.exit(SparkExitCode.OOM)
-          case _: KilledByTaskReaperException if exitOnUncaughtException =>
-            System.exit(ExecutorExitCode.KILLED_BY_TASK_REAPER)
-          case _ if exitOnUncaughtException =>
-            System.exit(SparkExitCode.UNCAUGHT_EXCEPTION)
-          case _ =>
-            // SPARK-30310: Don't System.exit() when exitOnUncaughtException is false
+        // Traverse the causes up to killOnFatalErrorDepth layers
Review Comment:
Oops, I think we should address this separately in a follow-up ticket, because I just realized another issue with uncaught exception handling. In `Executor.scala`, the class `KilledByTaskReaperException(message: String) extends SparkException(message)`, added in [SPARK-48541](https://github.com/apache/spark/pull/46883), is not considered a fatal error by `Executor.isFatalError()`. For example, running the following code returns `false`:
```scala
// Create an instance of KilledByTaskReaperException
val exception = new KilledByTaskReaperException("Task was killed by the task reaper")
// Result is false!
val result = Executor.isFatalError(exception, depthToCheck = 5)
```
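Roughly speaking, I believe that happens because `KilledByTaskReaperException` extends `SparkException`, which is an ordinary `Exception`, so Scala's standard `NonFatal` extractor matches it and none of the fatal-error cases in `isFatalError` apply. A small illustration of the type relationships (for illustration only, reusing the exception instance from the snippet above):

```scala
import scala.util.control.NonFatal

import org.apache.spark.SparkException

val e = new KilledByTaskReaperException("Task was killed by the task reaper")
// It is "non-fatal" from Scala's point of view, and just a SparkException subclass.
assert(NonFatal(e))
assert(e.isInstanceOf[SparkException])
```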
However, `SparkUncaughtExceptionHandler` assumes it is a fatal error (it has a dedicated case in the handler):
```scala
case _: KilledByTaskReaperException if exitOnUncaughtException =>
System.exit(ExecutorExitCode.KILLED_BY_TASK_REAPER)
```
Additionally, the comments in `Executor.scala` also assume it should be a fatal error: [code link](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L1043-L1048). It seems we expect the exception to bubble up to the `uncaughtExceptionHandler` here: [code link](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L861-L862), but it won't, because `Executor.isFatalError()` returns `false`!
This inconsistency arises because `Executor.isFatalError` doesn't treat `KilledByTaskReaperException` as fatal (when it should!), while both `Executor.scala` and `SparkUncaughtExceptionHandler` assume that it is. To reuse that logic here, we first need to fix `Executor.isFatalError` to recognize `KilledByTaskReaperException` as a fatal error.
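For illustration, the fix could look roughly like this (just a sketch of the idea, not an exact patch; the existing checks and the bounded cause traversal would stay as they are today):

```scala
// Sketch only: give KilledByTaskReaperException an explicit case in Executor.isFatalError,
// ahead of the generic non-fatal check, so it matches the handler's assumption.
def isFatalError(t: Throwable, depthToCheck: Int): Boolean = {
  if (depthToCheck <= 0) {
    false
  } else {
    t match {
      case _: KilledByTaskReaperException =>
        // Treat the task reaper kill as fatal, consistent with SparkUncaughtExceptionHandler.
        true
      // ... existing fatal-error checks unchanged, including recursing into getCause
      // with depthToCheck - 1 ...
      case _ => false
    }
  }
}
```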
What do you think? @cloud-fan
Cc: @bozhang2820 (who added `KilledByTaskReaperException`)