warrenzhu25 commented on a change in pull request #29082:
URL: https://github.com/apache/spark/pull/29082#discussion_r494468057
##########
File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
##########
@@ -597,6 +598,20 @@ private[spark] class AppStatusListener(
None
}
task.errorMessage = errorMessage
+ task.failureReason = event.reason match {
Review comment:
Yes. In the previous version, I passed the error message directly to build the
failure reason, but I found that approach hard, complex, and error-prone,
because each failure reason formats its error string differently:
1. `ExceptionFailure` is the most common and the easiest case. The message is
just the exception's message, which we can parse from the errorString.
2. The `ExecutorLostFailure` errorString has a format like `ExecutorLostFailure
(executor 72 exited caused by one of the running tasks) Reason: Executor
heartbeat timed out after 123153 ms`. We can split the message out using the
keyword `Reason:`.
3. The `FetchFailed` errorString has a format like `FetchFailed($bmAddressString,
shuffleId=$shuffleId, mapId=$mapId, reduceId=$reduceId, message=\n$message\n)`.
We need to parse the message out of this.
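
A minimal sketch of what that string parsing might look like, to make the
coupling concrete. `ErrorStringParser` and `parseMessage` are hypothetical
names, not part of Spark, and the sketch assumes the error strings look exactly
like the examples above:

```scala
// Hypothetical helper, not part of Spark. It recovers the message from an
// error string by relying on the current text format of each failure reason.
object ErrorStringParser {
  // Matches "...message=\n<text>\n)" at the end of a FetchFailed error string.
  // (?s) lets '.' match newlines, since the message may span several lines.
  private val FetchFailedMessage = """(?s).*message=\n(.*)\n\)""".r

  def parseMessage(errorString: String): String = {
    if (errorString.startsWith("ExecutorLostFailure")) {
      // Take everything after the first "Reason:" keyword.
      val idx = errorString.indexOf("Reason:")
      if (idx >= 0) errorString.substring(idx + "Reason:".length).trim
      else errorString
    } else if (errorString.startsWith("FetchFailed")) {
      errorString match {
        case FetchFailedMessage(msg) => msg
        case _ => errorString // format not recognized, give up
      }
    } else {
      // Assumed to be an ExceptionFailure. This sketch returns the string
      // unchanged; real code would still have to isolate the message.
      errorString
    }
  }
}
```

Every branch depends on a literal keyword or layout. If `Reason:` or
`message=` were ever renamed, this code would silently fall back to returning
the whole string.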
In summary, parsing relies on, and is coupled to, the specific format of each
failure reason. If a format changes in the future, the parsing can easily
break. So I chose to get the failure reason directly from `TaskEndReason`.
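
A rough sketch of the pattern-matching approach, for comparison. The case
classes below are simplified stand-ins for Spark's `TaskEndReason` hierarchy
(the real classes carry more fields), and `FailureReason.from` is a
hypothetical helper, not the code in this PR:

```scala
// Simplified stand-ins for Spark's TaskEndReason hierarchy. Only the fields
// this sketch needs are modeled; the real definitions differ.
sealed trait TaskEndReason
case object Success extends TaskEndReason
case class ExceptionFailure(className: String, description: String)
  extends TaskEndReason
case class ExecutorLostFailure(execId: String, reason: Option[String])
  extends TaskEndReason
case class FetchFailed(shuffleId: Int, reduceId: Int, message: String)
  extends TaskEndReason

// Hypothetical helper: reads the message from typed fields, so it does not
// depend on how any error string happens to be formatted.
object FailureReason {
  def from(reason: TaskEndReason): Option[String] = reason match {
    case e: ExceptionFailure    => Some(e.description)
    case e: ExecutorLostFailure => e.reason
    case f: FetchFailed         => Some(f.message)
    case _                      => None // Success or any other reason
  }
}
```

Because the message comes from a field rather than from text, a change to the
error string format would not affect this code.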