Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16639#discussion_r103503390

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---

```diff
@@ -400,8 +410,16 @@ private[spark] class Executor(
           execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
         } catch {
-          case ffe: FetchFailedException =>
-            val reason = ffe.toTaskFailedReason
+          case t: Throwable if hasFetchFailure =>
+            val reason = task.context.fetchFailed.get.toTaskFailedReason
+            if (!t.isInstanceOf[FetchFailedException]) {
+              // there was a fetch failure in the task, but some user code wrapped that exception
+              // and threw something else. Regardless, we treat it as a fetch failure.
+              logWarning(s"TID ${taskId} encountered a ${classOf[FetchFailedException]} and " +
+                s"failed, but did not directly throw the ${classOf[FetchFailedException]}. " +
+                s"Spark is still handling the fetch failure, but these exceptions should not be " +
+                s"intercepted by user code.")
```

--- End diff --

Thanks, I like that msg better. I changed it slightly so the original exception is at the end; otherwise it's hard to tell where the original exception ends and you are back to the error msg. Here's what the new msg looks like from the test case now:

```
17/02/27 16:33:43.953 Executor task launch worker for task 0 WARN Executor: TID 0 encountered a org.apache.spark.shuffle.FetchFailedException and failed, but the org.apache.spark.shuffle.FetchFailedException was hidden by another exception. Spark is handling this like a fetch failure and ignoring the other exception: java.lang.RuntimeException: User Exception that hides the original exception
```

You have a good point about the uncaught exception handler; I have added that back. I wondered whether I should add those lines inside the `case t: Throwable if hasFetchFailure` block, or make the check a condition for the case itself: `case t: Throwable if hasFetchFailure && !Utils.isFatalError(t)`.
I decided to make it part of the condition, since that is closer to the old behavior, and a fetch failure that happens during an OOM may not be real. I also looked into adding a unit test for this handling -- it requires some refactoring, potentially more work than it's worth, so I put it in a separate commit. I'd rather avoid changing the behavior for accumulators here. Accumulators have such weird semantics that it's not clear what they should do; we can fix that separately if we really want to.
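To make the trade-off concrete, here is a minimal, self-contained sketch (not the actual Spark code) of why putting the fatal-error check in the case guard matters: with the check in the guard, a fatal error fails the pattern match entirely and falls through to the next handler, rather than being swallowed by the fetch-failure branch. `hasFetchFailure` and `isFatalError` are simplified stand-ins for the real Spark helpers discussed above.

```scala
// Hypothetical sketch: `isFatalError` here only checks for OOM,
// a simplification of Spark's Utils.isFatalError.
object GuardSketch {
  def isFatalError(t: Throwable): Boolean = t.isInstanceOf[OutOfMemoryError]

  // Returns the name of the handler that caught the throwable.
  def handle(t: Throwable, hasFetchFailure: Boolean): String = {
    try {
      throw t
    } catch {
      // Fatal errors fail this guard, so they are NOT treated as fetch failures.
      case e: Throwable if hasFetchFailure && !isFatalError(e) =>
        "fetch-failure handler"
      // Fatal errors (and anything without a pending fetch failure) land here.
      case e: Throwable =>
        "generic handler"
    }
  }

  def main(args: Array[String]): Unit = {
    // A user exception wrapping a fetch failure is handled as a fetch failure.
    println(handle(new RuntimeException("wrapped"), hasFetchFailure = true))
    // A fatal error falls through to the generic handler even with a pending
    // fetch failure, matching the "may not be real during OOM" reasoning.
    println(handle(new OutOfMemoryError(), hasFetchFailure = true))
  }
}
```

With the check inside the block instead, the fetch-failure case would already have matched, and the fatal error would need to be rethrown by hand to reach the later handler.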