Github user kayousterhout commented on the issue:
https://github.com/apache/spark/pull/17297
To recap the issue that Imran and I discussed here, I think it can be
summarized as follows:
- A Fetch Failure happens at some time t and indicates that the map output
on machine M has been lost
- Consider some running task that's read x map outputs and still needs to
process y map outputs
- Scenario A: (PRO of this PR) If the output from M was among the x outputs
that have already been read, we should keep the task running (as this PR does),
because the task already successfully fetched the output from the failed
machine. We don't do this currently, meaning we throw away work that was
already done.
- Scenario B: (CON of this PR) If the output from M was among the y outputs
that have not yet been read, then we should cancel the task, because the task
won't learn about the new location of the re-generated output from M (IIUC,
there's no functionality to do this now) and so is going to fail later on. The
current code re-runs the task, which is what we should do. This PR re-uses the
old task, which means the job will take longer to run, because the task will
fail later on and need to be restarted.
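To make the two scenarios concrete, here's a small sketch of the decision in plain Python (this is an illustrative model, not Spark's actual scheduler code; the function name and host sets are hypothetical):

```python
# Hypothetical model: a running task has fetched map outputs from some hosts
# and still has others pending when a FetchFailure reports that the map
# output on host M has been lost.

def should_keep_running(fetched_hosts, pending_hosts, failed_host):
    """Return True if the task can safely keep running (Scenario A),
    False if it should be cancelled and re-run (Scenario B)."""
    if failed_host in fetched_hosts:
        # Scenario A: the lost output was already fetched, so the
        # failure can't hurt this task -- keep it running.
        return True
    if failed_host in pending_hosts:
        # Scenario B: the task still holds a stale location for the
        # lost output and would fail later -- cancel and re-run it.
        return False
    # The failure doesn't involve any of this task's inputs.
    return True

# Example: the task has read outputs from hosts a and b, still needs c and d.
print(should_keep_running({"a", "b"}, {"c", "d"}, "a"))  # Scenario A -> True
print(should_keep_running({"a", "b"}, {"c", "d"}, "c"))  # Scenario B -> False
```

The point of the model: the PR's benefit hinges entirely on which branch the failed host falls into, which is unknown when the FetchFailure arrives.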
If my description above is correct, then this PR is assuming that scenario
A is more likely than scenario B, but it seems to me that these two scenarios
are equally likely (in which case this PR provides no net benefit).
@sitalkedia what are your thoughts here / did I miss something in my
description above?