GitHub user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    > Also, separately from what approach is used, how do you deal with the
    > following: suppose map task 1 loses its output (e.g., the reducer where
    > that task is located dies). Now, suppose reduce task A gets a fetch
    > failure for map task 1, triggering map task 1 to be re-run. Meanwhile,
    > reduce task B is still running. Now the re-run map task 1 completes and
    > the scheduler launches the reduce phase again. Suppose after that
    > happens, task B fails (this is the old task B, that started before the
    > fetch failure) because it can't get the data from map task 1, but
    > that's because it still has the old location for map task 1. My
    > understanding is that, with the current code, that would cause the map
    > stage to get re-triggered again, but really, reduce task B should be
    > re-started with the correct location for the output from map 1.
    
    @kayousterhout, how do you think we can handle this issue?
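
    To make the question concrete: one possible direction is to tag each
    registered map output with an epoch, so the scheduler can tell that task
    B's late fetch failure refers to a location that has already been
    superseded, and relaunch just task B with fresh locations instead of
    resubmitting the whole map stage. The sketch below is illustrative Scala
    only, not Spark's actual DAGScheduler/MapOutputTracker code; every name
    in it (MapStatus, registerRecomputedOutput, handleFetchFailure) is
    hypothetical.

        // Minimal sketch of epoch-based stale-failure detection.
        // All names here are hypothetical, not Spark's real API.
        object FetchFailureSketch {
          // Where a map task's output lives, and the epoch at which
          // that location was registered.
          final case class MapStatus(location: String, epoch: Long)

          // Tracker state: one entry per map task id. Map 1's output
          // starts out on host-A at epoch 0.
          private var outputs =
            Map[Int, MapStatus](1 -> MapStatus("host-A", epoch = 0L))

          // Re-running map task 1 registers its new output location
          // under a newer epoch.
          def registerRecomputedOutput(mapId: Int,
                                       newLocation: String,
                                       newEpoch: Long): Unit =
            outputs += mapId -> MapStatus(newLocation, newEpoch)

          // Compare the epoch the reducer saw against the current one.
          // If the tracker already has a newer location, the failure is
          // stale: relaunch only the reduce task with fresh locations
          // instead of re-triggering the map stage.
          def handleFetchFailure(mapId: Int,
                                 epochSeenByReducer: Long): String =
            outputs.get(mapId) match {
              case Some(status) if status.epoch > epochSeenByReducer =>
                s"stale failure: relaunch reduce task with fresh " +
                  s"location ${status.location}"
              case _ =>
                "genuine failure: mark output lost, resubmit map stage"
            }

          def main(args: Array[String]): Unit = {
            // Map 1 was re-run after host-A died, so its output moved.
            registerRecomputedOutput(mapId = 1,
                                     newLocation = "host-B",
                                     newEpoch = 1L)
            // Task B's late failure still carries epoch 0, so it is
            // recognized as stale rather than a fresh output loss.
            println(handleFetchFailure(mapId = 1, epochSeenByReducer = 0L))
          }
        }

    Running this prints the "stale failure" branch for task B's report,
    since map 1's output was re-registered at a newer epoch before B's
    failure arrived.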

