GitHub user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/12436
>> Also, separately from what approach is used, how do you deal with the
following: suppose map task 1 loses its output (e.g., the machine where that
task ran dies). Now, suppose reduce task A gets a fetch failure for map
task 1, triggering map task 1 to be re-run. Meanwhile, reduce task B is still
running. Now the re-run map task 1 completes and the scheduler launches the
reduce phase again. Suppose after that happens, task B fails (this is the old
task B, which started before the fetch failure) because it can't get the data
from map task 1, but that's because it still has the old location for map task
1. My understanding is that, with the current code, that would cause the map
stage to get re-triggered again, but really, reduce task B should be re-started
with the correct location for the output from map 1.
@kayousterhout - How do you think we can handle this issue?
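
For concreteness, here is a minimal sketch of one way to tell the two cases apart: compare the location the failing fetch actually used against the location currently registered for that map output. If they differ, the failure is stale and only the reduce task needs to be relaunched with the fresh location. All names here (`StaleAwareScheduler`, `relaunchReduceTask`, `resubmitMapStage`) are hypothetical, not Spark's actual `DAGScheduler`/`MapOutputTracker` API:

```scala
import scala.collection.mutable

// Hypothetical fetch-failure report; `fetchedFrom` is the location the
// reduce task was reading from when it failed.
case class FetchFailed(mapStageId: Int, mapId: Int, reduceTaskId: Int,
                       fetchedFrom: String)

class StaleAwareScheduler {
  // (stageId, mapId) -> currently registered location of that map output
  val mapOutputLocations = mutable.Map.empty[(Int, Int), String]

  def handleFetchFailure(f: FetchFailed): Unit =
    mapOutputLocations.get((f.mapStageId, f.mapId)) match {
      case Some(current) if current != f.fetchedFrom =>
        // Map task 1 was already re-run and its output re-registered at a
        // new location; the failing task (old task B) just held a stale
        // address. Relaunch only that reduce task with the fresh location.
        relaunchReduceTask(f.reduceTaskId, current)
      case _ =>
        // The location that failed is still the registered one (or nothing
        // is registered), so the output really is gone: re-run the map stage.
        resubmitMapStage(f.mapStageId)
    }

  private def relaunchReduceTask(taskId: Int, loc: String): Unit =
    println(s"relaunching reduce task $taskId against $loc")

  private def resubmitMapStage(stageId: Int): Unit =
    println(s"resubmitting map stage $stageId")
}
```

The key point of the sketch is that a stale fetch failure should not unregister the freshly re-registered output; otherwise the map stage gets re-triggered needlessly, which is exactly the behavior described above.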