Github user squito commented on the pull request:
https://github.com/apache/spark/pull/7699#issuecomment-134762560
@markhamstra @mateiz thanks for taking a look, I think I've addressed your
concerns.
However, the last round of comments made me realize that there is probably
still an issue -- after we register the map output for stage 1, and start
executing stage 2, I think we'll still have a pending task set for stage 1 that
is non-zombie. You'll probably get pretty confusing behavior if you still see
lots of tasks completing for stage 1, and you're very likely to run into
[SPARK-8029](https://issues.apache.org/jira/browse/SPARK-8029). On one hand,
we can't eliminate this completely, since both attempts can be running the same
partition at the same time (so no matter what, SPARK-8029 is a possibility).
But I feel like we should at least mark the old attempt as zombie so it doesn't
launch even more tasks: that reduces the likelihood of hitting the issue, makes
the output easier to understand, and avoids wasting resources on tasks that
aren't needed.
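To make the intent concrete, here is a minimal, self-contained sketch (hypothetical names and fields, not Spark's actual `TaskSetManager` / `TaskSchedulerImpl` API) of what "mark the attempt as zombie" means: a zombie task set keeps tracking its already-running tasks but stops offering new ones, so once the stage's map output is registered by a later attempt, the stale attempt launches nothing further.

```scala
// Hypothetical, simplified stand-ins for Spark's scheduler classes.
class TaskSetManager(val stageId: Int, val stageAttemptId: Int) {
  // A zombie task set still tracks running tasks but offers no new ones.
  var isZombie: Boolean = false
  private var nextTask = 0
  private val numTasks = 4

  // Offer the next task index, or None if the set is exhausted or zombie.
  def resourceOffer(): Option[Int] =
    if (isZombie || nextTask >= numTasks) None
    else { val t = nextTask; nextTask += 1; Some(t) }
}

object Scheduler {
  // All task sets ever submitted, keyed by stage id.
  val taskSetsByStage =
    scala.collection.mutable.Map.empty[Int, List[TaskSetManager]]

  def submit(tsm: TaskSetManager): Unit =
    taskSetsByStage(tsm.stageId) =
      tsm :: taskSetsByStage.getOrElse(tsm.stageId, Nil)

  // Called once the map output for stageId is fully registered: any
  // still-pending attempt for that stage should stop launching tasks.
  def markStageFinished(stageId: Int): Unit =
    taskSetsByStage.getOrElse(stageId, Nil).foreach(_.isZombie = true)
}

object Demo extends App {
  val attempt0 = new TaskSetManager(stageId = 1, stageAttemptId = 0)
  val attempt1 = new TaskSetManager(stageId = 1, stageAttemptId = 1)
  Scheduler.submit(attempt0)
  Scheduler.submit(attempt1)
  // Stage 1's output is registered; both pending attempts go zombie.
  Scheduler.markStageFinished(1)
  assert(attempt0.resourceOffer().isEmpty) // no new tasks from a zombie
}
```

The point of the flag (rather than killing the task set outright) is that tasks already in flight for the old attempt can still complete and report back, which matters because either attempt may legitimately finish a given partition.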
I think testing this is going to be a little tricky, since it involves
interaction between `DAGScheduler` and `TaskSetManager` that isn't possible
with the way we currently have tests set up in `DAGSchedulerSuite`. So I'd like
to tackle that in a separate task, since I think this is a strict improvement
in any case. I should be able to look at it right away, so I shouldn't be
putting it off indefinitely. Thoughts?