Github user suyanNone commented on the pull request:
https://github.com/apache/spark/pull/4055#issuecomment-93883077
@andrewor14 that problem occurs when there is a stage retry.
One of our users hit it after an executor was lost (killed by YARN or
something similar), back when we had not applied this patch.
As I mentioned before:
Because `Task` does not override `hashCode` and `equals`, a task for the same
partition in a different TaskSet is a different task object. `pendingTasks` is
cleared when the map stage is retried, so `pendingTasks` only tracks the new
retry TaskSet. Meanwhile, the previous TaskSet may still complete tasks for the
same partitions as the latest TaskSet, as the sketch below shows.
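To make that concrete, here is a minimal sketch, assuming a hypothetical,
stripped-down `Task` class (the real one carries far more state) and the
default reference equality:

```scala
import scala.collection.mutable

// Hypothetical, simplified Task: like Spark's, it does not override hashCode/equals.
class Task(val stageId: Int, val partitionId: Int)

object PendingTasksSketch extends App {
  val original = new Task(stageId = 1, partitionId = 0) // task from the first TaskSet
  val retried  = new Task(stageId = 1, partitionId = 0) // same partition, from the retry TaskSet

  // pendingTasks is rebuilt when the stage is resubmitted, so it only holds the retry's objects.
  val pendingTasks = mutable.HashSet[Task](retried)

  // Without hashCode/equals overrides the two tasks compare by reference,
  // so a finished event for the original task removes nothing.
  pendingTasks -= original
  println(pendingTasks.size) // still 1
}
```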
We use `stage.numAvailableOutputs` to decide whether to submit the next stage,
and `stage.pendingTasks` to decide whether to register the map outputs with the
MapOutputTracker. When we handle a task-finished event for a task from the stale
TaskSet, `stage.pendingTasks -= task` removes nothing, but the completion still
advances `stage.numAvailableOutputs`, because outputs are identified only by
`partitionId`. So a stage can end up being submitted while its parent map stage
has not yet registered its outputs in the MapOutputTracker, which leads to
endless stage retries... (see the sketch that follows).
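A rough sketch of how the two counters diverge, again with hypothetical
stand-ins for the stage bookkeeping rather than the actual `DAGScheduler`/`Stage`
code:

```scala
import scala.collection.mutable

// Same hypothetical Task as in the previous sketch.
class Task(val stageId: Int, val partitionId: Int)

// Hypothetical, simplified stand-in for the two pieces of bookkeeping described above.
class MapStageBookkeeping(numPartitions: Int) {
  // Cleared and refilled on retry, so it only knows the retry TaskSet's task objects.
  val pendingTasks = mutable.HashSet[Task]()
  // Keyed by partitionId, so completions from any TaskSet count toward it.
  private val finishedPartitions = mutable.Set[Int]()

  def numAvailableOutputs: Int = finishedPartitions.size

  def taskFinished(task: Task): Unit = {
    pendingTasks -= task                   // no-op if the task came from a stale TaskSet
    finishedPartitions += task.partitionId // still advances numAvailableOutputs
  }

  // The child stage is considered ready on this condition even though pendingTasks,
  // which gates the map-output registration, may still be non-empty.
  def readyToSubmitChild: Boolean = numAvailableOutputs == numPartitions
}
```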