Github user suyanNone commented on the pull request:

    https://github.com/apache/spark/pull/4055#issuecomment-93883077
  
    @andrewor14 That problem occurs when there is a stage retry.
    One of our users hit it after an executor was lost (killed by YARN or
similar), at a time when we had not yet applied this patch.
    
    As I mentioned before:
    Because `Task` does not override `hashCode` and `equals`, tasks for the
same partition in different TaskSets are distinct objects. `pendingTasks` is
cleared when a map stage is retried, so it only ever contains tasks from the
newest retry TaskSet.
    Meanwhile, the previous TaskSet can still complete tasks whose partitions
also appear in the latest TaskSet.
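    
    To make the identity issue concrete, here is a minimal, hypothetical
sketch (a stand-in for the real `Task` class, not Spark's actual code) showing
why removing a stale attempt's task from a `HashSet` is a no-op:
    
    ```scala
    import scala.collection.mutable
    
    // Hypothetical stand-in for Spark's Task: no equals/hashCode override,
    // so HashSet membership falls back to reference equality.
    class Task(val stageId: Int, val partitionId: Int)
    
    object PendingTasksDemo extends App {
      val pendingTasks = mutable.HashSet.empty[Task]
    
      val oldAttempt = new Task(stageId = 1, partitionId = 7) // previous TaskSet
      val newAttempt = new Task(stageId = 1, partitionId = 7) // same partition, retry TaskSet
    
      pendingTasks += newAttempt // pendingTasks is rebuilt for the retry
      pendingTasks -= oldAttempt // late completion from the old TaskSet: no-op
    
      println(pendingTasks.size) // prints 1 -- the retried task is still "pending"
    }
    ```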
    
    We use `stage.numAvailableOutputs` to decide whether to submit the next
stage, and `stage.pendingTasks` to decide whether to register the stage's map
outputs with `MapOutputTracker`.
    
    When we handle a task-finished event for a task from the previous TaskSet,
`stage.pendingTasks -= task` has no effect (the old task object is not in the
set), but it does affect `stage.numAvailableOutputs`, because output locations
are keyed only by `partitionId`.
    As a result, a stage can be submitted while its parent map stage has not
yet registered its outputs with `MapOutputTracker` (see the sketch below).
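    
    A hedged sketch of the divergent bookkeeping (field and method names are
simplified stand-ins for `DAGScheduler`/`Stage` internals, not the actual
Spark code):
    
    ```scala
    import scala.collection.mutable
    
    class Task(val stageId: Int, val partitionId: Int)
    
    // Simplified stand-in for a shuffle map stage.
    class Stage(val numPartitions: Int) {
      val pendingTasks = mutable.HashSet.empty[Task]                // identity-keyed
      val outputLocs = Array.fill[List[String]](numPartitions)(Nil) // partitionId-keyed
      def numAvailableOutputs: Int = outputLocs.count(_.nonEmpty)
      def isAvailable: Boolean = numAvailableOutputs == numPartitions
    }
    
    object DivergenceDemo extends App {
      val stage = new Stage(numPartitions = 1)
      stage.pendingTasks += new Task(1, 0) // retry TaskSet is running
    
      // A stale completion from the *previous* TaskSet arrives:
      val oldAttempt = new Task(1, 0)
      stage.pendingTasks -= oldAttempt     // no-op: different object
      stage.outputLocs(oldAttempt.partitionId) =
        "host-a" :: stage.outputLocs(oldAttempt.partitionId) // but the output still counts
    
      // numAvailableOutputs says the stage is done, so child stages get submitted...
      println(stage.isAvailable)           // true
      // ...but pendingTasks never empties, so map outputs are never registered
      // with MapOutputTracker, and the children fetch-fail and retry forever.
      println(stage.pendingTasks.nonEmpty) // true
    }
    ```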
    
    This causes endless stage retries.

