GitHub user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21577
  
    A few notes about the latest updates:
    
    - I reverted the `TaskCommitDenied` changes so that this patch can be 
backported more easily. I'm not against the change, but I think it'd be better 
to make it only in master, so we can postpone it. The information there is 
also not critical, since the end reason is generally attached to a task, which 
has the needed info anyway (see the sketch after this list).
    
    - I fixed the issue Mridul brought up, but I think the race that Tom 
describes still exists. I'm just not sure it would cause problems, since as far 
as I can tell it can only happen in a map stage, not a result stage.
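    
    On the first point, here's a rough sketch of why the extra fields aren't 
critical. These are simplified stand-ins for the real types in 
`org.apache.spark` and `org.apache.spark.scheduler`, with field lists trimmed 
and approximate, not the actual definitions:
    
    ```scala
    // Simplified stand-ins; real field lists are longer and may differ.
    sealed trait TaskEndReason
    case class TaskCommitDenied(
        jobID: Int,
        partitionID: Int,
        attemptNumber: Int) extends TaskEndReason

    case class TaskInfo(taskId: Long, index: Int, attemptNumber: Int)

    // The end reason is delivered alongside the task info, so a listener
    // that needs stage/partition/attempt details can read them off taskInfo
    // rather than relying on extra fields on TaskCommitDenied itself.
    case class SparkListenerTaskEnd(
        stageId: Int,
        stageAttemptId: Int,
        reason: TaskEndReason,
        taskInfo: TaskInfo)
    ```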
    
    The test I added sort of illustrates that race, if you look at what 
happens. There are two stages (map stage 2, result stage 3), and the fetch 
failure causes a retry of stage 3 *plus* a resubmission of stage 2, which 
means that stage 2 starts over in the commit coordinator with a fresh list of 
authorized committers.
    
    So if there's still a speculative task from the first run of stage 2 that 
hasn't been properly killed, it might be allowed to commit. But since this is 
a map stage, I assume the map output tracker would take care of filtering out 
duplicate outputs?
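    
    To make the race concrete, here's a toy sketch. This is not the real 
`OutputCommitCoordinator` code (names and structure are invented for 
illustration, and it ignores stage attempt numbers); it only shows how a 
stale speculative attempt could slip past freshly reset stage state:
    
    ```scala
    import scala.collection.mutable

    // Toy stand-in for the commit coordinator, for illustration only.
    class ToyCommitCoordinator {
      // stage -> (partition -> task attempt authorized to commit)
      private val authorized = mutable.Map[Int, mutable.Map[Int, Int]]()

      // Called when a stage starts; a resubmission wipes the old state.
      def stageStart(stage: Int): Unit = synchronized {
        authorized(stage) = mutable.Map.empty
      }

      // First attempt to ask for a partition wins; later attempts are denied.
      def canCommit(stage: Int, partition: Int, attempt: Int): Boolean =
        synchronized {
          val byPartition = authorized(stage)
          byPartition.get(partition) match {
            case Some(winner) => winner == attempt
            case None =>
              byPartition(partition) = attempt
              true
          }
        }
    }

    object RaceDemo extends App {
      val coord = new ToyCommitCoordinator
      coord.stageStart(stage = 2)
      println(coord.canCommit(stage = 2, partition = 0, attempt = 0)) // true
      // Fetch failure downstream: stage 2 is resubmitted, state is reset.
      coord.stageStart(stage = 2)
      // A stale speculative attempt from the first run is still alive and
      // asks to commit against the fresh state, so it is authorized again.
      println(coord.canCommit(stage = 2, partition = 0, attempt = 1)) // true
    }
    ```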
    
    That's obviously really hard to hit, but if it turns out to be an issue, 
we could look at it separately.
