Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/14931
  
    LGTM.
    
    There's a slight change of behavior here for the corner-case scenario where 
the worker (not executor) dies and then is immediately recovered: prior to this 
patch, I believe that the old shuffle files would continue to be served by the 
restarted worker's shuffle service, but after this patch the MapOutputTracker 
entries will have been invalidated and the driver won't ask for shuffle files 
from that worker.
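
    To make the "after this patch" behavior concrete, here is a toy Python model of the driver-side bookkeeping. It is only a sketch: `ToyMapOutputTracker` and its methods are hypothetical names invented for illustration, not Spark's actual MapOutputTracker API, and it assumes one recorded host per map output.

```python
from collections import defaultdict


class ToyMapOutputTracker:
    """Toy stand-in for driver-side map-output bookkeeping (not Spark's real API)."""

    def __init__(self):
        # shuffle_id -> {map_id: host that holds that map output}
        self._outputs = defaultdict(dict)

    def register(self, shuffle_id, map_id, host):
        """Record that `host` holds the output of `map_id` for `shuffle_id`."""
        self._outputs[shuffle_id][map_id] = host

    def remove_outputs_on_host(self, host):
        """Invalidate every map output recorded on `host` (the worker-lost case)."""
        for maps in self._outputs.values():
            for map_id in [m for m, h in maps.items() if h == host]:
                del maps[map_id]

    def missing(self, shuffle_id, num_maps):
        """Map ids with no registered output; these would need to be recomputed."""
        return [m for m in range(num_maps) if m not in self._outputs[shuffle_id]]


tracker = ToyMapOutputTracker()
tracker.register(0, 0, "worker-a")
tracker.register(0, 1, "worker-b")

# worker-a is lost: its entries are invalidated, so even if it restarts
# immediately the driver no longer asks it for shuffle files.
tracker.remove_outputs_on_host("worker-a")
missing_after_loss = tracker.missing(0, 2)

# A recovered worker would have to re-register its outputs explicitly.
tracker.register(0, 0, "worker-a")
missing_after_reregister = tracker.missing(0, 2)
```

    In this model, map 0 counts as missing once worker-a is removed, and only the explicit `register` call makes it available again.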
    
    In terms of default / common-case behavior, I prefer the behavior 
implemented in this patch: when a worker disappears, it seems reasonable to 
treat its map outputs as missing, and if the worker happens to come back later, 
it would make more sense to explicitly re-register those outputs. Even if a 
worker is eventually recovered, recovery might take a long time, leading to 
long hangs.
    
    If we decide that it's important to re-register map outputs after worker 
recovery, then I think we can add that explicitly in a separate patch.
    
    I'm going to merge this to master and will evaluate backporting to 
branch-2.0.

