Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/8427#issuecomment-174271331
  
    Just a note about MapOutputTracker - it is fairly trivial to make it use a 
bare minimum amount of memory even if it does not get cleaned up for 'old' 
stages: use a disk-backed map (mapdb, for example) with LRU eviction.
    This keeps at most the current and previous map output in memory and 
everything else on disk (until a node failure requires recomputation, which 
brings portions of it back into memory).
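
A minimal sketch of the idea described above, under stated assumptions: it is neither Spark's MapOutputTracker nor the mapdb API. The class name `DiskBackedLRUMap`, its methods, and the use of pickle files in a temporary directory as the on-disk store are all hypothetical stand-ins chosen for illustration. Keys play the role of shuffle IDs and values the role of per-shuffle map output.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class DiskBackedLRUMap:
    """Hypothetical map: keeps at most `max_in_memory` entries in RAM and
    spills the least recently used ones to one pickle file per key."""

    def __init__(self, max_in_memory=2, spill_dir=None):
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="map-output-")
        self._hot = OrderedDict()  # least recently used entry comes first

    def _path(self, key):
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def _evict(self):
        # Spill least recently used entries until the in-memory cap holds.
        while len(self._hot) > self.max_in_memory:
            old_key, old_value = self._hot.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_value, f)

    def put(self, key, value):
        path = self._path(key)
        if os.path.exists(path):
            os.remove(path)  # drop any stale spilled copy of this key
        self._hot[key] = value
        self._hot.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self._hot:
            self._hot.move_to_end(key)  # mark as most recently used
            return self._hot[key]
        path = self._path(key)
        if not os.path.exists(path):
            raise KeyError(key)
        # Cold entry (e.g. needed again for recomputation): reload from disk.
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        self._hot[key] = value
        self._evict()
        return value

    def in_memory_keys(self):
        return list(self._hot)


if __name__ == "__main__":
    outputs = DiskBackedLRUMap(max_in_memory=2)
    for shuffle_id in range(4):
        outputs.put(shuffle_id, [f"status-{shuffle_id}-{i}" for i in range(3)])
    print(outputs.in_memory_keys())  # only the two newest shuffles stay hot
    print(outputs.get(0))            # an old shuffle is read back from disk
```

With `max_in_memory=2` only the two most recently touched entries occupy memory; reading an older entry pulls it back in and pushes the least recently used one out to disk, which mirrors the "current and previous in memory, everything else on disk" behaviour.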
    
    This is what we used to do for production jobs in some earlier projects.
    
    
    I am not sure what the impact of the current proposal is from a memory 
overhead point of view. Map output was (obviously) expensive enough to justify 
the approach above, and its effect was localized to shuffle output tracking 
rather than pervasive/diffuse across the codebase.



