GitHub user mikeringenburg commented on the pull request:

    https://github.com/apache/spark/pull/5403#issuecomment-112938589
  
    We have one of the configurations to which @kayousterhout refers: a good 
deal of local memory but no local file system, only a global parallel file 
system (Lustre). Using Lustre for the shuffle's temporary directory performs 
very poorly. Using a local ram disk instead is limiting because of one of the 
issues mentioned in the updated PR description: shuffle data is cleaned up 
very slowly, so we may run out of memory after a number of iterations.
    
    Thus, my feeling is that finding a way to clean up shuffle data more 
aggressively might be a bigger priority. It would make something like this PR 
more suitable for production, and would also make using a ram disk for 
shuffle data more viable.


