[GitHub] spark pull request: [SPARK-3376] Add in-memory shuffle option.

pwendell Sat, 13 Jun 2015 09:34:40 -0700

Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/5403#issuecomment-111727508
  
    Hey All,
    
    I would like to close this issue pending some further discussion, maybe 
offline. The main reason is that people who see this patch keep asking me why 
we aren't merging in-memory shuffle into Spark, even though the current patch 
is clearly not intended as a production-ready implementation. Nothing in the 
title indicates that, and it's targeted at SPARK-3376, which asks for a memory 
shuffle for production workloads.
    
    As for whether to have this at all, a key question IMO is whether disk 
plays a major performance role in shuffle write when workloads do fit in the 
disk buffer cache (these are the same workloads that would be optimized by 
memory shuffle). So it would be cool to see some results on that. I think it 
probably makes sense to just get together offline and discuss it.



