GitHub user pwendell commented on the pull request:
https://github.com/apache/spark/pull/5403#issuecomment-111727508
Hey All,
I would like to close this issue pending further discussion, maybe offline.
The main reason is that people keep asking me why we aren't merging in-memory
shuffle into Spark when they see this patch, even though the current patch is
clearly not intended as a productionized implementation (but nothing in the
title says so, and it's targeted at SPARK-3376, which asks for a memory shuffle
for production workloads).
As for whether to have this at all, a key question IMO is whether disk plays a
major performance role in shuffle write when workloads fit in the disk buffer
cache (these are the same workloads that a memory shuffle would optimize). So it
would be cool to see some results on that. I think it probably makes sense to
just get together offline and discuss it.