Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/2679#issuecomment-58939148
  
    @uncleGen We implemented an in-memory shuffle for benchmarking in 
5ec645d945783457baed9e151337b2735c1b307f around the time of Spark 0.9.1, but 
we didn't merge it back, and Spark's shuffle code has since changed substantially, 
so the patch no longer applies. It simply stores shuffle output blocks in the 
BlockManager at an in-memory storage level rather than writing them to disk. I 
[ported it forward](https://github.com/ankurdave/spark/compare/in-memory-shuffle-3) 
a while ago, but it has again fallen somewhat out of date.
    
    The main problem with merging it for general use is that it doesn't handle 
the memory-constrained case and may cause blocks of important RDDs to be 
dropped. To avoid this, it requires the user to explicitly clear the shuffle 
outputs from memory, such as after each Pregel iteration, though this 
interferes with fault recovery.
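    For illustration, the core idea — keeping shuffle output blocks in memory 
keyed by their block identity, with explicit per-shuffle cleanup, instead of 
writing per-reducer files — can be sketched with a toy store. All names here 
are hypothetical; this is not Spark's actual BlockManager API:

```scala
import scala.collection.mutable

// Hypothetical key mirroring a shuffle block's identity:
// (shuffleId, mapId, reduceId).
case class ShuffleBlockKey(shuffleId: Int, mapId: Int, reduceId: Int)

// Toy in-memory shuffle store: map outputs put serialized partition
// bytes here rather than appending them to files on disk.
class InMemoryShuffleStore {
  private val blocks = mutable.Map.empty[ShuffleBlockKey, Array[Byte]]

  def put(key: ShuffleBlockKey, bytes: Array[Byte]): Unit =
    blocks(key) = bytes

  def get(key: ShuffleBlockKey): Option[Array[Byte]] =
    blocks.get(key)

  // Explicit cleanup the user must trigger, e.g. after each
  // Pregel iteration, to keep memory bounded.
  def clearShuffle(shuffleId: Int): Unit = {
    val stale = blocks.keys.filter(_.shuffleId == shuffleId).toList
    blocks --= stale
  }
}
```

    The catch the comment describes is visible here: once `clearShuffle` runs, 
the outputs are gone, so recomputation after a failure cannot fetch them.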
    
    It might be possible to solve this by introducing block-level priorities 
into the block manager, preventing it from dropping RDD blocks, and perhaps 
by asynchronously spilling the shuffle outputs to disk.
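    One possible shape for such priorities — again just a hypothetical sketch, 
not Spark code — is an eviction policy that drops lower-priority shuffle blocks 
first and never evicts a higher-priority RDD block to make room:

```scala
import scala.collection.mutable

// Hypothetical priority levels: RDD blocks outrank shuffle output,
// so under memory pressure shuffle blocks are dropped first.
object Priority { val Shuffle = 0; val Rdd = 1 }

class PrioritizedBlockStore(capacityBytes: Long) {
  private case class Entry(bytes: Array[Byte], priority: Int)
  private val blocks = mutable.LinkedHashMap.empty[String, Entry]
  private var usedBytes = 0L

  // Returns false if the block cannot fit even after evicting every
  // block of strictly lower priority. (A real implementation would
  // spill victims to disk asynchronously instead of discarding them.)
  def put(id: String, bytes: Array[Byte], priority: Int): Boolean = {
    blocks.remove(id).foreach(e => usedBytes -= e.bytes.length)
    while (usedBytes + bytes.length > capacityBytes) {
      val victim = blocks.collectFirst {
        case (k, e) if e.priority < priority => k
      }
      victim match {
        case Some(k) => usedBytes -= blocks.remove(k).get.bytes.length
        case None    => return false // nothing lower-priority to drop
      }
    }
    blocks(id) = Entry(bytes, priority)
    usedBytes += bytes.length
    true
  }

  def contains(id: String): Boolean = blocks.contains(id)
}
```

    Under this scheme an incoming RDD block can displace shuffle output, but 
a shuffle block that doesn't fit is simply refused (or, in a real system, 
spilled to disk), so important RDD blocks are never dropped.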

