[GitHub] spark pull request: [SPARK-3376] Add in-memory shuffle option.

jerryshao Tue, 07 Apr 2015 19:12:12 -0700

Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/5403#issuecomment-90781460
  
    Thanks a lot for your reply. Just a rough thought: if full sort-based 
shuffle (with a sort shuffle reader) is enabled, as 
[SPARK-2926](https://issues.apache.org/jira/browse/SPARK-2926) proposes, I 
think sort-based shuffle would still outperform hash-based shuffle, even 
in-memory, in cases that require sorting by key (such as sort-merge join). 
But for now, as you said, hash-based shuffle is better than sort-based 
shuffle in the current implementation. 
    
    Also, if this patch focuses on benchmarking, I think we need to tune it 
carefully so that nothing spills to disk. In the current implementation some 
files are still spilled to disk (for example by ExternalAppendOnlyMap). So it 
depends on the goal: if we are targeting benchmarks, it would be better to 
keep all the data in memory. In that case using a memory disk is equivalent 
to this solution, and will probably give better performance (because of the 
GC issue).
    
    These are just my first thoughts; I have no concrete evidence to argue 
the point, so sorry for any misunderstanding :smiley: .


