Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/5403#issuecomment-90781460
Thanks a lot for your reply. Just a rough thought: if full sort-based
shuffle (with a sort shuffle reader) is enabled, as
[SPARK-2926](https://issues.apache.org/jira/browse/SPARK-2926) describes, I
think sort-based shuffle would still outperform hash-based shuffle, even in
memory, in cases that require sorting by key (such as sort-merge join). But
as you said, with the current implementation hash-based shuffle is faster
than sort-based shuffle.
Also, if this patch is focused on benchmarking, I think we need to tune it
well so that nothing spills to disk. In the current implementation some files
are still spilled to disk (for example by ExternalAppendOnlyMap). So it
depends on the goal: if we are targeting benchmarks, it would be better to
keep all the data in memory. In that case using an in-memory disk achieves
the same thing as this solution, and would probably perform better (because
of the GC issue).
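
To make the tuning point concrete, here is a minimal sketch of how a benchmark run might select the shuffle manager and try to avoid spilling. It assumes the Spark 1.x configuration properties `spark.shuffle.manager`, `spark.shuffle.spill` and `spark.shuffle.memoryFraction`; the object name, app name and the chosen values are only illustrative assumptions, not settings verified against this patch.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical benchmark driver; names and values are illustrative only.
object ShuffleBenchmarkSetup {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("shuffle-benchmark")
      // "hash" or "sort" in Spark 1.x; switch to compare the two managers.
      .set("spark.shuffle.manager", "hash")
      // Assumption: disable spilling so shuffle aggregation stays in memory.
      // This can cause OutOfMemoryError if the data does not fit.
      .set("spark.shuffle.spill", "false")
      // Assumption: give shuffle aggregation a larger share of the heap
      // than the 1.x default of 0.2.
      .set("spark.shuffle.memoryFraction", "0.4")

    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(conf)
    try {
      // Benchmark job would go here.
    } finally {
      sc.stop()
    }
  }
}
```

Even with these settings, shuffle map output is still written as files, which is why a memory-backed local directory is the comparison point here.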
This is just an off-the-cuff thought and I have no concrete grounds to argue
the point, so sorry for any misunderstanding :smiley:.