Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/6652#issuecomment-109439594
@ash211 Yeah, so there are two sets of experiments that I've done which
show the benefits of this change.
1. The first is in the [OSDI 2014
paper](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-venkataraman.pdf).
There are a bunch of things in that paper, but Figure 19 which compares
Baseline to KMN-M/K=1.0 is very similar to the current patch.
2. I've run a bunch of machine learning / linear algebra workloads which
use aggregation trees implemented as a series of shuffles. In these scenarios
you get ~50% reduction in data sent over the network for each stage of the
reduction. Here is an example of a time-breakdown diagram
[before](https://www.dropbox.com/s/vjkcu3bc9cscelr/120_waterfall.pdf?dl=0) and
[after](https://www.dropbox.com/s/820vlt4xuefvr7f/1016_waterfall.pdf?dl=0). In
the second case we are able to completely avoid network transfer in the
second stage by aggregating across multiple partitions on the same machine.
3. @JoshRosen also included a couple of other papers which have done this
in [the
JIRA](https://issues.apache.org/jira/browse/SPARK-2774?focusedCommentId=14081300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14081300)
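The savings in case 2 can be illustrated with a toy model. This is a hypothetical sketch, not the actual patch or the Spark API: it just counts how many partial aggregates cross the network in a baseline single-reducer plan versus a two-stage plan that first combines partitions co-located on the same machine.

```python
from collections import defaultdict

# Hypothetical cluster layout: 4 machines, 4 partitions each,
# every partition holding some records to be summed.
partitions = {(machine, p): list(range(8)) for machine in range(4) for p in range(4)}

def baseline_network_records(parts):
    # Baseline: every partition's partial sum is shipped to a single
    # reducer, so all 16 partial aggregates cross the network.
    return len(parts)

def tree_network_records(parts):
    # Stage 1: combine partitions that live on the same machine --
    # no network transfer, since the data never leaves the machine.
    per_machine = defaultdict(int)
    for (machine, _), records in parts.items():
        per_machine[machine] += sum(records)
    # Stage 2: only one partial aggregate per machine crosses the network.
    return len(per_machine)

print(baseline_network_records(partitions))  # 16 partial sums over the network
print(tree_network_records(partitions))      # 4, one per machine
```

With 4 partitions per machine the intermediate stage sends 4x less data; the ~50% per-stage reduction reported above depends on the actual partition-to-machine ratio in the workload.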