GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/506

    [SPARK-1485][MLLIB] Implement Butterfly AllReduce

    The current implementations of machine learning algorithms rely on the 
driver for some computation and data broadcasting. This can create a bottleneck 
at the driver for both computation and communication, especially in multi-model 
training. An efficient implementation of AllReduce can help free up the driver. 
This PR contains a simple butterfly AllReduce implementation. Compared with 
reduce + broadcast (HTTP) on a 16-node EC2 cluster (with a slow connection), it 
showed a 2x speed-up on vectors of size 1k to 10m.
    
    Possible improvements:
    
    1. Each executor only needs one copy.
    2. Better handling when the number of partitions is not a power of two?  
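To illustrate the communication pattern (not the PR's Scala code), here is a minimal Python simulation of a butterfly AllReduce over p partitions, assuming p is a power of two — the non-power-of-two case is the open improvement noted above. In round k, partition i exchanges its partial sum with partition i XOR 2^k; after log2(p) rounds every partition holds the global sum.

```python
def butterfly_allreduce(values):
    """Simulate a butterfly all-reduce of per-partition partial sums.

    In round k, partition i pairs with partition i ^ (2**k) and both
    keep the sum of their values. After log2(p) rounds, every
    partition holds the global sum. Hypothetical sketch; assumes the
    number of partitions p is a power of two.
    """
    p = len(values)
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    vals = list(values)
    k = 1
    while k < p:
        # Each partition adds the value held by its round-k partner.
        vals = [vals[i] + vals[i ^ k] for i in range(p)]
        k <<= 1
    return vals

# Four partitions with partial sums 1..4: every partition ends with 10.
print(butterfly_allreduce([1, 2, 3, 4]))  # [10, 10, 10, 10]
```

Each partition sends and receives log2(p) messages, so no single node (in particular, the driver) handles all p partial results, which is the point of the comparison against reduce + broadcast.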

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark butterfly

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #506
    
----
commit 76f4bb7b29e0b605561dac048535380b645839fd
Author: Xiangrui Meng <[email protected]>
Date:   2014-04-18T04:43:32Z

    init impl of allReduce

commit d14300540da65900a2a19f8edc5adc5ce9c3e72d
Author: Xiangrui Meng <[email protected]>
Date:   2014-04-22T21:11:25Z

    move allReduce to mllib

commit 98c329d5ac12ec341a9cf6355cd67ea031189a24
Author: Xiangrui Meng <[email protected]>
Date:   2014-04-23T09:44:51Z

    allow arbitrary number of partitions

----

