GitHub user yinxusen opened a pull request:

    https://github.com/apache/spark/pull/166

    [WIP] [MLLIB-28] An optimized GradientDescent implementation 

    The new JIRA issue 
[MLLIB-28](https://spark-project.atlassian.net/browse/MLLIB-28) and this pull 
request bring a new implementation of `GradientDescent`, named 
`GradientDescentWithLocalUpdate`. It outperforms the original 
`GradientDescent` by roughly 1x to 4x without sacrificing accuracy, and can be 
easily adopted by most classification and regression algorithms in MLlib.
    
    The parallelism of many ML algorithms is limited by the sequential updating 
process of the optimization algorithms they use. However, by carefully breaking 
the sequential chain, the updating process can be parallelized. In 
`GradientDescentWithLocalUpdate`, we split the iteration loop into multiple 
supersteps. Within each superstep, an inner loop runs a local optimization 
process inside each partition, involving only the data points local to that 
partition. Since different partitions are processed in parallel, the local 
optimization is naturally parallelized. At the end of each superstep, the 
gradients and loss histories computed in each partition are collected and 
merged in a bulk synchronous manner.
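
    The superstep idea can be sketched in plain Scala. This is a simplified, 
hypothetical illustration, not the PR's actual code: partitions are stood in 
for by plain `Seq`s rather than RDD partitions, the model is 1-D least 
squares, and the merge step simply averages the locally updated weights; the 
object and method names are invented for this sketch.

    ```scala
    import scala.util.Random

    object LocalUpdateSketch {
      // One local pass: several sequential SGD-style updates that use only
      // the points in this "partition", starting from the broadcast weight.
      def localUpdate(points: Seq[(Double, Double)], w0: Double,
                      innerIters: Int, step: Double): Double = {
        var w = w0
        for (_ <- 0 until innerIters; (x, y) <- points) {
          val grad = 2.0 * (x * w - y) * x  // d/dw of (x*w - y)^2
          w -= step * grad
        }
        w
      }

      // One superstep: each partition updates locally (in Spark these would
      // run in parallel via mapPartitions), then the results are merged in a
      // bulk synchronous step (here, a simple average).
      def superstep(partitions: Seq[Seq[(Double, Double)]], w: Double,
                    innerIters: Int, step: Double): Double = {
        val locals = partitions.map(p => localUpdate(p, w, innerIters, step))
        locals.sum / locals.size
      }

      def optimize(partitions: Seq[Seq[(Double, Double)]],
                   supersteps: Int, innerIters: Int, step: Double): Double = {
        var w = 0.0
        for (_ <- 0 until supersteps)
          w = superstep(partitions, w, innerIters, step)
        w
      }

      def main(args: Array[String]): Unit = {
        // Data generated from y = 3x, split across 4 "partitions".
        val rng = new Random(42)
        val data = Seq.fill(400) { val x = rng.nextDouble(); (x, 3.0 * x) }
        val parts = data.grouped(100).toSeq
        val w = optimize(parts, supersteps = 10, innerIters = 5, step = 0.1)
        println(f"learned w = $w%.3f") // should approach 3.0
      }
    }
    ```

    The trade-off is the usual one for local-update schemes: more inner 
iterations mean fewer synchronization barriers, but each partition's weight 
drifts further from the global one before merging.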
    
    Detailed experiments and results can be found in the original [pull 
request](https://github.com/apache/incubator-spark/pull/407) and its 
[comments](https://github.com/apache/incubator-spark/pull/407#issuecomment-33356196).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yinxusen/spark gradient-local-update

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #166
    
----
commit 9b8dd56663fae556549a61f265996cbd3414f35e
Author: Xusen Yin <[email protected]>
Date:   2014-03-03T10:30:30Z

    add new optimizer for GradientDescent, with local updater

commit 881ea122188562830f197392addc6f69232c8736
Author: Xusen Yin <[email protected]>
Date:   2014-03-18T05:49:27Z

    Merge branch 'master' into gradient-local-update

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
