GitHub user yinxusen opened a pull request:
https://github.com/apache/spark/pull/166
[WIP] [MLLIB-28] An optimized GradientDescent implementation
The new JIRA issue
[MLLIB-28](https://spark-project.atlassian.net/browse/MLLIB-28), along with this
pull request, brings a new implementation of `GradientDescent` named
`GradientDescentWithLocalUpdate`. `GradientDescentWithLocalUpdate` can
outperform the original `GradientDescent` by about 1x to 4x without sacrificing
accuracy, and can be easily adopted by most classification and regression
algorithms in MLlib.
The parallelism of many ML algorithms is limited by the sequential updating
process of the optimization algorithms they use. However, by carefully breaking
the sequential chain, the updating process can be parallelized. In
`GradientDescentWithLocalUpdate`, we split the iteration loop into multiple
supersteps. Within each superstep, an inner loop that runs a local optimization
process is introduced into each partition. During the local optimization, only
the data points local to the partition are involved. Since different partitions
are processed in parallel, the local optimization process is naturally
parallelized. Then, at the end of each superstep, all the gradients and loss
histories computed from each partition are collected and merged in a bulk
synchronous manner.
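As a rough illustration of the superstep structure described above, here is a
minimal single-machine sketch in Python. It is not the code in this pull
request: the function names, the least-squares loss, and the choice to merge
partition results by simple averaging are all assumptions made for the example,
and the partitions are processed in an ordinary loop rather than in parallel on
a cluster.

```python
from typing import List, Tuple

Point = Tuple[List[float], float]  # (features, label)


def local_optimize(weights: List[float], partition: List[Point],
                   step_size: float, num_local_iters: int):
    """Inner loop: gradient descent that only sees one partition's points.

    Assumption: least-squares loss, chosen only to keep the sketch short.
    """
    w = list(weights)
    losses = []
    n = len(partition)
    for _ in range(num_local_iters):
        grad = [0.0] * len(w)
        loss = 0.0
        for x, y in partition:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            loss += 0.5 * err * err
            for j, xj in enumerate(x):
                grad[j] += err * xj
        # Local update: no communication with other partitions here.
        w = [wj - step_size * gj / n for wj, gj in zip(w, grad)]
        losses.append(loss / n)
    return w, losses


def gradient_descent_with_local_update(partitions: List[List[Point]],
                                       initial_weights: List[float],
                                       step_size: float,
                                       num_supersteps: int,
                                       num_local_iters: int):
    """Outer loop: one bulk-synchronous merge per superstep."""
    weights = list(initial_weights)
    loss_history = []
    for _ in range(num_supersteps):
        # On a cluster each partition would run this concurrently;
        # a sequential loop stands in for that in this sketch.
        results = [local_optimize(weights, p, step_size, num_local_iters)
                   for p in partitions]
        k = len(results)
        # Synchronization point. Assumption: merge by plain averaging.
        weights = [sum(r[0][j] for r in results) / k
                   for j in range(len(weights))]
        for i in range(num_local_iters):
            loss_history.append(sum(r[1][i] for r in results) / k)
    return weights, loss_history
```

The point of the sketch is the communication pattern: partitions synchronize
once per superstep instead of once per gradient step, so the number of
synchronization rounds drops by a factor of `num_local_iters`.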
Detailed experiments and results can be found in the original [pull
request](https://github.com/apache/incubator-spark/pull/407) and its
[comments](https://github.com/apache/incubator-spark/pull/407#issuecomment-33356196).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yinxusen/spark gradient-local-update
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/166.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #166
----
commit 9b8dd56663fae556549a61f265996cbd3414f35e
Author: Xusen Yin <[email protected]>
Date: 2014-03-03T10:30:30Z
add new optimizer for GradientDescent, with local updater
commit 881ea122188562830f197392addc6f69232c8736
Author: Xusen Yin <[email protected]>
Date: 2014-03-18T05:49:27Z
Merge branch 'master' into gradient-local-update
----