GitHub user rnowling opened a pull request:

    https://github.com/apache/spark/pull/1248

    [SPARK-2308][MLLIB] Add Mini-Batch KMeans Clustering method

    Mini-batch KMeans is a variant of KMeans that uses a randomly sampled 
subset of the data points in each iteration instead of the full data set, 
which reduces the work per iteration and so improves performance (and, in 
some cases, accuracy). The mini-batch version is compatible with the KMeans|| 
initialization algorithm currently implemented in MLlib.
    
    This PR adds the KMeansMiniBatch clustering algorithm and its tests, and 
updates the docs.
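
For readers unfamiliar with the idea, the following is a minimal, single-machine sketch of the mini-batch step described above. It is not the code in this PR and does not use the MLlib API. The function names, parameters, random initialization, and the choice to recompute each center as the mean of its assigned batch points are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def mini_batch_kmeans(points, k, batch_size, iterations, seed=0):
    """Hypothetical helper: k-means where each iteration sees only a sample."""
    rng = random.Random(seed)
    # Assumption: plain random initialization, used here only for brevity.
    centers = [list(p) for p in rng.sample(points, k)]
    dims = len(centers[0])
    for _ in range(iterations):
        # Draw a random subset instead of scanning every data point.
        batch = rng.sample(points, min(batch_size, len(points)))
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        for p in batch:
            i = nearest_center(p, centers)
            counts[i] += 1
            for d, value in enumerate(p):
                sums[i][d] += value
        # Move each center to the mean of its batch points; a center that
        # received no points in this batch keeps its previous position.
        for i in range(k):
            if counts[i] > 0:
                centers[i] = [s / counts[i] for s in sums[i]]
    return centers
```

Because each iteration touches only batch_size points, its cost no longer grows with the size of the full data set. The trade-off is that the centers are estimated from a sample and can vary between runs with different seeds.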

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rnowling/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1248.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1248
    
----
commit d56aa5b22829c47d7be5c6f9c3483209502c84cc
Author: RJ Nowling <[email protected]>
Date:   2014-06-27T18:31:22Z

    Added KMeansMiniBatch implementation

commit 54fabe1c7b158c64d860151ca77a410df66a6ac7
Author: RJ Nowling <[email protected]>
Date:   2014-06-27T18:36:47Z

    Updated KMeansMiniBatch docs

commit 2afee1af31aeb4a542ff24628f4ed89d46e3a06f
Author: RJ Nowling <[email protected]>
Date:   2014-06-27T18:49:05Z

    Added KMeansMiniBatch to docs

commit 0853adbf55a7452e1804d722b133b002d5c0ff19
Author: RJ Nowling <[email protected]>
Date:   2014-06-27T19:49:11Z

    Added overloaded alternative for train()

commit fc472ca867fbe2475cbd402f32a78c1e5cb3f060
Author: RJ Nowling <[email protected]>
Date:   2014-06-27T19:49:43Z

    Added KMeansMiniBatchSuite test

----


