GitHub user freeman-lab opened a pull request:
https://github.com/apache/spark/pull/2942
Streaming KMeans [MLLIB][SPARK-3254]
This adds a Streaming KMeans algorithm to MLlib. It uses an update rule
that generalizes the mini-batch KMeans update to incorporate a decay factor,
which allows past data to be forgotten. The decay factor can be specified
explicitly, or via a more intuitive "fractional decay" setting, in units of
either data points or batches.
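Here is a minimal Python sketch of such an update for a single cluster. It assumes the decayed rule is a weighted average in which the weight behind the old center is multiplied by the decay factor before the new batch's points are added: c' = (a*n*c + m*xbar) / (a*n + m) and n' = a*n + m, where a is the decay factor, n the old weight, m the number of new points and xbar their mean. The name update_center and its parameters are hypothetical. This is not the code in this PR.

```python
def update_center(center, weight, batch_points, decay):
    """Hypothetical decayed mini-batch KMeans update for ONE cluster.

    center:       current cluster center (list of floats)
    weight:       effective number of points behind the current center
    batch_points: points from the new batch assigned to this cluster
    decay:        decay factor in [0, 1]; 1 keeps all history, 0 forgets it
    Decay is applied once per call, i.e. per batch.
    """
    m = len(batch_points)
    old_w = weight * decay  # discount the contribution of past data
    if m == 0:
        # No new points for this cluster: center unchanged, weight still decays.
        return list(center), old_w
    dim = len(center)
    batch_mean = [sum(p[i] for p in batch_points) / m for i in range(dim)]
    new_w = old_w + m
    # Weighted average of the (discounted) old center and the batch mean.
    new_center = [(c * old_w + x * m) / new_w for c, x in zip(center, batch_mean)]
    return new_center, new_w
```

With decay = 1.0 this reduces to the running-mean update of ordinary mini-batch KMeans. With decay = 0.0 the center jumps to the mean of the newest batch.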
The PR includes:
- StreamingKMeans algorithm with decay factor settings
- Usage example
- Additions to documentation clustering page
- Unit tests of basic behavior and decay behaviors
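One plausible reading of the "fractional decay" setting is assumed here and not taken from the PR: choose the decay factor so that the weight of old data falls to a given fraction f after h units. Solving a^h = f gives a = f^(1/h), and with f = 0.5 this is a half-life. If the units are data points, the discount applied when a batch arrives would be a raised to the number of new points. The helpers decay_factor and batch_discount are hypothetical.

```python
import math


def decay_factor(fraction, units):
    """Hypothetical: per-unit decay factor such that the weight of old data
    falls to `fraction` after `units` batches (or points)."""
    if not (0.0 < fraction < 1.0):
        raise ValueError("fraction must be in (0, 1)")
    if units <= 0:
        raise ValueError("units must be positive")
    # Solve decay ** units == fraction for decay.
    return math.exp(math.log(fraction) / units)


def batch_discount(decay, time_unit, num_new_points):
    """Hypothetical: discount applied to existing cluster weights when one batch arrives."""
    if time_unit == "batches":
        return decay  # one decay step per batch
    if time_unit == "points":
        return decay ** num_new_points  # one decay step per new data point
    raise ValueError("time_unit must be 'batches' or 'points'")
```

For example, decay_factor(0.5, 10) yields a factor that halves the influence of old data after ten batches (or ten points).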
@tdas @mengxr @rezazadeh
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/freeman-lab/spark streaming-kmeans
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2942.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2942
----
commit b93350fce951d47e50fafda5bf066d5b29fe9803
Author: freeman
Date: 2014-08-28T20:32:05Z
Streaming KMeans with decay
- Used trainOn and predictOn pattern, similar to
StreamingLinearAlgorithm
- Decay factor can be set explicitly, or via fractional decay
parameters expressed in units of number of batches, or number of points
- Unit tests for basic functionality and decay settings
commit 9fd9c155e956f274237ecfda69a83576975ad8a0
Author: freeman
Date: 2014-10-22T05:05:43Z
Merge remote-tracking branch 'upstream/master' into streaming-kmeans
commit a0fd79017e74d3e4d519f507573e74d453aef0ee
Author: freeman
Date: 2014-10-25T05:14:33Z
Merge remote-tracking branch 'upstream/master' into streaming-kmeans
commit b5b5f8d41dab067c0ed5b5b9de88d7613dda84ef
Author: freeman
Date: 2014-10-25T06:17:56Z
Add better documentation
commit f33684b2e59593a71c577e48c4ab1356444c84d6
Author: freeman
Date: 2014-10-25T08:03:35Z
Add explanation and example to docs
commit 5db7074cab7663cc88feeda8f61212ade48ca9a0
Author: freeman
Date: 2014-10-25T08:03:51Z
Example usage for StreamingKMeans
commit 9facbe3ecbc14679b83053fa5f471dd50ab68fbd
Author: freeman
Date: 2014-10-25T08:04:11Z
Bug fix
commit ea9877c242b06ba690e6237f095137efc2f76faa
Author: freeman
Date: 2014-10-25T08:04:24Z
More documentation
commit 2086bdc56a29f63a1c2143f88303e1296df45260
Author: freeman
Date: 2014-10-25T08:04:31Z
Log cluster center updates
----
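The trainOn/predictOn pattern used in the first commit can be illustrated without Spark. In this hypothetical plain-Python sketch, a list of batches stands in for a DStream. train_on updates the centers batch by batch using a decayed weighted average, and predict_on assigns each point to its nearest current center. The class and method names are illustrative assumptions and are not the MLlib API added by this PR.

```python
from typing import Iterable, Iterator, List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class StreamingKMeansSketch:
    """Hypothetical illustration of the trainOn/predictOn pattern.

    Decay is applied once per batch to every cluster's weight.
    """

    def __init__(self, centers: List[Vector], decay: float = 1.0):
        self.centers = [list(c) for c in centers]
        self.weights = [0.0] * len(centers)  # effective point count per cluster
        self.decay = decay

    def predict(self, point: Sequence[float]) -> int:
        # Index of the nearest center by squared Euclidean distance.
        return min(range(len(self.centers)),
                   key=lambda k: _sq_dist(point, self.centers[k]))

    def _update(self, batch: List[Vector]) -> None:
        k, dim = len(self.centers), len(self.centers[0])
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        # Assign each point in the batch to its nearest current center.
        for p in batch:
            j = self.predict(p)
            counts[j] += 1
            for i in range(dim):
                sums[j][i] += p[i]
        # Decayed weighted-average update of each center.
        for j in range(k):
            old_w = self.weights[j] * self.decay
            new_w = old_w + counts[j]
            if counts[j] > 0:
                self.centers[j] = [(c * old_w + s) / new_w
                                   for c, s in zip(self.centers[j], sums[j])]
            self.weights[j] = new_w

    def train_on(self, batches: Iterable[List[Vector]]) -> None:
        # Analogue of trainOn: update the model as each batch arrives.
        for batch in batches:
            self._update(batch)

    def predict_on(self, batches: Iterable[List[Vector]]) -> Iterator[List[int]]:
        # Analogue of predictOn: emit assignments per batch using the current model.
        for batch in batches:
            yield [self.predict(p) for p in batch]
```

Because the weights start at zero, the first batch assigned to a cluster replaces its initial center. Later batches are blended in according to the decay factor.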