GitHub user freeman-lab opened a pull request:

    https://github.com/apache/spark/pull/2942

    Streaming KMeans [MLLIB][SPARK-3254]

    This adds a Streaming KMeans algorithm to MLlib. It uses an update rule
    that generalizes the mini-batch KMeans update to incorporate a decay factor,
    which allows past data to be forgotten. The decay factor can be specified
    explicitly, or via a more intuitive "fractional decay" setting, in units of
    either data points or batches.
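
To make the decayed update concrete, here is a minimal sketch in plain Python (it is not the Scala code in this PR). It assumes the decay factor scales the weight a cluster has accumulated so far before the new batch is averaged in, and that in "points" mode the factor compounds once per newly arrived point. The function name `decayed_update` and its parameters are hypothetical and exist only for illustration.

```python
from typing import List, Tuple


def decayed_update(
    center: List[float],
    weight: float,
    batch_mean: List[float],
    batch_count: int,
    decay: float = 1.0,
    unit: str = "batches",
) -> Tuple[List[float], float]:
    """Update one cluster center from one batch (illustrative sketch only).

    center      -- current center of the cluster
    weight      -- effective number of points seen so far (after past decay)
    batch_mean  -- mean of the points assigned to this cluster in the new batch
    batch_count -- number of points assigned to this cluster in the new batch
    decay       -- decay factor in [0, 1]; 1.0 forgets nothing, 0.0 forgets everything
    unit        -- "batches" or "points" (assumed meaning of the two time units)
    """
    # Assumption: per-batch decay applies the factor once; per-point decay
    # applies it once for every new point in the batch.
    discount = decay if unit == "batches" else decay ** batch_count

    old_weight = weight * discount
    total = old_weight + batch_count
    if total == 0:
        # Nothing remembered and nothing new: leave the center where it is.
        return list(center), 0.0

    # Weighted average of the discounted old center and the new batch mean.
    new_center = [
        (c * old_weight + m * batch_count) / total
        for c, m in zip(center, batch_mean)
    ]
    return new_center, total


if __name__ == "__main__":
    print(decayed_update([0.0], 10.0, [1.0], 10, decay=1.0))
    print(decayed_update([0.0], 10.0, [1.0], 10, decay=0.0))
```

In this sketch, `decay=1.0` weights all past data equally (a running mean over everything seen), while `decay=0.0` lets only the latest batch determine the center.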
    
    The PR includes:
    - StreamingKMeans algorithm with decay factor settings
    - Usage example
    - Additions to the clustering page of the documentation
    - Unit tests for basic behavior and decay behavior
    
    @tdas @mengxr @rezazadeh

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/freeman-lab/spark streaming-kmeans

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2942.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2942
    
----
commit b93350fce951d47e50fafda5bf066d5b29fe9803
Author: freeman <[email protected]>
Date:   2014-08-28T20:32:05Z

    Streaming KMeans with decay
    
    - Used trainOn and predictOn pattern, similar to
    StreamingLinearAlgorithm
    - Decay factor can be set explicitly, or via fractional decay
    parameters expressed in units of number of batches, or number of points
    - Unit tests for basic functionality and decay settings

commit 9fd9c155e956f274237ecfda69a83576975ad8a0
Author: freeman <[email protected]>
Date:   2014-10-22T05:05:43Z

    Merge remote-tracking branch 'upstream/master' into streaming-kmeans

commit a0fd79017e74d3e4d519f507573e74d453aef0ee
Author: freeman <[email protected]>
Date:   2014-10-25T05:14:33Z

    Merge remote-tracking branch 'upstream/master' into streaming-kmeans

commit b5b5f8d41dab067c0ed5b5b9de88d7613dda84ef
Author: freeman <[email protected]>
Date:   2014-10-25T06:17:56Z

    Add better documentation

commit f33684b2e59593a71c577e48c4ab1356444c84d6
Author: freeman <[email protected]>
Date:   2014-10-25T08:03:35Z

    Add explanation and example to docs

commit 5db7074cab7663cc88feeda8f61212ade48ca9a0
Author: freeman <[email protected]>
Date:   2014-10-25T08:03:51Z

    Example usage for StreamingKMeans

commit 9facbe3ecbc14679b83053fa5f471dd50ab68fbd
Author: freeman <[email protected]>
Date:   2014-10-25T08:04:11Z

    Bug fix

commit ea9877c242b06ba690e6237f095137efc2f76faa
Author: freeman <[email protected]>
Date:   2014-10-25T08:04:24Z

    More documentation

commit 2086bdc56a29f63a1c2143f88303e1296df45260
Author: freeman <[email protected]>
Date:   2014-10-25T08:04:31Z

    Log cluster center updates

----

