[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

mengxr Tue, 28 Oct 2014 10:55:36 -0700

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2942#discussion_r19490145
  
    --- Diff: docs/mllib-clustering.md ---
    @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap
     section of the Spark
     Quick Start guide. Be sure to also include *spark-mllib* to your build file as
     a dependency.
    +
    +## Streaming clustering
    +
    +When data arrive in a stream, we may want to estimate clusters dynamically, updating them as new data arrive. MLlib provides support for streaming KMeans clustering, with parameters to control the decay (or "forgetfulness") of the estimates. The algorithm uses a generalization of the mini-batch KMeans update rule. For each batch of data, we assign all points to their nearest cluster, compute new cluster centers, then update each cluster using:
    --- End diff --
    
    1. line too wide
    2. `KMeans` -> `k-means`
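
For readers following the quoted paragraph, here is a minimal sketch of one batch step of a decayed mini-batch k-means update. The quoted diff is cut off before the actual update rule, so the blending formula below is an assumption (a weighted average of the old center, discounted by a decay factor, and the batch points assigned to it), not necessarily the rule in the pull request. The names `nearest`, `update`, `weights`, and `decay` are hypothetical and are not MLlib API.

```python
def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def update(centers, weights, batch, decay=1.0):
    """Run one streaming step and return (new_centers, new_weights).

    Assumed rule, per cluster: new_center = (c * n * decay + sum_of_batch_points)
    / (n * decay + m), where n is the cluster's accumulated weight and m is the
    number of batch points assigned to it.
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)

    # Assign every point in the batch to its nearest current center.
    for point in batch:
        i = nearest(point, centers)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += point[d]

    new_centers, new_weights = [], []
    for c, n, s, m in zip(centers, weights, sums, counts):
        old = n * decay          # discounted weight of the past
        total = old + m
        if total == 0:
            # Nothing remembered and nothing assigned: keep the center as-is.
            new_centers.append(list(c))
            new_weights.append(0.0)
            continue
        new_centers.append([(c[d] * old + s[d]) / total for d in range(dim)])
        new_weights.append(total)
    return new_centers, new_weights
```

Under this assumed rule, `decay=1.0` weighs all past data equally with the new batch, while `decay=0.0` forgets the past and uses only the current batch.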



