Github user freeman-lab commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2906#discussion_r22634674
  
    --- Diff: docs/mllib-clustering.md ---
    @@ -154,6 +156,175 @@ section of the Spark
     Quick Start guide. Be sure to also include *spark-mllib* to your build file as
     a dependency.
     
    +
    +### Hierarchical Clustering
    +
    +MLlib supports
    +[hierarchical clustering](http://en.wikipedia.org/wiki/Hierarchical_clustering), one of the most commonly used clustering algorithms, which seeks to build a hierarchy of clusters.
    +Strategies for hierarchical clustering generally fall into two types.
    +One is agglomerative clustering, a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
    +The other is divisive clustering, a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
    +The MLlib implementation includes only a divisive hierarchical clustering algorithm.
    +
    +The implementation in MLlib has the following parameters:
    +
    +* *k* is the maximum number of desired clusters.
    +* *subIterations* is the maximum number of iterations used to split a cluster into its two sub-clusters.
    +* *numRetries* is the maximum number of retries if a split doesn't work as expected.
    +* *epsilon* is the threshold used to decide that a split has converged.
    +
    +
    +
    +### Hierarchical Clustering Example
    +
    +<div class="codetabs">
    +
    +<div data-lang="scala" markdown="1">
    +The following code snippets can be executed in `spark-shell`.
    +
    +In the following example, after loading and parsing the data,
    +we use the hierarchical clustering object to cluster the sample data into three clusters.
    --- End diff --
    
    Clarify that this means three clusters at the bottom-most levels of a 
hierarchical tree.
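
    To illustrate the point, here is a rough toy sketch in plain Python. It is
    not the implementation in this PR: `split_in_two` and `divisive_leaves` are
    hypothetical helpers, the data is 1-D, and the rule "split the leaf with the
    largest spread first" is only an assumption made for the example. It shows
    that asking for three clusters yields three leaves of the tree, which need
    not all sit at the same depth.

```python
# Toy sketch of divisive ("top down") clustering on 1-D points.
# Hypothetical helpers for illustration only; not the MLlib API.

def split_in_two(points):
    """Split 1-D points at the widest gap between sorted neighbours."""
    pts = sorted(points)
    gaps = [pts[i + 1] - pts[i] for i in range(len(pts) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return pts[:cut], pts[cut:]


def divisive_leaves(points, k):
    """Return up to k leaf clusters as (depth, points) pairs."""
    leaves = [(0, sorted(points))]
    while len(leaves) < k:
        # Only clusters with at least two points can be split further.
        splittable = [leaf for leaf in leaves if len(leaf[1]) > 1]
        if not splittable:
            break
        # Assumption: split the leaf with the largest spread (max - min).
        depth, pts = max(splittable, key=lambda leaf: leaf[1][-1] - leaf[1][0])
        leaves.remove((depth, pts))
        left, right = split_in_two(pts)
        leaves += [(depth + 1, left), (depth + 1, right)]
    # Order the leaves by their smallest point for readable output.
    return sorted(leaves, key=lambda leaf: leaf[1][0])


data = [0.0, 0.1, 0.2, 9.0, 9.1, 9.2, 20.0, 20.1]
for depth, pts in divisive_leaves(data, k=3):
    print(depth, pts)
```

    With this sample data the three leaves end up at depths 2, 2 and 1, so
    "three clusters" refers to the leaves of the tree rather than to one level.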

