GitHub user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2906#discussion_r19288355
  
    --- Diff: docs/mllib-clustering.md ---
    @@ -153,3 +157,152 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap
     section of the Spark
     Quick Start guide. Be sure to also include *spark-mllib* to your build file as
     a dependency.
    +
    +
    +### Hierarchical Clustering
    +
    +MLlib supports
    +[hierarchical clustering](http://en.wikipedia.org/wiki/Hierarchical_clustering), one of the most commonly used clustering algorithm which seeks to build a hierarchy of clusters.
    +Strategies for hierarchical clustering generally fall into two types.
    +One is the agglomerative clustering which is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
    +The other is the divisive clustering which is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
    +The MLlib implementation only includes a divisive hierarchical clustering algorithm.
    +
    +The implementation in MLlib has the following parameters:
    +
    +* *k* is the number of maximum desired clusters. 
    +* *subIterations* is the maximum number of iterations to split a cluster to its 2 sub clusters.
    +* *numRetries* is the maximum number of retries if a splitting doesn't work as expected.
    +* *epsilon* determines the saturate threshold to consider the splitting to have converged.
    +
    +
    +
    +### Hierarchical Clustering Example
    +
    +<div class="codetabs">
    +
    +<div data-lang="scala" markdown="1">
    +The following code snippets can be executed in `spark-shell`.
    +
    +In the following example after loading and parsing data, 
    +we use the hierarchical clustering object to cluster the sample data into three clusters. 
    +The number of desired clusters is passed to the algorithm. 
    +Hoerver, even though the number of clusters is less than *k* in the middle of the clustering,
    --- End diff --
    
    Hoerver -> However, and 'not be splitted' -> 'not be split'


