GitHub user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/10244#discussion_r47393760
--- Diff: docs/ml-clustering.md ---
@@ -11,6 +11,77 @@ In this section, we introduce the pipeline API for [clustering in mllib](mllib-c
* This will become a table of contents (this text will be scraped).
{:toc}
+## K-means
+
+K-means clustering with support for multiple parallel runs and a k-means++ like initialization mode
+(the k-means|| algorithm by Bahmani et al). When multiple concurrent runs are requested,they are
+executed together with joint passes over the data for efficiency.
+
+`KMeans` is implemented as an `Estimator` and generates a `KMeansModel` as the base models.
--- End diff --
"models" --> "model"