Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13291#discussion_r64946619
--- Diff: python/pyspark/ml/clustering.py ---
@@ -64,6 +64,21 @@ class GaussianMixture(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIte
.. note:: Experimental
GaussianMixture clustering.
+ This class performs expectation maximization for multivariate Gaussian
+ Mixture Models (GMMs). A GMM represents a composite distribution of
+ independent Gaussian distributions with associated "mixing" weights
+ specifying each's contribution to the composite.
+
+ Given a set of sample points, this class will maximize the log-likelihood
+ for a mixture of k Gaussians, iterating until the log-likelihood changes by
+ less than convergenceTol, or until it has reached the max number of iterations.
+ While this process is generally guaranteed to converge, it is not guaranteed
+ to find a global optimum.
+
+ Note: For high-dimensional data (with many features), this algorithm may perform poorly.
--- End diff --
Sounds like a good plan. Once my other doc change PRs are in, I'll go
through the PyDoc again and see if anything looks oddly formatted.
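
For readers of the docstring in the diff above, the loop it describes can be sketched roughly as follows. This is not Spark's implementation: it is a pure-Python, univariate simplification (the docstring covers the multivariate case), and the function names, the quantile-based initialization, the variance floor, and the default values for max_iter and convergence_tol are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(points, k, max_iter=100, convergence_tol=0.01):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch).

    Returns (weights, means, variances, log_likelihoods).
    """
    n = len(points)
    ordered = sorted(points)
    # Assumed initialization (not Spark's): evenly spaced quantiles as means,
    # the overall sample variance for every component, uniform mixing weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(points) / n
    overall_var = sum((x - overall_mean) ** 2 for x in points) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    log_likelihoods = []

    for _ in range(max_iter):
        # E-step: responsibility of each component for each point,
        # accumulating the log-likelihood of the current parameters.
        resp = []
        log_likelihood = 0.0
        for x in points:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            log_likelihood += math.log(total)
            resp.append([j / total for j in joint])

        # Stop once the log-likelihood changes by less than the tolerance.
        converged = (log_likelihoods and
                     abs(log_likelihood - log_likelihoods[-1]) < convergence_tol)
        log_likelihoods.append(log_likelihood)
        if converged:
            break

        # M-step: re-estimate mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, points)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, points)) / nj
            variances[j] = max(var, 1e-6)  # assumed floor against collapse

    return weights, means, variances, log_likelihoods
```

Calling fit_gmm_1d on two well-separated groups of values with k=2 should recover one component per group. Because the result depends on the starting means, a different initialization can settle on a different local optimum, which is the caveat the docstring raises.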