GitHub user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/12402#discussion_r60466864
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -105,6 +108,15 @@ class GaussianMixtureModel private[ml] (
def gaussians: Array[MultivariateGaussian] = parentModel.gaussians
@Since("2.0.0")
--- End diff --
I agree that it simplifies things when the return value is either a basic type
or a DataFrame. I think that makes sense for topics and synonyms, which have
no "natural" representation. But for distributions, there is a natural class
to provide (the MultivariateGaussian class, with its associated methods like
pdf). I'd also be OK with providing only a DataFrame for now in Python,
perhaps via a method called `gaussiansDF`.
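
To make the trade-off concrete, here is a rough sketch in plain Python (no Spark). Everything in it is hypothetical: `Gaussian2D` is only a 2-D stand-in for a MultivariateGaussian-like class, and `gaussians_as_rows` merely imitates what a `gaussiansDF`-style method might return. It is not the actual API.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Gaussian2D:
    """Hypothetical 2-D stand-in for a MultivariateGaussian-like class."""
    mean: Tuple[float, float]
    cov: Tuple[Tuple[float, float], Tuple[float, float]]

    def pdf(self, x: Tuple[float, float]) -> float:
        (a, b), (c, d) = self.cov
        det = a * d - b * c
        dx = x[0] - self.mean[0]
        dy = x[1] - self.mean[1]
        # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the
        # closed-form inverse of a 2x2 matrix.
        q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
        return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def gaussians_as_rows(gaussians: List[Gaussian2D]) -> List[Dict[str, list]]:
    """Flatten each distribution into a plain record, as a DataFrame row might.

    Hypothetical helper: only the parameters survive, not the methods.
    """
    return [
        {"mean": list(g.mean), "cov": [list(row) for row in g.cov]}
        for g in gaussians
    ]
```

With the class, a caller can evaluate `g.pdf(x)` directly. With the row form, the caller gets only `mean` and `cov` and has to rebuild the distribution before evaluating a density, which is the convenience that is lost when only a DataFrame is returned.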