GitHub user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12402#discussion_r60466864
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
    @@ -105,6 +108,15 @@ class GaussianMixtureModel private[ml] (
       def gaussians: Array[MultivariateGaussian] = parentModel.gaussians
     
       @Since("2.0.0")
    --- End diff --
    
    I agree it simplifies things when the return type is either a basic type
    or a DataFrame.  That makes sense for topics and synonyms, for which
    there is no "natural" representation.  But for distributions, there is a
    natural class to provide (the MultivariateGaussian class, with its
    associated methods like pdf).  I'd also be OK with providing only a
    DataFrame for now in Python, perhaps using a method called `gaussiansDF`.
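
    To make the two options concrete, here is a rough sketch (not the code
    under review) of how a DataFrame view could sit next to the typed one.
    The helper object, the column names "mean" and "cov", and the local
    SparkSession setup are all assumptions for illustration; it also assumes
    a Spark 2.x build where the org.apache.spark.ml.linalg types can be
    stored in DataFrame columns.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
import org.apache.spark.sql.{DataFrame, SparkSession}

object GaussiansDFSketch {
  // Hypothetical helper: flattens each mixture component into one row.
  // The column names "mean" and "cov" are an assumption of this sketch.
  def gaussiansDF(spark: SparkSession,
                  gaussians: Array[MultivariateGaussian]): DataFrame = {
    val rows = gaussians.map(g => (g.mean, g.cov)).toSeq
    spark.createDataFrame(rows).toDF("mean", "cov")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("gaussiansDF-sketch")
      .getOrCreate()

    // A single 2-D component with identity covariance, for illustration only.
    val g = new MultivariateGaussian(
      Vectors.dense(0.0, 0.0),
      Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 1.0)))

    // The typed representation keeps methods such as pdf.
    println(g.pdf(Vectors.dense(0.0, 0.0)))

    // The DataFrame representation exposes only the parameters.
    gaussiansDF(spark, Array(g)).printSchema()

    spark.stop()
  }
}
```

    The trade-off is visible here: the DataFrame carries only the parameters,
    so a caller who wants pdf has to rebuild a MultivariateGaussian from the
    mean and cov columns.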


