[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

jkbradley Tue, 30 Dec 2014 13:04:55 -0800

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/3022#issuecomment-68398244
  
    @tgaloppo @FlytxtRnD  I made some JIRAs for the to-do items above.
    
    I'd say the most important are:
    * [Change predictMembership() to take an RDD, not the GMM.](https://issues.apache.org/jira/browse/SPARK-5020)
     * I had not noticed that it takes all of the GMM parameters.  It should be renamed and made internal, and a wrapper method predictMembership() should take only an RDD.
    * [Make MultivariateGaussian public](https://issues.apache.org/jira/browse/SPARK-5018)
    * [Update GMM API to use MultivariateGaussian instead of means, covariances](https://issues.apache.org/jira/browse/SPARK-5019)
    * (The Python API and user guide JIRAs from @mengxr should also be in this list.)
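    
    To illustrate the shape of the first and third items, here is a minimal, Spark-free sketch in Python.  It is not the MLlib code: Gaussian1D is a hypothetical univariate stand-in for MultivariateGaussian, a plain list stands in for the RDD, and the names MixtureModel, predict_membership and _compute_soft_assignments are made up for this example.  The point is only that the helper taking every parameter stays internal, while the public method takes the data alone.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Gaussian1D:
    """Hypothetical univariate stand-in for MultivariateGaussian."""
    mean: float
    variance: float

    def pdf(self, x: float) -> float:
        # Density of N(mean, variance) evaluated at x.
        norm = math.sqrt(2.0 * math.pi * self.variance)
        return math.exp(-((x - self.mean) ** 2) / (2.0 * self.variance)) / norm


def _compute_soft_assignments(
    points: Sequence[float],
    weights: Sequence[float],
    gaussians: Sequence[Gaussian1D],
) -> List[List[float]]:
    """Internal helper: takes every mixture parameter explicitly."""
    memberships = []
    for x in points:
        # Weighted density of x under each component.
        scores = [w * g.pdf(x) for w, g in zip(weights, gaussians)]
        # No guard against underflow: if every score is 0.0 this divides by zero.
        total = sum(scores)
        memberships.append([s / total for s in scores])
    return memberships


@dataclass(frozen=True)
class MixtureModel:
    """Hypothetical model that stores Gaussians rather than separate means and covariances."""
    weights: List[float]
    gaussians: List[Gaussian1D]

    def predict_membership(self, points: Sequence[float]) -> List[List[float]]:
        # Public wrapper: callers pass only the data, never the parameters.
        return _compute_soft_assignments(points, self.weights, self.gaussians)


model = MixtureModel([0.5, 0.5], [Gaussian1D(0.0, 1.0), Gaussian1D(10.0, 1.0)])
memberships = model.predict_membership([0.0, 10.0, 5.0])
```

    Each row of memberships sums to 1, and the point halfway between the two means is split evenly between the components.  In the real API the argument would be an RDD and the result would be computed per partition; that part is left out here.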
    
    It would be great to do:
    * [SVD for Gaussian initialization](https://issues.apache.org/jira/browse/SPARK-5017)
    
    Some less critical ones are:
    * [random seed](https://issues.apache.org/jira/browse/SPARK-5015)
    * [If numFeatures or k are large, distribute matrix inverses for Gaussian initialization.](https://issues.apache.org/jira/browse/SPARK-5016)
    * [Be faster for SparseVector inputs](https://issues.apache.org/jira/browse/SPARK-5021)
    
    I removed the NaN JIRAs, but we should investigate numerical stability at some point.
    
    Please let me know if you'd like any assigned to you, and thanks in advance 
for your work on this!  If I'm able to work on one of the JIRAs, I'll make a 
note on the JIRA page.



