[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...

rnowling Sun, 18 Jan 2015 21:03:25 -0800

Github user rnowling commented on the pull request:

    https://github.com/apache/spark/pull/4087#issuecomment-70446766
  
    [~leahmcguire],
    
    Thanks for the patch!
    
    A few comments:
    1. PySpark calls the Scala API for MLlib, so for API compatibility, we 
can't use enumerations on the public APIs.  I suggest using a string for the 
train() functions but keeping the enumeration for the internal API.
    2. Can you create a new JIRA for updating the PySpark MLlib NB API?  I can 
post details on what needs to change there -- if you don't want to do the PR 
for that, I can.
    3. The populateMatrix function is verbose.  Breeze seems to support 
element-wise operations 
(https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet) which 
might negate the need for the populateMatrix function.
    4. Can you update the MLlib docs in docs/mllib-naive-bayes.md ?
    
    Thanks!
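
A minimal sketch of what point 1 might look like: the public entry point accepts a plain string, which a Python wrapper can pass through unchanged, and converts it once into an enumeration used internally. The names `ModelType`, `NaiveBayesSketch`, `fromString`, and `resolve` are hypothetical and are not taken from the patch; the actual train() signatures in MLlib take more parameters than shown here.

```scala
// Hypothetical names throughout; this is an illustration, not the code in the PR.
object ModelType extends Enumeration {
  val Multinomial, Bernoulli = Value

  // Map a user-supplied string to an enumeration value, ignoring case.
  def fromString(name: String): Value = {
    val allowed = values.mkString(", ")
    values.find(_.toString.equalsIgnoreCase(name)).getOrElse(
      throw new IllegalArgumentException(
        "Unknown model type '" + name + "'; expected one of: " + allowed))
  }
}

object NaiveBayesSketch {
  // Public API: a string parameter, so no Scala enumeration crosses the
  // language boundary. A real train() would also take the training data.
  def train(modelType: String): ModelType.Value =
    resolve(ModelType.fromString(modelType))

  // Internal API: keeps the type-safe enumeration and branches on it.
  private def resolve(modelType: ModelType.Value): ModelType.Value =
    modelType match {
      case ModelType.Multinomial => ModelType.Multinomial
      case ModelType.Bernoulli   => ModelType.Bernoulli
    }
}
```

With this shape, an invalid string fails fast at the public boundary with an IllegalArgumentException, and the rest of the code only ever sees a valid enumeration value.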


