Github user rnowling commented on the pull request:
https://github.com/apache/spark/pull/4087#issuecomment-70446766
[~leahmcguire],
Thanks for the patch!
A few comments:
1. PySpark calls the Scala API for MLlib, so for API compatibility, we
can't use enumerations on the public APIs. I suggest using a string for the
train() functions but keeping the enumeration for the internal API.
2. Can you create a new JIRA for updating the PySpark MLlib NB API? I can
post details on what needs to change there -- if you don't want to do the PR
for that, I can.
3. The populateMatrix function is verbose. Breeze seems to support
element-wise operations
(https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet) which
might negate the need for the populateMatrix function.
4. Can you update the MLlib docs in docs/mllib-naive-bayes.md?
Thanks!
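
To illustrate comment 1, here is a minimal Scala sketch of a string-typed public train() that delegates to an enumeration-based internal method. All names in it (ModelType, NaiveBayesSketch, trainInternal, the Multinomial and Bernoulli values, and the lambda parameter) are hypothetical and are not taken from the patch. The returned string only stands in for a trained model.

```scala
// Hypothetical sketch; none of these names come from the pull request.
object ModelType extends Enumeration {
  val Multinomial, Bernoulli = Value
}

object NaiveBayesSketch {
  // Internal API: keeps the type-safe enumeration.
  private def trainInternal(lambda: Double, modelType: ModelType.Value): String =
    s"trained $modelType model with lambda=$lambda"

  // Public API: accepts a plain string, which non-Scala callers can pass
  // without needing access to the Scala enumeration.
  def train(lambda: Double, modelType: String): String = {
    val parsed = ModelType.values
      .find(_.toString.equalsIgnoreCase(modelType))
      .getOrElse(
        throw new IllegalArgumentException(s"Unknown model type: $modelType"))
    trainInternal(lambda, parsed)
  }
}
```

Parsing the string at the public boundary keeps the enumeration's type safety internally, while the public signature stays callable from PySpark with an ordinary Python string.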