GitHub user acidghost commented on the pull request:

    https://github.com/apache/spark/pull/6761#issuecomment-115555334
  
    I found that the naive Bayes in e1071 uses a Gaussian distribution for 
numeric features ([page 
34](http://cran.r-project.org/web/packages/e1071/e1071.pdf)), so I wouldn't use 
the results from that package as a reference.
    
    The two mllib prediction tests (probabilities sum to one, and more than 80% 
correct predictions) both pass for Bernoulli and Multinomial.
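
    For reference, the two checks could be sketched roughly as below. This is 
a minimal illustration in plain Python, not the actual mllib test code: the 
helper names, the tolerance, and the sample data are all hypothetical.

```python
def rows_sum_to_one(probs, tol=1e-9):
    """Return True if every row of class probabilities sums to 1 within tol."""
    return all(abs(sum(row) - 1.0) <= tol for row in probs)


def accuracy(probs, labels):
    """Fraction of rows whose most probable class equals the true label."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Hypothetical predictions and labels, for illustration only.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
labels = [0, 1, 2, 1]

sums_ok = rows_sum_to_one(probs)
acc = accuracy(probs, labels)
```

    A test along these lines would assert `sums_ok` and `acc > 0.8` on the 
generated data.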
    
    Comparing the scikit-learn and mllib probabilities, I get a stable result 
(all probabilities match) only with the Bernoulli model. With the Multinomial 
model I get different results on every run. If I could use another library to 
compute the probabilities, I would compare those with the mllib ones, as you 
suggest. Do you know of any that has both Bernoulli and Multinomial models?
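
    Failing a third library, one option might be a small hand-written 
reference for the Multinomial posterior. The sketch below assumes the log 
class priors and log feature probabilities can be read from the trained model; 
the function names and the tolerance are hypothetical, and this is not the 
mllib or scikit-learn implementation.

```python
import math


def multinomial_nb_posterior(log_prior, log_theta, x):
    """Posterior P(class | x) for a multinomial naive Bayes model.

    log_prior[c]    -- log P(class c)
    log_theta[c][j] -- log P(feature j | class c)
    x[j]            -- count of feature j in the sample
    """
    # Unnormalized log posterior per class: log prior + sum_j x_j * log theta_cj
    scores = [lp + sum(xj * lt for xj, lt in zip(x, row))
              for lp, row in zip(log_prior, log_theta)]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two probability tables."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

    Comparing both libraries against this reference with something like 
`max_abs_diff(reference, library_probs) < 1e-6` would show which of the two 
deviates.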
    
    Anyway, it is strange that only the Multinomial results are wrong. Might it 
be that the data generation function for Multinomial data is more random? Or is 
it the prediction algorithm?
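
    One way to separate the two causes might be to fix the seed of the data 
generator, so that every run sees identical data. The sketch below is a 
hypothetical generator in plain Python, not the one used in the mllib tests: 
the function names and parameters are assumptions for illustration.

```python
import random


def generate_multinomial_sample(rng, theta, n_trials):
    """Draw n_trials features according to theta and return the counts."""
    counts = [0] * len(theta)
    for _ in range(n_trials):
        j = rng.choices(range(len(theta)), weights=theta)[0]
        counts[j] += 1
    return counts


def generate_dataset(seed, theta_per_class, n_samples, n_trials):
    """Build (label, counts) pairs from a generator seeded with a fixed value."""
    rng = random.Random(seed)  # private, seeded generator: same seed, same data
    data = []
    for _ in range(n_samples):
        label = rng.randrange(len(theta_per_class))
        counts = generate_multinomial_sample(rng, theta_per_class[label], n_trials)
        data.append((label, counts))
    return data


# Hypothetical class-conditional feature probabilities.
theta_per_class = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
run_a = generate_dataset(42, theta_per_class, n_samples=20, n_trials=10)
run_b = generate_dataset(42, theta_per_class, n_samples=20, n_trials=10)
```

    If the results still differ between runs on identical data, the data 
generation is ruled out and the difference has to come from training or 
prediction.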


