[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

yinxusen Tue, 10 Nov 2015 20:42:07 -0800

Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-155661487
  
    @selvinsource Sorry for taking so long. I checked the code and the
    generated XML file carefully. The null pointer is caused by my mistake: I
    converted continuous features into categorical ones.
    
    Actually, the features of a naive Bayes model trained with the multinomial
    distribution should be treated as continuous, and we should use
    
    ```
    Continuous Input3   i3      mean[i3,t1],variance[i3,t1]     mean[i3,t2],variance[i3,t2]     mean[i3,t3],variance[i3,t3]
    ```
    
    to generate the XML file, rather than the categorical layout.
    
    For a model trained in the Bernoulli way, we should treat its features as
    categorical, i.e. use
    
    ```
    Discrete Input2     i21     count[i21,t1]   count[i21,t2]   count[i21,t3]   ...
                        i22     count[i22,t1]   count[i22,t2]   count[i22,t3]   ...
                        i23     count[i23,t1]   count[i23,t2]   count[i23,t3]   ...
                        ...     ...             ...             ...
    ```
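
    To make the two cases concrete, here is a rough Python sketch of the
    dispatch described above. It is only an illustration, not the code in this
    pull request: `feature_kind` and `input_rows` are hypothetical helper
    names, and the cells are the symbolic placeholders from the two layouts
    above rather than real statistics.

```python
from typing import List, Sequence


def feature_kind(model_type: str) -> str:
    """Map a naive Bayes model type to the input layout proposed above."""
    # Assumption: model types are spelled "multinomial" and "bernoulli".
    kinds = {"multinomial": "continuous", "bernoulli": "discrete"}
    try:
        return kinds[model_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported model type: {model_type!r}")


def input_rows(
    model_type: str,
    field: str,
    targets: Sequence[str],
    values: Sequence[str] = (),
) -> List[List[str]]:
    """Build the symbolic table rows for one input field (hypothetical helper)."""
    if feature_kind(model_type) == "continuous":
        # Continuous layout: a single row, one mean/variance cell per target.
        cells = [f"mean[{field},{t}],variance[{field},{t}]" for t in targets]
        return [[field] + cells]
    # Discrete layout: one row per field value, one count cell per target.
    return [[v] + [f"count[{v},{t}]" for t in targets] for v in values]


if __name__ == "__main__":
    targets = ["t1", "t2", "t3"]
    for row in input_rows("multinomial", "i3", targets):
        print("Continuous Input3", *row, sep="\t")
    for row in input_rows("bernoulli", "Input2", targets, ["i21", "i22", "i23"]):
        print("Discrete Input2", *row, sep="\t")
```

    The point of the sketch is only that the layout is chosen once per model
    type, so continuous features never go through the categorical path.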




