Github user selvinsource commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-152717699
  
    @yinxusen 
    I added a test for your Naive Bayes export on this branch:
    
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
    
    To generate the XML I used this code:
    
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/spark_shell_exporter/naivebayes_iris.scala
    
    Here is the generated XML model:
    
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/exported_pmml_models/naivebayes_classification.xml
    
    If I run the jpmml evaluation I get this exception:
    java -jar target/spark-pmml-exporter-validator-1.1.0-SNAPSHOT-jar-with-dependencies.jar NaiveBayesClassificationModel
    NaiveBayesClassificationModel selected
    <code>
    Exception in thread "main" java.lang.NullPointerException
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
        at java.lang.Double.parseDouble(Double.java:538)
        at java.lang.Double.valueOf(Double.java:502)
        at org.jpmml.evaluator.TypeUtil.parseDouble(TypeUtil.java:136)
        at org.jpmml.evaluator.TypeUtil.parse(TypeUtil.java:78)
        at org.jpmml.evaluator.FieldValue.parseValue(FieldValue.java:107)
        at org.jpmml.evaluator.FieldValue.equalsString(FieldValue.java:54)
        at org.jpmml.evaluator.NaiveBayesModelEvaluator.getTargetValueCounts(NaiveBayesModelEvaluator.java:333)
        at org.jpmml.evaluator.NaiveBayesModelEvaluator.evaluateClassification(NaiveBayesModelEvaluator.java:154)
        at org.jpmml.evaluator.NaiveBayesModelEvaluator.evaluate(NaiveBayesModelEvaluator.java:94)
        at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:79)
        at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.evaluate(SparkPMMLExporterValidator.java:219)
        at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.evaluateMultiClassClassificationModelIris(SparkPMMLExporterValidator.java:130)
        at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.main(SparkPMMLExporterValidator.java:94)
    </code>
    
    I haven't looked closely at the exception above (@vruusmann can probably 
confirm the cause), but I did spot some evident issues and inconsistencies in 
the exported XML.
    
    The definition:
    <code>
            <DataField name="target" optype="categorical" dataType="double">
                <Value value="0"/>
                <Value value="1"/>
                <Value value="2"/>
            </DataField>
    </code>
    should be changed to
    <code>
            <DataField name="class" optype="categorical" dataType="double">
                <Value value="0.0"/>
                <Value value="1.0"/>
                <Value value="2.0"/>
            </DataField>
    </code>
    Consequently,
    <code>
                <MiningField name="target" usageType="target"/>
    </code>
    should be changed to
    <code>
                <MiningField name="class" usageType="predicted"/>
    </code>
    
    I don't think the issues above cause the exception, but it would be nice 
to align with the conventions used by @JasmineGeorge.
    The following bit, however, could potentially be the cause of the error:
    <code>
                            <TargetValueCount value="target_1" count="-0.8808827544295097"/>
    </code>
    should be
    <code>
                            <TargetValueCount value="1.0" count="-0.8808827544295097"/>
    </code>
    because target_1 is never defined; it should be 1.0, which is one of the 
class values.
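    
    As a quick sanity check before running the evaluator, the mismatch described 
above can be detected with a few lines of script. The following is only a rough 
sketch: the function name `undefined_target_values` and the `SAMPLE` fragment 
are made up for illustration, the fragment is a cut-down stand-in rather than a 
complete PMML document, and the comparison is a plain string match, which may 
be stricter than what jpmml actually does.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop an XML namespace prefix of the form '{uri}' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def undefined_target_values(pmml_text, target_field):
    """Return TargetValueCount values not declared under the target DataField."""
    root = ET.fromstring(pmml_text)
    declared = set()
    used = []
    for elem in root.iter():
        name = local(elem.tag)
        if name == "DataField" and elem.get("name") == target_field:
            # Collect the class values declared for the target field.
            declared = {v.get("value") for v in elem if local(v.tag) == "Value"}
        elif name == "TargetValueCount":
            used.append(elem.get("value"))
    # Literal string comparison (assumption): "1" and "1.0" count as different.
    return sorted({u for u in used if u not in declared}, key=str)


# Hypothetical, heavily simplified fragment (not a valid PMML file).
SAMPLE = """<PMML>
  <DataDictionary>
    <DataField name="class" optype="categorical" dataType="double">
      <Value value="0.0"/><Value value="1.0"/><Value value="2.0"/>
    </DataField>
  </DataDictionary>
  <TargetValueCounts>
    <TargetValueCount value="target_1" count="-0.8808827544295097"/>
    <TargetValueCount value="1.0" count="-0.8808827544295097"/>
  </TargetValueCounts>
</PMML>"""

print(undefined_target_values(SAMPLE, "class"))  # prints ['target_1']
```

    An empty list would mean every TargetValueCount refers to a declared class 
value; anything else points at entries like target_1 that need fixing in the 
exporter.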
    
    Please use the branch 
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
 to ensure the exported XML produces the correct scoring with jpmml.

