Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152717699
@yinxusen
In the branch
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
I added a test for your Naive Bayes export.
To generate the XML I used this code:
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/spark_shell_exporter/naivebayes_iris.scala
Here is the generated XML model:
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/exported_pmml_models/naivebayes_classification.xml
When I run the JPMML evaluation, I get this exception:
<code>
java -jar target/spark-pmml-exporter-validator-1.1.0-SNAPSHOT-jar-with-dependencies.jar NaiveBayesClassificationModel
NaiveBayesClassificationModel selected
</code>
<code>
Exception in thread "main" java.lang.NullPointerException
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at java.lang.Double.valueOf(Double.java:502)
	at org.jpmml.evaluator.TypeUtil.parseDouble(TypeUtil.java:136)
	at org.jpmml.evaluator.TypeUtil.parse(TypeUtil.java:78)
	at org.jpmml.evaluator.FieldValue.parseValue(FieldValue.java:107)
	at org.jpmml.evaluator.FieldValue.equalsString(FieldValue.java:54)
	at org.jpmml.evaluator.NaiveBayesModelEvaluator.getTargetValueCounts(NaiveBayesModelEvaluator.java:333)
	at org.jpmml.evaluator.NaiveBayesModelEvaluator.evaluateClassification(NaiveBayesModelEvaluator.java:154)
	at org.jpmml.evaluator.NaiveBayesModelEvaluator.evaluate(NaiveBayesModelEvaluator.java:94)
	at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:79)
	at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.evaluate(SparkPMMLExporterValidator.java:219)
	at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.evaluateMultiClassClassificationModelIris(SparkPMMLExporterValidator.java:130)
	at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.main(SparkPMMLExporterValidator.java:94)
</code>
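One detail that may help narrow this down: the trace ends in `Double.valueOf` via `FloatingDecimal.readJavaFormatString`, which throws `NullPointerException` when handed a `null` string, whereas a present but non-numeric string such as `target_1` produces a `NumberFormatException` instead. A minimal sketch of that JDK behaviour (the class `ParseProbe` is hypothetical and not part of the validator):

```java
// Hypothetical probe: shows which exception Double.valueOf throws for
// different inputs, to help tell a missing (null) value apart from a
// present but non-numeric one such as "target_1".
public class ParseProbe {

    // Returns the simple name of the exception raised, or "ok" if parsing succeeds.
    static String probe(String raw) {
        try {
            Double.valueOf(raw);
            return "ok";
        } catch (NumberFormatException e) {
            return "NumberFormatException";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("1.0"));      // ok
        System.out.println(probe("target_1")); // NumberFormatException
        System.out.println(probe(null));       // NullPointerException
    }
}
```

This does not settle the root cause on its own, but it suggests checking which value reaches `FieldValue.equalsString` as `null`.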
I haven't looked deeply into the exception above (@vruusmann will probably be
able to confirm the cause), but I did spot some evident issues/inconsistencies
in the exported XML.
The definition:
<code>
<DataField name="target" optype="categorical" dataType="double">
<Value value="0"/>
<Value value="1"/>
<Value value="2"/>
</DataField>
</code>
should be changed to
<code>
<DataField name="class" optype="categorical" dataType="double">
<Value value="0.0"/>
<Value value="1.0"/>
<Value value="2.0"/>
</DataField>
</code>
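A minimal sketch of emitting such a `DataField` from the model's class labels, assuming the labels are held as `double` values (as in Spark MLlib's `NaiveBayesModel.labels`). Java's `Double.toString` renders `2.0` as `"2.0"` rather than `"2"`, so the `Value` strings line up with the other class values in the model. The helper `dataFieldXml` is hypothetical, does no XML escaping, and is not how the Spark exporter is actually written:

```java
// Hypothetical helper: renders a categorical DataField whose <Value> entries
// are the string forms of the double class labels.
public class DataFieldSketch {

    static String dataFieldXml(String name, double[] labels) {
        StringBuilder sb = new StringBuilder();
        sb.append("<DataField name=\"").append(name)
          .append("\" optype=\"categorical\" dataType=\"double\">\n");
        for (double label : labels) {
            // Double.toString(2.0) yields "2.0", not "2"
            sb.append("  <Value value=\"").append(Double.toString(label)).append("\"/>\n");
        }
        sb.append("</DataField>");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dataFieldXml("class", new double[] {0.0, 1.0, 2.0}));
    }
}
```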
Consequently,
<code>
<MiningField name="target" usageType="target"/>
</code>
should become
<code>
<MiningField name="class" usageType="predicted"/>
</code>
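The switch to `predicted` matters if, as assumed here, the export targets PMML 4.2, whose `MiningField` usage types do not include `target`. A small sketch of a sanity check along those lines; the allowed set below is a reading of the PMML 4.2 `MINING-FIELD-USAGE-TYPE` enumeration that should be double-checked against the official XSD, and `UsageTypeCheck` is a hypothetical name:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sanity check: is a MiningField usageType one of the values
// assumed to be allowed by the PMML 4.2 schema?
public class UsageTypeCheck {

    // Assumed PMML 4.2 MINING-FIELD-USAGE-TYPE values; verify against the XSD.
    static final Set<String> PMML_4_2_USAGE_TYPES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "active", "predicted", "supplementary", "group",
                    "order", "frequencyWeight", "analysisWeight")));

    static boolean isValidUsageType(String usageType) {
        return PMML_4_2_USAGE_TYPES.contains(usageType);
    }

    public static void main(String[] args) {
        System.out.println(isValidUsageType("predicted")); // true
        System.out.println(isValidUsageType("target"));    // false
    }
}
```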
I don't think the changes above are what causes the exception, but it would be
nice to align with the conventions used by @JasmineGeorge.
The following bit, on the other hand, could potentially be the cause of the error:
<code>
<TargetValueCount value="target_1"
count="-0.8808827544295097"/>
</code>
should be
<code>
<TargetValueCount value="1.0"
count="-0.8808827544295097"/>
</code>
because target_1 is never defined; it should be 1.0, which is one of the
declared class values.
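A sketch of a consistency check that would catch this kind of mismatch: parse the exported document with the JDK's DOM parser, collect the `Value` entries declared on the target `DataField`, and report every `TargetValueCount` whose `value` is not among them. The class and method names are hypothetical, the target field name is passed in (here assumed to be `class`), and the sample in `main` is a simplified fragment, not a schema-valid PMML document:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical checker: lists TargetValueCount values that are not declared
// as <Value> entries of the given target DataField.
public class TargetValueCheck {

    static List<String> undeclaredTargetValues(String pmml, String targetField) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pmml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse PMML document", e);
        }

        // Collect the class values declared on the target DataField.
        Set<String> declared = new HashSet<>();
        NodeList fields = doc.getElementsByTagName("DataField");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (targetField.equals(field.getAttribute("name"))) {
                NodeList values = field.getElementsByTagName("Value");
                for (int j = 0; j < values.getLength(); j++) {
                    declared.add(((Element) values.item(j)).getAttribute("value"));
                }
            }
        }

        // Report every TargetValueCount whose value was not declared above.
        List<String> undeclared = new ArrayList<>();
        NodeList counts = doc.getElementsByTagName("TargetValueCount");
        for (int i = 0; i < counts.getLength(); i++) {
            String value = ((Element) counts.item(i)).getAttribute("value");
            if (!declared.contains(value)) {
                undeclared.add(value);
            }
        }
        return undeclared;
    }

    public static void main(String[] args) {
        String sample =
                "<PMML><DataDictionary>"
              + "<DataField name=\"class\" optype=\"categorical\" dataType=\"double\">"
              + "<Value value=\"0.0\"/><Value value=\"1.0\"/><Value value=\"2.0\"/>"
              + "</DataField></DataDictionary>"
              + "<TargetValueCount value=\"1.0\" count=\"1\"/>"
              + "<TargetValueCount value=\"target_1\" count=\"1\"/>"
              + "</PMML>";
        System.out.println(undeclaredTargetValues(sample, "class")); // [target_1]
    }
}
```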
Please use the branch
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
to verify that the exported XML produces correct scores with JPMML.