Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9397#issuecomment-160121192
@dbtsai thanks for the suggestion, rebasing from master seems to have fixed
it.
---
If your project is set up for it, you can reply to this email and have your
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9397#issuecomment-159380353
@dbtsai any advice on why it is failing? All the pmml tests passed and that
is the only thing I changed.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156852089
@yinxusen
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
I tested both multinomial and bernoulli
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156492495
@yinxusen I will check out your branch and do some testing as well using
the validator.
From what I can see the exported xml seems correct :+1: .
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156044525
@yinxusen for multinomial naive Bayes you could still treat the inputs as
discrete, since they should be term frequencies according to the
documentation
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9397#issuecomment-155203822
@dbtsai I don't think the failed Spark test has anything to do with my
code.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9397#issuecomment-153905098
@dbtsai
how do I fix the following?
```
[info] spark-mllib: found 1 potential binary incompatibilities (filtered 51)
[error] * class
```
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9397#issuecomment-153474883
Here is the diff between the two files (BinaryClass vs Class):
http://www.mergely.com/w7ufbahQ/
Practically the difference between the Binary and the Class
GitHub user selvinsource opened a pull request:
https://github.com/apache/spark/pull/9397
[SPARK-7272] [MLLIB] PMML export for Logistic Regression Multiclass
Classification
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152717699
@yinxusen
If you look at
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
I added a test for your
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152173191
@JasmineGeorge, it would be great if you could add a test to the validator
to ensure the exported XML file can be loaded in JPMML and scores the same
results
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152340315
I will do it, no prob.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/7842#issuecomment-143407247
Thanks, looks good to me.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-143408820
@tvmanikandan
we are adding decision tree PMML support for multiclass classification, see
https://issues.apache.org/jira/browse/SPARK-8542.
Linear
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/7842#issuecomment-141826772
Exported models (regression and classification trees) look good now: the
validator generates the same results as Spark.
Some minor comments:
- I would
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/7842#issuecomment-140540576
Yeah, I am planning to review what @JasmineGeorge did soon, I have been a
bit busy lately.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/7842#issuecomment-137925061
Also, did you verify the evaluation is correct by running
java -jar
target/spark-pmml-exporter-validator-1.1.0-SNAPSHOT-jar-with-dependencies.jar
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/7842#issuecomment-137924965
As explained in detail above, the order should be the same as the input
vector (used for training the model).
If you look at
https://github.com
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/7842#issuecomment-133817289
Here are some initial results from my tests (trying to load the exported XML
into JPMML for evaluation).
**Node with no predicate**
If you look
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/7842#discussion_r37694757
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/pmml/export/PMMLModelExportFactory.scala
---
@@ -55,6 +53,7 @@ private[mllib] object
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/7842#discussion_r37694766
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/pmml/export/DecisionTreePMMLModelExportSuite.scala
---
@@ -0,0 +1,364 @@
+/*
+ * Licensed
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/7842#discussion_r37694826
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/pmml/export/TreeModelUtils.scala ---
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/7842#discussion_r37694820
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/pmml/export/TreeModelUtils.scala ---
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/7842#discussion_r37694832
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/pmml/export/TreeModelUtils.scala ---
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/7842#issuecomment-133677161
As an overall comment, in addition to my specific comments above, I would say we
should create the field names once (possibly in the data dictionary as data
fields) and re
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/6219#issuecomment-103163614
Great!
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/6219#issuecomment-102791867
The contribution is my original work and I license the work to the project
under the project's open source license.
GitHub user selvinsource opened a pull request:
https://github.com/apache/spark/pull/6219
Mllib pmml model export spark 7272
https://issues.apache.org/jira/browse/SPARK-7272
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-98086459
@mengxr jpmml evaluator submitted to http://spark-packages.org
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-97761392
@mengxr thanks for merging this PR!
I will look into submitting the evaluator to spark-packages if you think it
will be useful; I used it as a side project
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-97341602
@mengxr updated based on your latest comments.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-96950208
@mengxr please review, it should work as expected now.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-96516889
@mengxr for SVM, I manually tried what you suggested and it looks good.
I loaded the example below in JPMML and evaluated it as Classification map,
indeed
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-96544425
For binary logistic regression, using the same principle (intercept as
threshold), doing some maths, we could set:
`intercept = -ln(1/threshold - 1
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-96549372
Here is my thinking.
When normalizationMethod = logit, the predicted value is computed as
`pj = 1 / ( 1 + exp( -yj ) )`
When we set `intercept = 0
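
The logit normalization quoted in this thread, `pj = 1 / ( 1 + exp( -yj ) )`, can be checked numerically. The sketch below is only an illustration in Python and is not the exporter's code; the function names `logit_probability` and `margin_for_threshold` are hypothetical. It shows that a probability threshold corresponds to the margin `-ln(1/threshold - 1)`, which is simply the inverse of the logistic function.

```python
import math


def logit_probability(y: float) -> float:
    """Logit normalization: p = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))


def margin_for_threshold(threshold: float) -> float:
    """Margin y at which logit_probability(y) equals the threshold.

    Hypothetical helper: solves threshold = 1 / (1 + exp(-y)) for y.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    return -math.log(1.0 / threshold - 1.0)


if __name__ == "__main__":
    for t in (0.3, 0.5, 0.8):
        y = margin_for_threshold(t)
        # Round trip: applying the logistic function to the margin
        # should give back the threshold (up to floating-point error).
        print(t, y, logit_probability(y))
```

With a threshold of 0.5 the margin is zero, so comparing the raw margin against zero and comparing the probability against 0.5 give the same decision.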
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-96157430
@mengxr
3)
I have done point 3: if it is a multinomial logistic regression, the export
will throw an IllegalArgumentException. I didn't realize now Logistic
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-95099568
@srowen that sounds good :+1:
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-95082764
@mengxr code updated, style should be consistent :)!
@vruusmann thanks for your test project, I did the same last night (mvn
dependency:tree) to find out
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-95086612
@srowen
https://github.com/apache/spark/blob/master/NOTICE already has a reference
to JAXB:
(CDDL 1.1) (GPL2 w/ CPE) JAXB API bundle for GlassFish V3
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-94653680
@mengxr I merged your PR, I will review the libraries and let you know.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-94968132
@mengxr @vruusmann, I tried the upgrade to 1.1.15 and it requires some
code change, I will look into that.
Once we have the new list of dependencies I
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-94758704
@mengxr
FastInfoset-1.2.12.jar - http://en.wikipedia.org/wiki/Fast_Infoset -
Apache License 2.0
istack-commons-runtime-2.16.jar - https://istack
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-94759397
@mengxr @vruusmann
I am using 1.1.7, therefore compatible with Java 6. Should I update it to
the latest Java 6 supported version 1.1.15? Any benefit in doing so
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-94287249
@mengxr I aligned the branch with the master and resolved a conflict so
that it should be easier for you to review.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-75626745
Yeah, that makes sense. I don't think the XML would be big enough to require
multiple partitions:
sc.parallelize(Array(pmml),1).saveAsTextFile(path
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-75429812
@jkbradley I could add that too, in addition to or as an alternative to the
local file.
In terms of implementation, I was thinking that the quick way of doing
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-73426635
@mengxr @srowen
I created the PMMLExportable trait and moved all the code under the package
mllib.pmml.
Supported models now implement this trait
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/4233#discussion_r24066997
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
---
@@ -17,14 +17,17 @@
package
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-72551243
Either approach has its pros and cons and I don't have a strong argument in
favor of either.
R, for instance, uses a similar approach to what I have
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-70366247
@jkbradley
I use JPMML to verify the exported model produces the same results, here
the details of my tests:
https://github.com/selvinsource/spark-pmml
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-61792414
@srowen I added a wrapper called ModelExporter and changed everything else
to private[mllib], therefore this should be the only object exposed in the API.
I also
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/3062#discussion_r19870786
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/export/pmml/PMMLModelExport.scala
---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/3062#discussion_r19870913
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/export/pmml/PMMLModelExport.scala
---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-61812768
@srowen
For the fields I have the same opinion: the exporter shouldn't ask the user
to provide the field names, the model should be aware of that and I am glad
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-61554631
Hi Sean,
first of all thanks for your time in reviewing this.
If you don't foresee any other export format I agree that the ModelExport
is not really
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/3062#discussion_r19769244
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/export/pmml/PMMLModelExport.scala
---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/3062#discussion_r19768883
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/export/pmml/PMMLModelExport.scala
---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/3062#discussion_r19769621
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/export/pmml/PMMLModelExport.scala
---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/3062#discussion_r19770371
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/export/pmml/KMeansPMMLModelExport.scala
---
@@ -0,0 +1,106 @@
+/*
+ * Licensed
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/3062#discussion_r19771068
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/export/pmml/KMeansPMMLModelExport.scala
---
@@ -0,0 +1,106 @@
+/*
+ * Licensed
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/3062#discussion_r19771397
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/export/pmml/KMeansPMMLModelExport.scala
---
@@ -0,0 +1,106 @@
+/*
+ * Licensed
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/3062#discussion_r19771997
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/export/pmml/KMeansPMMLModelExport.scala
---
@@ -0,0 +1,106 @@
+/*
+ * Licensed
Github user selvinsource commented on a diff in the pull request:
https://github.com/apache/spark/pull/3062#discussion_r19773642
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/export/ModelExport.scala ---
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software
GitHub user selvinsource opened a pull request:
https://github.com/apache/spark/pull/3062
[SPARK-1406] Mllib pmml model export
See PDF attached to the JIRA issue 1406.
The contribution is my original work and I license the work to the project
under the project's open
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/3062#issuecomment-61407049
https://issues.apache.org/jira/browse/SPARK-1406