[GitHub] spark pull request: [SPARK-11401] [MLLIB] PMML export for Logistic...

2015-11-27 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9397#issuecomment-160121192 @dbtsai thanks for the suggestion, rebasing from master seems to have fixed it. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: [SPARK-11401] [MLLIB] PMML export for Logistic...

2015-11-24 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9397#issuecomment-159380353 @dbtsai any advice on why it is failing? All the pmml tests passed and that is the only thing I changed.

[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-11-15 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-156852089 @yinxusen https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class I tested both multinomial and bernoulli

[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-11-13 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-156492495 @yinxusen I will check out your branch and do some testing as well using the validator. From what I can see the exported xml seems correct :+1:.

[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-11-12 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-156044525 @yinxusen for multinomial naive Bayes you could still treat the inputs as discrete, as they should be the frequencies of the terms according to the documentation

[GitHub] spark pull request: [SPARK-11401] [MLLIB] PMML export for Logistic...

2015-11-09 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9397#issuecomment-155203822 @dbtsai I don't think the failed Spark test has anything to do with my code.

[GitHub] spark pull request: [SPARK-11401] [MLLIB] PMML export for Logistic...

2015-11-04 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9397#issuecomment-153905098 @dbtsai how do I fix the following? ``` [info] spark-mllib: found 1 potential binary incompatibilities (filtered 51) [error] * class

[GitHub] spark pull request: [SPARK-11401] [MLLIB] PMML export for Logistic...

2015-11-03 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9397#issuecomment-153474883 Here is the diff between the two files (BinaryClass vs Class): http://www.mergely.com/w7ufbahQ/ Practically the difference between the Binary and the Class

[GitHub] spark pull request: [SPARK-7272] [MLLIB] PMML export for Logistic ...

2015-11-01 Thread selvinsource
GitHub user selvinsource opened a pull request: https://github.com/apache/spark/pull/9397 [SPARK-7272] [MLLIB] PMML export for Logistic Regression Multiclass Classification You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-10-31 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-152717699 @yinxusen If you look at https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class I added a test for your

[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-10-29 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-152173191 @JasmineGeorge, it would be great if you can add a test for the validator to ensure the exported xml file can be loaded in JPMML and score the same results

[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-10-29 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-152340315 I will do it, no prob.

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-09-26 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-143407247 Thanks, looks good to me.

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-09-26 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-143408820 @tvmanikandan we are adding decision tree PMML support for multiclass classification, see https://issues.apache.org/jira/browse/SPARK-8542. Linear

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-09-20 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-141826772 Exported models (regression and classification trees) look good now: the validator generates the same results as spark. Some minor comments: - I would

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-09-15 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-140540576 Yeah, I am planning to review what @JasmineGeorge did soon, I have been a bit busy lately.

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-09-05 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-137925061 Also, did you verify the evaluation is correct by running java -jar target/spark-pmml-exporter-validator-1.1.0-SNAPSHOT-jar-with-dependencies.jar

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-09-05 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-137924965 As explained in detail above, the order should be the same as the input vector (used for training the model). If you look at https://github.com

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-08-23 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-133817289 Here are some initial results from my tests (trying to load the exported xml into JPMML for evaluation). **Node with no predicate** If you look

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-08-22 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/7842#discussion_r37694757 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/pmml/export/PMMLModelExportFactory.scala --- @@ -55,6 +53,7 @@ private[mllib] object

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-08-22 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/7842#discussion_r37694766 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/pmml/export/DecisionTreePMMLModelExportSuite.scala --- @@ -0,0 +1,364 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-08-22 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/7842#discussion_r37694826 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/pmml/export/TreeModelUtils.scala --- @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-08-22 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/7842#discussion_r37694820 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/pmml/export/TreeModelUtils.scala --- @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-08-22 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/7842#discussion_r37694832 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/pmml/export/TreeModelUtils.scala --- @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...

2015-08-22 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-133677161 As an overall comment, to add to my specific comments above, I would say we should create the field names once (possibly in the data dictionary as data fields) and re

[GitHub] spark pull request: [SPARK-7272] [MLLIB] User guide for PMML model...

2015-05-18 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/6219#issuecomment-103163614 Great!

[GitHub] spark pull request: Mllib pmml model export spark 7272

2015-05-17 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/6219#issuecomment-102791867 The contribution is my original work and I license the work to the project under the project's open source license.

[GitHub] spark pull request: Mllib pmml model export spark 7272

2015-05-17 Thread selvinsource
GitHub user selvinsource opened a pull request: https://github.com/apache/spark/pull/6219 Mllib pmml model export spark 7272 https://issues.apache.org/jira/browse/SPARK-7272 You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-05-01 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-98086459 @mengxr jpmml evaluator submitted to http://spark-packages.org

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-30 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-97761392 @mengxr thanks for merging this PR! I will look into submitting the evaluator at spark-packages if you think it will be useful, I used it as side project

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-29 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-97341602 @mengxr updated based on your latest comments.

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-28 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-96950208 @mengxr please review, it should work as expected now.

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-27 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-96516889 @mengxr for SVM, I manually tried what you suggested and it looks good. I loaded the example below in JPMML and evaluated it as Classification map, indeed

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-27 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-96544425 For binary logistic regression, using the same principle (intercept as threshold), doing some maths, we could set: `intercept = -ln(1/threshold - 1
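
The expression quoted above is cut off in this archive, so the following is only a minimal sketch of the underlying identity, not the code from the pull request. It assumes the logit normalization `p = 1 / (1 + exp(-y))` mentioned elsewhere in the thread and shows that comparing `p` with a probability threshold is equivalent to comparing the raw margin `y` with `-ln(1/threshold - 1)`. All function names are hypothetical and are not part of Spark or JPMML.

```python
import math


def sigmoid(y: float) -> float:
    """Logit normalization: map a raw margin y to a probability."""
    return 1.0 / (1.0 + math.exp(-y))


def threshold_to_margin(threshold: float) -> float:
    """Hypothetical helper: cut-off on the raw margin that matches a
    probability threshold in (0, 1), i.e. the inverse of sigmoid."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    return -math.log(1.0 / threshold - 1.0)


def classify_by_probability(y: float, threshold: float) -> bool:
    """Predict the positive class when the probability reaches the threshold."""
    return sigmoid(y) >= threshold


def classify_by_margin(y: float, threshold: float) -> bool:
    """Same decision, made directly on the raw margin."""
    return y >= threshold_to_margin(threshold)


if __name__ == "__main__":
    # Both decision rules agree for a few sample margins and thresholds.
    for t in (0.3, 0.5, 0.8):
        for y in (-3.0, -1.0, -0.2, 0.4, 2.5):
            assert classify_by_probability(y, t) == classify_by_margin(y, t)
    print("decision rules agree")
```

With a threshold of 0.5 the cut-off on the margin is 0, which is the usual default decision boundary for binary logistic regression.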

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-27 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-96549372 Here is my thinking. When normalizationMethod = logit, the predicted value is computed as `pj = 1 / ( 1 + exp( -yj ) )` When we set `intercept = 0

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-25 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-96157430 @mengxr 3) I have done point 3, if it is a multinomial logistic regression the export will cause an IllegalArgumentException. I didn't realize now Logistic

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-22 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-95099568 @srowen that sounds good :+1:

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-22 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-95082764 @mengxr code updated, style should be consistent :)! @vruusmann thanks for your test project, I did the same last night (mvn dependency:tree) to find out

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-22 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-95086612 @srowen https://github.com/apache/spark/blob/master/NOTICE already has a reference to JAXB: (CDDL 1.1) (GPL2 w/ CPE) JAXB API bundle for GlassFish V3

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-21 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-94653680 @mengxr I merged your PR, I will review the libraries and let you know.

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-21 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-94968132 @mengxr @vruusmann, I tried the upgrade to 1.1.15 and it requires some code change, I will look into that. Once we have the new list of dependencies I

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-21 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-94758704 @mengxr FastInfoset-1.2.12.jar - http://en.wikipedia.org/wiki/Fast_Infoset - Apache License 2.0 istack-commons-runtime-2.16.jar - https://istack

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-21 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-94759397 @mengxr @vruusmann I am using 1.1.7, therefore compatible with Java 6. Should I update it to the latest Java 6 supported version 1.1.15? Any benefit in doing so

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-19 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-94287249 @mengxr I aligned the branch with the master and resolved a conflict so that it should be easier for you to review.

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-02-23 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-75626745 Yeah that makes sense, I don't think the xml would be that big to require multiple partitions: sc.parallelize(Array(pmml),1).saveAsTextFile(path

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-02-22 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-75429812 @jkbradley I could add that too in addition or alternative to the local file. In terms of implementation, I was thinking that the quick way of doing

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-02-08 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-73426635 @mengxr @srowen I created the PMMLExportable trait and moved all the code under the package mllib.pmml. Supported models now implement this trait

[GitHub] spark pull request: [SPARK-4587] [mllib] ML model import/export

2015-02-03 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/4233#discussion_r24066997 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -17,14 +17,17 @@ package

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-02-02 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-72551243 Either approach has its pros and cons and I don't have a strong argument in favor of either of the two. R, for instance, uses a similar approach to what I have

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-01-17 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-70366247 @jkbradley I use JPMML to verify the exported model produces the same results, here are the details of my tests: https://github.com/selvinsource/spark-pmml

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-05 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-61792414 @srowen I added a wrapper called ModelExporter and changed everything else to private[mllib], therefore this should be the only object exposed in the API. I also

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-05 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/3062#discussion_r19870786 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/export/pmml/PMMLModelExport.scala --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-05 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/3062#discussion_r19870913 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/export/pmml/PMMLModelExport.scala --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-05 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-61812768 @srowen For the fields I have the same opinion, the exporter shouldn't ask the user to provide the fields name, the model should be aware of that and I am glad

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-03 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-61554631 Hi Sean, first of all thanks for your time in reviewing this. If you don't foresee any other export format I agree that the ModelExport is not really

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-03 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/3062#discussion_r19769244 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/export/pmml/PMMLModelExport.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-03 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/3062#discussion_r19768883 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/export/pmml/PMMLModelExport.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-03 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/3062#discussion_r19769621 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/export/pmml/PMMLModelExport.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-03 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/3062#discussion_r19770371 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/export/pmml/KMeansPMMLModelExport.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-03 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/3062#discussion_r19771068 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/export/pmml/KMeansPMMLModelExport.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-03 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/3062#discussion_r19771397 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/export/pmml/KMeansPMMLModelExport.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-03 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/3062#discussion_r19771997 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/export/pmml/KMeansPMMLModelExport.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-03 Thread selvinsource
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/3062#discussion_r19773642 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/export/ModelExport.scala --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-02 Thread selvinsource
GitHub user selvinsource opened a pull request: https://github.com/apache/spark/pull/3062 [SPARK-1406] Mllib pmml model export See PDF attached to the JIRA issue 1406. The contribution is my original work and I license the work to the project under the project's open

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2014-11-02 Thread selvinsource
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-61407049 https://issues.apache.org/jira/browse/SPARK-1406