[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2015-07-06 Thread Vincenzo Selvaggio (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615481#comment-14615481
 ] 

Vincenzo Selvaggio commented on SPARK-1406:
---

After liaising with DMG, I got MLlib listed on the PMML Powered and examples pages:
http://www.dmg.org/products.html
http://www.dmg.org/pmml_examples/index.html



 PMML model evaluation support via MLib
 --

 Key: SPARK-1406
 URL: https://issues.apache.org/jira/browse/SPARK-1406
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Thomas Darimont
Assignee: Vincenzo Selvaggio
 Fix For: 1.4.0

 Attachments: MyJPMMLEval.java, SPARK-1406.pdf, SPARK-1406_v2.pdf, 
 kmeans.xml


 It would be useful if Spark provided support for the evaluation of PMML 
 models (http://www.dmg.org/v4-2/GeneralStructure.html).
 This would allow analytical models created with a statistical modeling tool 
 such as R, SAS, or SPSS to be used with Spark (MLlib), which would perform 
 the actual model evaluation for a given input tuple. The PMML model would 
 then just contain the parameterization of an analytical model.
 Other projects, such as JPMML-Evaluator, do a similar thing:
 https://github.com/jpmml/jpmml/tree/master/pmml-evaluator



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2015-04-30 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520967#comment-14520967
 ] 

Xiangrui Meng commented on SPARK-1406:
--

The PMML model export was partially addressed in PR #3062. The PMML model 
evaluation part will live outside the Spark codebase, possibly on 
spark-packages.org, due to license issues with jpmml-evaluator. I closed this 
JIRA. Please create new JIRAs for PMML model export for other models if someone 
is interested. Thanks everyone for the discussion!




[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-12-14 Thread Vincenzo Selvaggio (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245894#comment-14245894
 ] 

Vincenzo Selvaggio commented on SPARK-1406:
---

Scala examples of using ModelExporter.toPMML(model, path):
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/master/src/main/resources/spark_shell_exporter

Exported PMML XML files:
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/master/src/main/resources/exported_pmml_models

Evaluation using JPMML of the exported files:
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/java/org/selvinsource/spark_pmml_exporter_validator/SparkPMMLExporterValidator.java




[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-11-02 Thread Vincenzo Selvaggio (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193829#comment-14193829
 ] 

Vincenzo Selvaggio commented on SPARK-1406:
---

Hi, 
based on what Sean suggested, I had a go at this requirement, in particular the 
export of models to PMML, as I find it useful to decouple the producer (Spark) and 
consumer (an app) of mining models.

I have attached details of the approach taken. If you think it is valid, I could 
proceed with the implementation of the other exporters (so far only k-means is 
supported). 

I have also attached the PMML exported for k-means using the compiled spark-shell.




[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-11-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193830#comment-14193830
 ] 

Apache Spark commented on SPARK-1406:
-

User 'selvinsource' has created a pull request for this issue:
https://github.com/apache/spark/pull/3062




[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-11-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194315#comment-14194315
 ] 

Sean Owen commented on SPARK-1406:
--

I put some comments on the PR. Thanks for starting on this. I think PMML 
interoperability is indeed helpful. 

So, one big issue here is that MLlib does not at the moment have any notion of 
a schema. PMML does, and this is vital to actually using the model elsewhere. 
You have to document what the variables are so they can be matched up with the 
same variables in another tool. So it's not possible now to do anything but 
make a model with field_1, field_2, ... This calls into question whether 
PMML can be meaningfully exported from MLlib at this point. Maybe it will have 
to wait until other PRs go in that start to add schema.
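
To make the schema point concrete, here is a minimal sketch (in Python, standard library only) of what a schema-less export is forced to produce. The function name kmeans_to_pmml and its parameters are hypothetical, not part of MLlib or the PR, and the output is only PMML-like: it has not been validated against the DMG schema. When no field names are supplied, the exporter can only invent positional names such as field_1, field_2.

```python
import xml.etree.ElementTree as ET


def kmeans_to_pmml(centers, field_names=None):
    """Serialize k-means cluster centers to a minimal PMML-like document.

    Hypothetical helper for illustration; not validated against the PMML schema.
    """
    dims = len(centers[0])
    # Without a schema, the only option is to invent positional names.
    if field_names is None:
        field_names = ["field_%d" % (i + 1) for i in range(dims)]

    pmml = ET.Element("PMML", version="4.2")
    ET.SubElement(pmml, "Header", description="k-means sketch")

    # The DataDictionary is where PMML documents what each variable is.
    dictionary = ET.SubElement(pmml, "DataDictionary", numberOfFields=str(dims))
    for name in field_names:
        ET.SubElement(dictionary, "DataField", name=name,
                      optype="continuous", dataType="double")

    model = ET.SubElement(pmml, "ClusteringModel", modelName="k-means",
                          functionName="clustering", modelClass="centerBased",
                          numberOfClusters=str(len(centers)))
    schema = ET.SubElement(model, "MiningSchema")
    for name in field_names:
        ET.SubElement(schema, "MiningField", name=name)
    measure = ET.SubElement(model, "ComparisonMeasure", kind="distance")
    ET.SubElement(measure, "squaredEuclidean")
    for name in field_names:
        ET.SubElement(model, "ClusteringField", field=name)

    # One Cluster element per center, coordinates in field order.
    for index, center in enumerate(centers):
        cluster = ET.SubElement(model, "Cluster", name="cluster_%d" % index)
        array = ET.SubElement(cluster, "Array", n=str(dims), type="real")
        array.text = " ".join(repr(float(v)) for v in center)

    return ET.tostring(pmml, encoding="unicode")
```

Calling kmeans_to_pmml([[1.0, 2.0], [3.0, 4.0]]) yields fields named field_1 and field_2, which a consuming tool cannot match to real variables; passing field_names=["age", "income"] (made-up names) shows what a schema would contribute.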

I also thought it would be a little better to separate the representation of a 
model from utility methods to write the model to things like files. The latter 
can be at least separated out of the type hierarchy. I'm also wondering how 
much value it adds to design for non-PMML export at this stage.
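
One way to read this separation, sketched in Python under stated assumptions: the model class below holds only parameters, and the writing utilities are free functions outside its type hierarchy. All names here (KMeansModel, serialize, write_to_stream, write_to_path) are hypothetical and are not the API proposed in the PR; serialize is a placeholder where a PMML writer would go.

```python
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass(frozen=True)
class KMeansModel:
    """Model representation only: no knowledge of files or export formats."""
    centers: List[List[float]]


def serialize(model: KMeansModel) -> str:
    """Placeholder serializer (one center per line); a PMML writer would go here."""
    return "\n".join(" ".join(str(v) for v in center) for center in model.centers)


def write_to_stream(model: KMeansModel, stream: TextIO) -> None:
    """Write the serialized model to any text stream."""
    stream.write(serialize(model))


def write_to_path(model: KMeansModel, path: str) -> None:
    """Convenience wrapper that writes the serialized model to a local file."""
    with open(path, "w", encoding="utf-8") as handle:
        write_to_stream(model, handle)
```

With this layout, adding another destination (for example a distributed file system) means adding another function, not another method or trait on every model class.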

(Finally I have some code lying around here that will translate the MLlib 
logistic regression model to PMML. I can put that in the pot at a suitable 
time.)




[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-08-09 Thread Vincenzo Selvaggio (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091738#comment-14091738
 ] 

Vincenzo Selvaggio commented on SPARK-1406:
---

I agree with Sean. I can see the export to PMML being quite useful, as it will 
decouple an application (wanting only to do scoring) from the evaluation of the 
model, which can run on a full-blown Spark cluster.

However, I am not sure about using JPMML to generate the PMML. It would 
certainly be the easier option, but what about licensing? 
https://github.com/jpmml/jpmml-model is BSD 3-Clause, while of course Spark is 
Apache 2.0.
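
As an illustration of the decoupling described above, here is a minimal sketch of a scoring-only consumer in Python (standard library only, no Spark and no JPMML). It reads cluster centers from a PMML-like document and assigns a point to the nearest center. The names EXAMPLE_PMML, load_centers, and score are hypothetical, and the embedded document is a simplified assumption: real PMML files declare an XML namespace and more elements, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for an exported k-means model (assumption).
EXAMPLE_PMML = """<PMML version="4.2">
  <ClusteringModel functionName="clustering" modelClass="centerBased" numberOfClusters="2">
    <Cluster name="cluster_0"><Array n="2" type="real">0.0 0.0</Array></Cluster>
    <Cluster name="cluster_1"><Array n="2" type="real">10.0 10.0</Array></Cluster>
  </ClusteringModel>
</PMML>"""


def load_centers(pmml_text):
    """Return a list of (cluster name, coordinates) pairs from the document."""
    root = ET.fromstring(pmml_text)
    centers = []
    for cluster in root.iter("Cluster"):
        values = [float(v) for v in cluster.find("Array").text.split()]
        centers.append((cluster.get("name"), values))
    return centers


def score(centers, point):
    """Return the name of the cluster whose center is nearest to the point."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(center, point))

    name, _ = min(centers, key=lambda pair: squared_distance(pair[1]))
    return name
```

A consuming application would call load_centers once at startup and score per request; the model itself could have been built on a Spark cluster or by any other PMML producer.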




[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-08-09 Thread Vincenzo Selvaggio (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091747#comment-14091747
 ] 

Vincenzo Selvaggio commented on SPARK-1406:
---

Thanks for clarifying.




[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-07-15 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063107#comment-14063107
 ] 

Xiangrui Meng commented on SPARK-1406:
--

I don't know of anyone who is working on this feature. I set the target version to 
v1.2.0 for now.



[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-06-26 Thread Lisa Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044836#comment-14044836
 ] 

Lisa Hua commented on SPARK-1406:
-

Hi, any progress on this issue now?




[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-04-09 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964638#comment-13964638
 ] 

Sean Owen commented on SPARK-1406:
--

Yes I understand transformations can be described in PMML. Do you mean parsing 
a transformation described in PMML and implementing the transformation? Yes 
that goes hand in hand with supporting import of a model in general.

I would merely suggest this is a step that comes after several others in order 
of priority, like:
- implementing feature transformations in the abstract in the code base, 
separately from the idea of PMML
- implementing some form of model import via JPMML
- implementing more functionality in the Model classes to give a reason to want to 
import an external model into MLlib

... and to me this is also less useful at this point than export. I say this 
because the power of MLlib/Spark right now is perceived to be model building, 
making it more producer than consumer at this stage.



[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-04-07 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962048#comment-13962048
 ] 

Xiangrui Meng commented on SPARK-1406:
--

I think we should support PMML import/export in MLlib. PMML also provides 
feature transformations, for which MLlib has very limited support at this time. 
The questions are 1) how we leverage existing PMML packages, and 2) how many 
people volunteer.

Sean, it would be super helpful if you could share some experience with Oryx's 
PMML support, since I'm also not sure whether this is the right time to start.
