[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615481#comment-14615481 ]

Vincenzo Selvaggio commented on SPARK-1406:

After liaising with DMG, I got MLlib listed on their "powered" and examples pages:
http://www.dmg.org/products.html
http://www.dmg.org/pmml_examples/index.html

PMML model evaluation support via MLib
--------------------------------------

Key: SPARK-1406
URL: https://issues.apache.org/jira/browse/SPARK-1406
Project: Spark
Issue Type: New Feature
Components: MLlib
Reporter: Thomas Darimont
Assignee: Vincenzo Selvaggio
Fix For: 1.4.0
Attachments: MyJPMMLEval.java, SPARK-1406.pdf, SPARK-1406_v2.pdf, kmeans.xml

It would be useful if Spark supported the evaluation of PMML models (http://www.dmg.org/v4-2/GeneralStructure.html). This would allow analytical models that were created with a statistical modeling tool like R, SAS, SPSS, etc. to be used with Spark (MLlib), which would perform the actual model evaluation for a given input tuple. The PMML model would then just contain the parameterization of an analytical model.

Other projects like JPMML-Evaluator do a similar thing.
https://github.com/jpmml/jpmml/tree/master/pmml-evaluator

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
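To make "model evaluation for a given input tuple" concrete, here is a minimal sketch in plain Python (standard library only). It is not Spark or JPMML code: the embedded PMML document, the field names `x` and `y`, the cluster names and the helper functions are all assumptions made up for illustration, and only a center-based clustering model with squared Euclidean distance is handled.

```python
import xml.etree.ElementTree as ET

# PMML 4.2 namespace, matching the spec version linked in the issue.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Hypothetical two-cluster k-means model, invented for this example.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <ClusteringModel functionName="clustering" modelClass="centerBased" numberOfClusters="2">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y"/>
    </MiningSchema>
    <ComparisonMeasure kind="distance"><squaredEuclidean/></ComparisonMeasure>
    <ClusteringField field="x" compareFunction="absDiff"/>
    <ClusteringField field="y" compareFunction="absDiff"/>
    <Cluster name="low"><Array n="2" type="real">0.0 0.0</Array></Cluster>
    <Cluster name="high"><Array n="2" type="real">10.0 10.0</Array></Cluster>
  </ClusteringModel>
</PMML>"""


def load_clusters(pmml_text):
    """Return (field names, [(cluster name, center), ...]) from a PMML string."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:ClusteringModel", NS)
    fields = [f.get("field") for f in model.findall("pmml:ClusteringField", NS)]
    clusters = []
    for cluster in model.findall("pmml:Cluster", NS):
        values = cluster.find("pmml:Array", NS).text.split()
        clusters.append((cluster.get("name"), [float(v) for v in values]))
    return fields, clusters


def evaluate(pmml_text, record):
    """Return the name of the cluster whose center is nearest to the record."""
    fields, clusters = load_clusters(pmml_text)
    # The field names in the model decide which record entries are used.
    point = [float(record[name]) for name in fields]

    def squared_distance(center):
        return sum((p - c) ** 2 for p, c in zip(point, center))

    return min(clusters, key=lambda named: squared_distance(named[1]))[0]


print(evaluate(PMML_DOC, {"x": 1.0, "y": 2.0}))  # prints "low"
```

The model file carries only the parameterization (the cluster centers and field names); the evaluator supplies the scoring logic, which is the split the issue description has in mind.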
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520967#comment-14520967 ]

Xiangrui Meng commented on SPARK-1406:

The PMML model export was partially addressed in PR #3062. The PMML model evaluation part will live outside the Spark codebase, possibly on spark-packages.org, due to license issues with jpmml-evaluator. I closed this JIRA. Please create new JIRAs for PMML model export for other models if someone is interested. Thanks everyone for the discussion!
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245894#comment-14245894 ]

Vincenzo Selvaggio commented on SPARK-1406:

Scala examples of using ModelExporter.toPMML(model,path):
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/master/src/main/resources/spark_shell_exporter

Exported PMML XML files:
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/master/src/main/resources/exported_pmml_models

Evaluation of the exported files using JPMML:
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/java/org/selvinsource/spark_pmml_exporter_validator/SparkPMMLExporterValidator.java
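For readers without the linked repository at hand, the idea of exporting a k-means model to PMML can be pictured with a hand-rolled sketch in plain Python. This is not the ModelExporter code from the pull request: the function name `kmeans_to_pmml`, the `field_N` and `cluster_N` naming and the sample centers are assumptions for illustration, and the element layout follows the PMML 4.2 ClusteringModel structure as best understood here.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_2"


def kmeans_to_pmml(centers):
    """Serialize k-means cluster centers as a PMML 4.2 ClusteringModel string.

    `centers` is a non-empty list of equal-length lists of numbers.
    Field names are positional placeholders (an assumption of this sketch).
    """
    if not centers:
        raise ValueError("at least one cluster center is required")
    dim = len(centers[0])
    fields = [f"field_{i}" for i in range(dim)]

    root = ET.Element("PMML", {"version": "4.2", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", description="k-means clustering")
    dictionary = ET.SubElement(root, "DataDictionary", numberOfFields=str(dim))
    model = ET.SubElement(root, "ClusteringModel", {
        "modelName": "k-means",
        "functionName": "clustering",
        "modelClass": "centerBased",
        "numberOfClusters": str(len(centers)),
    })
    schema = ET.SubElement(model, "MiningSchema")
    measure = ET.SubElement(model, "ComparisonMeasure", kind="distance")
    ET.SubElement(measure, "squaredEuclidean")

    # Each input feature is declared three times: as a data field,
    # as a mining field, and as a clustering field.
    for name in fields:
        ET.SubElement(dictionary, "DataField",
                      name=name, optype="continuous", dataType="double")
        ET.SubElement(schema, "MiningField", name=name)
        ET.SubElement(model, "ClusteringField",
                      field=name, compareFunction="absDiff")

    # One Cluster element per center, with the coordinates in an Array.
    for index, center in enumerate(centers):
        cluster = ET.SubElement(model, "Cluster", name=f"cluster_{index}")
        array = ET.SubElement(cluster, "Array", n=str(dim), type="real")
        array.text = " ".join(repr(float(value)) for value in center)

    return ET.tostring(root, encoding="unicode")


print(kmeans_to_pmml([[1.0, 2.0], [3.5, 4.5]]))
```

The resulting string can be written to a file and handed to any PMML consumer, which is how the validator project above checks the exported files with JPMML.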
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193829#comment-14193829 ]

Vincenzo Selvaggio commented on SPARK-1406:

Hi, based on what Sean suggested, I had a go at this requirement, in particular the export of models to PMML, as I find it useful to decouple the producer (Spark) and the consumer (an app) of mining models.

I have attached details on the approach taken; if you think it is valid, I could proceed with implementing the other exporters (so far only k-means is supported). I have also attached the PMML exported for k-means using the compiled spark-shell.
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193830#comment-14193830 ]

Apache Spark commented on SPARK-1406:

User 'selvinsource' has created a pull request for this issue:
https://github.com/apache/spark/pull/3062
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194315#comment-14194315 ]

Sean Owen commented on SPARK-1406:

I put some comments on the PR. Thanks for starting on this. I think PMML interoperability is indeed helpful.

So, one big issue here is that MLlib does not at the moment have any notion of a schema. PMML does, and this is vital to actually using the model elsewhere. You have to document what the variables are so they can be matched up with the same variables in another tool. So it's not possible now to do anything but make a model with field_1, field_2, ... This calls into question whether PMML can be meaningfully exported from MLlib at this point. Maybe it will have to wait until other PRs go in that start to add schema.

I also thought it would be a little better to separate the representation of a model from utility methods that write the model to things like files. The latter can at least be separated out of the type hierarchy. I'm also wondering how much value it adds to design for non-PMML export at this stage.

(Finally, I have some code lying around here that will translate the MLlib logistic regression model to PMML. I can put that in the pot at a suitable time.)
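The schema concern in the comment above can be illustrated with a small sketch in plain Python. It is not MLlib code: the helper `build_data_dictionary` and the names `age` and `income` are hypothetical. It shows that a model trained on unnamed feature vectors can only emit positional placeholders in the PMML DataDictionary, whereas a schema lets the exporter emit names another tool can match.

```python
import xml.etree.ElementTree as ET


def build_data_dictionary(num_features, names=None):
    """Build a PMML DataDictionary element for continuous double features.

    Without a schema (names=None) the only names available are
    positional placeholders such as field_1, field_2, ...
    """
    if names is None:
        names = [f"field_{i}" for i in range(1, num_features + 1)]
    if len(names) != num_features:
        raise ValueError("expected exactly one name per feature")
    dictionary = ET.Element("DataDictionary", numberOfFields=str(num_features))
    for name in names:
        ET.SubElement(dictionary, "DataField",
                      name=name, optype="continuous", dataType="double")
    return dictionary


# No schema: a consumer cannot tell which real-world variable is which.
anonymous = build_data_dictionary(2)
# With a (hypothetical) schema: fields can be matched by name elsewhere.
named = build_data_dictionary(2, ["age", "income"])

print(ET.tostring(anonymous, encoding="unicode"))
print(ET.tostring(named, encoding="unicode"))
```

With placeholders only, the consumer has to know out of band that the first vector position means, say, age; that implicit contract is what makes the exported model hard to use in another tool.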
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091738#comment-14091738 ]

Vincenzo Selvaggio commented on SPARK-1406:

I agree with Sean. I can see export to PMML being quite useful, as it will decouple an application (wanting only to do scoring) from the evaluation of the model, which can run on a full-blown Spark cluster.

However, I am not sure about using JPMML to generate the PMML. It would certainly be the easier option, but what about licensing? https://github.com/jpmml/jpmml-model is BSD 3-Clause, while Spark is of course Apache 2.0.
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091747#comment-14091747 ]

Vincenzo Selvaggio commented on SPARK-1406:

Thanks for clarifying.
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063107#comment-14063107 ]

Xiangrui Meng commented on SPARK-1406:

I don't know anyone who is working on this feature. I set the target version to v1.2.0 for now.
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044836#comment-14044836 ]

Lisa Hua commented on SPARK-1406:

Hi, any progress on this issue now?
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964638#comment-13964638 ]

Sean Owen commented on SPARK-1406:

Yes, I understand transformations can be described in PMML. Do you mean parsing a transformation described in PMML and implementing the transformation? Yes, that goes hand in hand with supporting import of a model in general.

I would merely suggest this is a step that comes after several others in order of priority, like:

- implementing feature transformations in the abstract in the code base, separately from the idea of PMML
- implementing some form of model import via JPMML
- implementing more functionality in the Model classes to give a reason to want to import an external model into MLlib

... and to me this is less useful at this point than export too. I say this because the power of MLlib/Spark right now is perceived to be model building, making it more producer than consumer at this stage.
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962048#comment-13962048 ]

Xiangrui Meng commented on SPARK-1406:

I think we should support PMML import/export in MLlib. PMML also provides feature transformations, for which MLlib has very limited support at this time. The questions are 1) how we leverage existing PMML packages, and 2) how many people volunteer.

Sean, it would be super helpful if you could share some experience with Oryx's PMML support, since I'm also not sure whether this is the right time to start.