[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138686#comment-15138686 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181790876

I agree with @sachingoel0101 on the import complexity but, from our point of view, Flink is the perfect platform to evaluate models in streaming, and we are using it that way in our architecture. Why do you think it wouldn't be suitable?

> Add support for predictive model markup language (PMML)
> ---
>
> Key: FLINK-1966
> URL: https://issues.apache.org/jira/browse/FLINK-1966
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Sachin Goel
> Priority: Minor
> Labels: ML
>
> The predictive model markup language (PMML) [1] is a widely used language to describe predictive and descriptive models as well as pre- and post-processing steps. It thus offers an easy way to export models to and import models from other ML tools.
> Resources:
> [1] http://journal.r-project.org/archive/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138722#comment-15138722 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181798643

That is a good point. In a streaming setting, it does indeed make sense for the model to be available. However, in my opinion, it would then make sense to just use JPMML to import the object and then extract the model parameters. Granted, it is an added effort on the user side, but I still think it beats the complexity introduced by supporting imports directly. Furthermore, it would be bad design to have to reject valid PMML models just because a minor thing isn't supported in Flink.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138593#comment-15138593 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181772422

That said, just for comparison purposes, Spark has its own model export and import feature, along with PMML export. Hoping to fully support PMML import in a framework like Flink or Spark is next to impossible; it would require changes to the entire way our pipelines and datasets are represented.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138576#comment-15138576 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181771679

As the original author of this PR, I'd say this: I tried implementing the import features but they aren't worth it. You have to discard most of the valid PMML models because they don't fit in with the Flink framework. Further, in my opinion, the use of Flink is to train the model. Once we export that model in PMML, you can use it pretty much anywhere, say R or MATLAB, which support complete PMML import and export functionality. The exported model is in most cases going to be used for testing, evaluation and prediction purposes, for which Flink isn't a good platform to use anyway. This can be accomplished anywhere.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138728#comment-15138728 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181799783

@sachingoel0101 I agree. Nonetheless, an easy way to store a model generated in batch and move it to a streaming environment would be a really useful feature, which brings us back to what @chiwanpark was saying about a custom format internal to Flink.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138528#comment-15138528 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181757426

Well, that wouldn't be a problem for the export: you will create and therefore export only models that have `double` as the data type for parameters, and that's not an issue. It would be a problem for import, though, because PMML supports a wider set of data types and model types. You can't really achieve any satisfying degree of PMML support in a platform like Flink, and that's why everyone uses JPMML for evaluation. You will only be able to import compatible models with compatible data fields. This would require a simple validation at runtime on the model type and on the fields' data types.
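A minimal sketch of the runtime validation mentioned in the comment above, checking the model type and each field's data type before an import. Everything here is an assumption for illustration: `FieldSpec`, `ModelSpec`, `ImportValidation` and the two whitelists are hypothetical stand-ins, not JPMML or FlinkML classes.

```scala
// Hypothetical stand-ins for what an importer would read from a PMML file.
// These are NOT JPMML or FlinkML classes.
final case class FieldSpec(name: String, dataType: String)
final case class ModelSpec(modelType: String, fields: Seq[FieldSpec])

object ImportValidation {
  // Assumed whitelist of model types the importer knows how to map.
  val supportedModelTypes: Set[String] =
    Set("RegressionModel", "SupportVectorMachineModel")

  // Per the discussion, only double-valued fields are assumed importable.
  val supportedDataTypes: Set[String] = Set("double")

  /** Returns Right(spec) if the model is importable, else Left with every reason. */
  def validate(spec: ModelSpec): Either[List[String], ModelSpec] = {
    val modelErrors =
      if (supportedModelTypes.contains(spec.modelType)) Nil
      else List(s"unsupported model type: ${spec.modelType}")

    val fieldErrors = spec.fields.toList.collect {
      case FieldSpec(name, dataType) if !supportedDataTypes.contains(dataType) =>
        s"field '$name' has unsupported data type: $dataType"
    }

    val errors = modelErrors ++ fieldErrors
    if (errors.isEmpty) Right(spec) else Left(errors)
  }
}
```

Collecting all reasons instead of failing on the first one lets a user see at a glance why a valid PMML document was rejected.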
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138751#comment-15138751 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181803637

I'm all for that. Flink's models should be transferable at least across Flink. But that should be part of a separate PR and should not block this one, which has been blocked for far too long. It should be pretty easy to accomplish.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137142#comment-15137142 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181442715

Hello, any news on this PR?

@smarthi PMML is actually an industry standard and widely used to support model portability in complex infrastructures. Assuming that it is not adopted is wrong, according to my knowledge and experience. There are for sure a lot of data scientists who never come into contact with this standard, and I had never heard of it before my first job on an ML architecture, but it's the best (and only) tool for this kind of job.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138450#comment-15138450 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181739512

Hi @chobeat, thanks for leaving your comments.

About compatibility with other systems (such as R or MLlib), I meant that we cannot achieve compatibility with those systems even if we use PMML, because there are differences between FlinkML and the other systems. For example, FlinkML supports only `Double` as a data type. So we can achieve only partial support of PMML (especially when importing models from other systems). Is this sufficient for use in production? If yes, we would go for this.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137408#comment-15137408 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181509602

Hi @chobeat, thanks for pinging this issue. I forgot to send a discussion email to the mailing list. I think we have to discuss the following:

* What is the main purpose of supporting PMML? Is this feature only for model portability within FlinkML? If not, we have to support other systems such as R or Spark MLlib.
* What about a FlinkML-only format? I think that support for distributed systems in PMML is poor. An XML-based format is hard to parallelize.

I would like to create a general ML model importing/exporting framework. Then we can easily add PMML support based on that framework.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137745#comment-15137745 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181578375

Hi @chiwanpark,

> What is main purpose to support PMML? Is this feature for only model portability in FlinkML?

I've used PMML extensively in a previous project and saw many application cases other than my own. PMML export is necessary for external portability: you may need to create a model in Flink and use it on local data with a data mining tool, for example, or you could deploy it in a production pipeline developed with a totally different technology stack. PMML import is optional, though: you can use JPMML (the reference implementation of PMML) to read a PMML file and evaluate the model locally on the node. Import from PMML to the native implementation of FlinkML may be a plus in terms of usability and probably performance, but it's not really a blocking issue for a developer.

> If not, we have to support other systems such as R or Spark MLlib.

Support for R may be interesting by itself, but I can't understand what you mean. MLlib does support PMML export (even if it is somewhat buggy for a few models like Naive Bayes), so it is already possible to move models from MLlib to Flink.

> What about FlinkML only format? I think that support for distributed system in PMML is poor. XML-based format is hard to parallelize.

This could be interesting to guarantee the consistency of the models and to tune the format to our needs. The complexity of PMML is due to the need for generality and consistency, but it's often overkill for describing simple models. Also, it has only partial support for many models that we may want to implement, for example any of the online learning algorithms implemented in SAMOA or other online learning frameworks. I know we still miss a few pieces before reaching that point, but still...
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980911#comment-14980911 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user smarthi commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-152265442

Suggest that you see how PMML has been done in Oryx 2.0 (PMML in Spark followed Oryx 2.0).

PMML support was discussed various times on the Mahout project and was never implemented, in large part due to the lack of actual PMML usage by machine learning practitioners and data scientists. See this Mahout thread from last year, and more specifically Ted Dunning's comment in the thread: http://mail-archives.apache.org/mod_mbox/mahout-dev/201503.mbox/%3CCAJwFCa1%3DAw%2B3G54FgkYdTH%3DoNQBRqfeU-SS19iCFKMWbAfWzOQ%40mail.gmail.com%3E

Given that PMML models could possibly get really huge, it's good practice to persist them in a compressed format. It would also be good to be able to specify which features/fields are categorical/numeric (via a config file maybe).
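One way the suggestion above to persist PMML in a compressed format could look, sketched with the JDK's gzip streams. This assumes the PMML document has already been serialized to an XML string; the `PmmlCompression` object and its method names are hypothetical and not part of the pull request.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

import scala.io.Source

// Hypothetical helper, not part of FlinkML or JPMML.
object PmmlCompression {

  /** Gzip-compresses a PMML document that is already serialized as XML text. */
  def compress(xml: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(buffer)
    try gzip.write(xml.getBytes(StandardCharsets.UTF_8))
    finally gzip.close() // closing writes the gzip trailer
    buffer.toByteArray
  }

  /** Reverses `compress`, returning the original XML text. */
  def decompress(bytes: Array[Byte]): String = {
    val gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try Source.fromInputStream(gzip, "UTF-8").mkString
    finally gzip.close()
  }
}
```

The same two streams can wrap file streams instead of byte arrays when the model is written to disk.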
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948428#comment-14948428 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41496687

--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/MLUtils.scala ---
@@ -39,6 +40,10 @@ import org.apache.flink.ml.math.SparseVector
   */
 object MLUtils {
+  val flinkApp = new Application()
--- End diff --

`flinkApp` is ambiguous. I would like to use `pmmlApp`.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948446#comment-14948446 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41497844

--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala ---
@@ -18,15 +18,12 @@
 package org.apache.flink.ml.regression
-import org.apache.flink.api.scala.DataSet
+import org.apache.flink.api.scala.{DataSet, _}
--- End diff --

Unnecessary import statement change.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948444#comment-14948444 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41497773

--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala ---
@@ -124,6 +121,52 @@ class MultipleLinearRegression extends Predictor[MultipleLinearRegression] {
     }
   }
+
+  override def toPMML(): PMML = {
+    weightsOption match {
+      case None => {
+        throw new RuntimeException("The MultipleLinearRegression has not been fitted to the " +
+          "data. This is necessary to learn the weight vector of the linear function.")
+      }
+      case Some(weights) => {
+        val model = weights.collect().head
+        val pmml = new PMML()
+        pmml.setHeader(new Header().setDescription("Multiple Linear Regression"))
+
+        // define the fields
+        val target = FieldName.create("prediction")
+        val fields = scala.Array.ofDim[FieldName](model.weights.size)
+        Range(0, model.weights.size).foreach(index =>
+          fields(index) = FieldName.create("field_" + index)
+        )
+
+        // define the data dictionary, mining schema and regression table
+        val dictionary = new DataDictionary()
+        val miningSchema = new MiningSchema()
+        val regressionTable = new RegressionTable().setIntercept(model.intercept)
+        Range(0, model.weights.size).foreach(index => {
+          miningSchema.addMiningFields(
+            new MiningField(fields(index)).setUsageType(FieldUsageType.ACTIVE)
+          )
+          regressionTable.addNumericPredictors(
+            new NumericPredictor(fields(index), model.weights(index))
+          )
+          dictionary.addDataFields(
+            new DataField(fields(index), OpType.CONTINUOUS, DataType.DOUBLE)
+          )
+        })
--- End diff --

We can simplify this using the `zipWithIndex` method for `fields`.
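The `zipWithIndex` simplification suggested above could be sketched roughly as follows. To keep the example compilable without the JPMML dependency, `NumericPredictor` and `DataField` are hypothetical case classes standing in for the JPMML element types in the diff, and field names are plain strings; the shape of the loop is the point, not the types.

```scala
// Hypothetical stand-ins for the JPMML element types used in the diff.
final case class NumericPredictor(field: String, coefficient: Double)
final case class DataField(field: String, opType: String, dataType: String)

object ZipWithIndexSketch {

  /** Builds one predictor and one data field per weight in a single pass. */
  def build(weights: IndexedSeq[Double]): (Seq[NumericPredictor], Seq[DataField]) = {
    // One field name per weight, as in the reviewed code ("field_0", "field_1", ...).
    val fields = weights.indices.map(i => s"field_$i")

    // zipWithIndex pairs each field with its position, so no explicit
    // Range(...).foreach with manual indexing into `fields` is needed.
    fields.zipWithIndex.map { case (field, index) =>
      (NumericPredictor(field, weights(index)), DataField(field, "continuous", "double"))
    }.unzip
  }
}
```

With the real JPMML builders, the body of the `map` would instead call the `add...` methods shown in the diff for each `(field, index)` pair.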
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948475#comment-14948475 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-146501909

The PMML model is quite extensive, and there isn't enough support in the ML library for utilizing most of the things [like FieldUsageType, DataTypes etc.]. I had actually written the import functions for both SVM and MLR but decided to drop them. I mostly followed Spark's implementation for this, and it isn't supported there either.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948447#comment-14948447 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41497880

--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala ---
@@ -124,6 +121,52 @@ class MultipleLinearRegression extends Predictor[MultipleLinearRegression] {
     }
   }
+
+  override def toPMML(): PMML = {
+    weightsOption match {
+      case None => {
+        throw new RuntimeException("The MultipleLinearRegression has not been fitted to the " +
+          "data. This is necessary to learn the weight vector of the linear function.")
+      }
+      case Some(weights) => {
+        val model = weights.collect().head
+        val pmml = new PMML()
+        pmml.setHeader(new Header().setDescription("Multiple Linear Regression"))
+
+        // define the fields
+        val target = FieldName.create("prediction")
+        val fields = scala.Array.ofDim[FieldName](model.weights.size)
+        Range(0, model.weights.size).foreach(index =>
+          fields(index) = FieldName.create("field_" + index)
+        )
--- End diff --

We can make this more scalaesque:

```scala
val fields = (0 until model.weights.size).map(i => FieldName.create("field_" + i.toString))
```
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948455#comment-14948455 ]

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41498464

--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/classification/SVM.scala ---
@@ -228,6 +225,56 @@ class SVM extends Predictor[SVM] {
     parameters.add(OutputDecisionFunction, outputDecisionFunction)
     this
   }
+
+  override def toPMML(): PMML = {
+    weightsOption match {
+      case None => {
+        throw new RuntimeException("The SVM model has not been trained. Call first fit" +
+          " before calling the export operation.")
+      }
+      case Some(weights) => {
+        val model = weights.collect().head
+        val pmml = new PMML()
+        pmml.setHeader(new Header().setDescription("Support Vector Machine"))
+
+        // define the fields
+        val target = FieldName.create("prediction")
+        val fields = scala.Array.ofDim[FieldName](model.size)
+        Range(0, model.size).foreach(index =>
+          fields(index) = FieldName.create("field_" + index)
+        )
+
+        // define the data dictionary, mining schema and model
+        val dictionary = new DataDictionary()
+        val miningSchema = new MiningSchema()
+        val coefficients = new Coefficients()
+        Range(0, model.size).foreach(index => {
+          miningSchema.addMiningFields(
+            new MiningField(fields(index)).setUsageType(FieldUsageType.ACTIVE)
+          )
+          coefficients.addCoefficients(new Coefficient().setValue(model.apply(index)))
+          dictionary.addDataFields(
+            new DataField(fields(index), OpType.CONTINUOUS, DataType.DOUBLE)
+          )
+        })
--- End diff --

Please use the `zipWithIndex` method.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948453#comment-14948453 ]

ASF GitHub Bot commented on FLINK-1966:
---------------------------------------

Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1186#discussion_r41498327

    --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala ---
    @@ -124,6 +121,52 @@ class MultipleLinearRegression extends Predictor[MultipleLinearRegression] {
         }
       }
    +
    +  override def toPMML(): PMML = {
    +    weightsOption match {
    +      case None => {
    +        throw new RuntimeException("The MultipleLinearRegression has not been fitted to the " +
    +          "data. This is necessary to learn the weight vector of the linear function.")
    +      }
    +      case Some(weights) => {
    +        val model = weights.collect().head
    +        val pmml = new PMML()
    +        pmml.setHeader(new Header().setDescription("Multiple Linear Regression"))
    +
    +        // define the fields
    +        val target = FieldName.create("prediction")
    +        val fields = scala.Array.ofDim[FieldName](model.weights.size)
    +        Range(0, model.weights.size).foreach(index =>
    +          fields(index) = FieldName.create("field_" + index)
    +        )
    +
    +        // define the data dictionary, mining schema and regression table
    +        val dictionary = new DataDictionary()
    +        val miningSchema = new MiningSchema()
    +        val regressionTable = new RegressionTable().setIntercept(model.intercept)
    +        Range(0, model.weights.size).foreach(index => {
    +          miningSchema.addMiningFields(
    +            new MiningField(fields(index)).setUsageType(FieldUsageType.ACTIVE)
    +          )
    +          regressionTable.addNumericPredictors(
    +            new NumericPredictor(fields(index), model.weights(index))
    +          )
    +          dictionary.addDataFields(
    +            new DataField(fields(index), OpType.CONTINUOUS, DataType.DOUBLE)
    +          )
    +        })
    +        dictionary.addDataFields(new DataField(target, OpType.CONTINUOUS, DataType.DOUBLE))
    +        miningSchema.addMiningFields(new MiningField(target).setUsageType(FieldUsageType.PREDICTED))
    +
    +        // define the model
    +        val pmmlModel = new RegressionModel()
    +          .setFunctionName(MiningFunctionType.REGRESSION)
    --- End diff --

    Maybe we should add `.setModelType(RegressionModel.ModelType.LINEAR_REGRESSION)` after this line, in case other regression models are added in the future.
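
For reference, the attribute that `setModelType` would control appears in the exported XML as `modelType` on the `RegressionModel` element. The following is a minimal sketch, not the code in the pull request: it builds the document with plain string interpolation instead of the JPMML classes, and the `SketchModel` case class, the `toPmmlXml` helper, and the PMML 4.2 namespace are assumptions. The `field_i` and `prediction` names mirror the diff.

```scala
// Hypothetical stand-in for the fitted model in the diff: weights plus intercept.
final case class SketchModel(weights: Vector[Double], intercept: Double)

object PmmlSketch {
  // Renders a linear regression as a PMML document (string-based, no JPMML).
  def toPmmlXml(model: SketchModel): String = {
    // One input field per weight, named as in the diff.
    val fields = model.weights.indices.map(i => s"field_$i")

    // Data dictionary: all inputs plus the predicted target.
    val dataFields = (fields :+ "prediction")
      .map(n => s"""    <DataField name="$n" optype="continuous" dataType="double"/>""")
      .mkString("\n")

    // Mining schema: inputs are active, the target is predicted.
    val miningFields = (
      fields.map(n => s"""      <MiningField name="$n" usageType="active"/>""") :+
        """      <MiningField name="prediction" usageType="predicted"/>"""
    ).mkString("\n")

    // Regression table: one numeric predictor per weight.
    val predictors = fields.zip(model.weights)
      .map { case (n, w) => s"""        <NumericPredictor name="$n" coefficient="$w"/>""" }
      .mkString("\n")

    // Assumption: PMML 4.2; modelType is the attribute the review comment asks for.
    s"""<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
       |  <Header description="Multiple Linear Regression"/>
       |  <DataDictionary numberOfFields="${fields.size + 1}">
       |$dataFields
       |  </DataDictionary>
       |  <RegressionModel functionName="regression" modelType="linearRegression">
       |    <MiningSchema>
       |$miningFields
       |    </MiningSchema>
       |    <RegressionTable intercept="${model.intercept}">
       |$predictors
       |    </RegressionTable>
       |  </RegressionModel>
       |</PMML>""".stripMargin
  }
}
```

Calling `PmmlSketch.toPmmlXml(SketchModel(Vector(0.5, -2.0), 1.0))` yields a document with two `NumericPredictor` entries. A consumer could then tell a linear regression from other regression variants by reading `modelType`.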
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948461#comment-14948461 ]

ASF GitHub Bot commented on FLINK-1966:
---------------------------------------

Github user chiwanpark commented on the pull request:

    https://github.com/apache/flink/pull/1186#issuecomment-146493431

    Hi @sachingoel0101,

    Thanks for opening this pull request. Great start! I have some comments on it.

    1. Some of the implementation is not idiomatic Scala.
    2. The interface for importing PMML is missing.

    I would prefer to cover only the PMML interface (the `toPMML` and `fromPMML` methods) in this pull request. I would rather cover the implementations of that interface in separate issues.
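
As a rough illustration of what an interface-only change could look like, the sketch below separates export and import into two small traits. The trait names `PMMLExportable` and `PMMLImportable`, the `String` representation, and the toy `ToyRegression` model are all assumptions for illustration. The pull request itself works with JPMML's `PMML` class, and the XML here is a stripped-down fragment rather than a schema-valid PMML document.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import org.xml.sax.InputSource

// Hypothetical export side: a fitted model can describe itself as PMML.
trait PMMLExportable {
  def toPMML: String
}

// Hypothetical import side: an object that rebuilds a model of type M from PMML.
trait PMMLImportable[M] {
  def fromPMML(pmml: String): M
}

// Toy model used only to exercise both traits.
final case class ToyRegression(weights: Vector[Double], intercept: Double)
    extends PMMLExportable {

  // Writes only the regression table; header, dictionary and schema are omitted.
  def toPMML: String = {
    val predictors = weights.zipWithIndex.map { case (w, i) =>
      s"""<NumericPredictor name="field_$i" coefficient="$w"/>"""
    }.mkString
    """<PMML><RegressionModel functionName="regression">""" +
      s"""<RegressionTable intercept="$intercept">$predictors</RegressionTable>""" +
      "</RegressionModel></PMML>"
  }
}

object ToyRegression extends PMMLImportable[ToyRegression] {
  // Reads the intercept and coefficients back with the JDK's DOM parser.
  def fromPMML(pmml: String): ToyRegression = {
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(pmml)))
    val table = doc.getElementsByTagName("RegressionTable").item(0).asInstanceOf[Element]
    val nodes = table.getElementsByTagName("NumericPredictor")
    val weights = (0 until nodes.getLength).map { i =>
      nodes.item(i).asInstanceOf[Element].getAttribute("coefficient").toDouble
    }.toVector
    ToyRegression(weights, table.getAttribute("intercept").toDouble)
  }
}
```

Keeping the two directions in separate traits lets an algorithm offer export before an import path exists for it.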
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948785#comment-14948785 ]

ASF GitHub Bot commented on FLINK-1966:
---------------------------------------

Github user chiwanpark commented on the pull request:

    https://github.com/apache/flink/pull/1186#issuecomment-146570848

    Okay. We need some discussion on the mailing list about the ML model import/export feature. I think PMML support is one of the sub-issues of the ML model import/export issue. I'll post a discussion thread in a few days.
[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)
[ https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933133#comment-14933133 ]

ASF GitHub Bot commented on FLINK-1966:
---------------------------------------

GitHub user sachingoel0101 opened a pull request:

    https://github.com/apache/flink/pull/1186

    [FLINK-1966][ml]Add support for Predictive Model Markup Language

    1. Adds an interface to allow exporting of models to PMML format.
    2. Implements export methods for the existing SVM and Regression algorithms.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachingoel0101/flink pmml

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1186

commit a71640edd83b6fd1085935496c1dd2553bd42caa
Author: Sachin Goel
Date:   2015-09-27T13:04:17Z

    [FLINK-1966][ml]Add support for Predictive Model Markup Language