[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138686#comment-15138686
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181790876
  
I agree with @sachingoel0101 on the import complexity but, from our point 
of view, Flink is the perfect platform to evaluate models in streaming and we 
are using it that way in our architecture. Why do you think it wouldn't be 
suitable? 


> Add support for predictive model markup language (PMML)
> ---
>
> Key: FLINK-1966
> URL: https://issues.apache.org/jira/browse/FLINK-1966
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Sachin Goel
>Priority: Minor
>  Labels: ML
>
> The predictive model markup language (PMML) [1] is a widely used language to 
> describe predictive and descriptive models as well as pre- and 
> post-processing steps. That way it allows an easy way to export models to and 
> import models from other ML tools.
> Resources:
> [1] 
> http://journal.r-project.org/archive/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138722#comment-15138722
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181798643
  
That is a good point. In a streaming setting, it does indeed make sense for 
the model to be available. However, in my opinion, it would then make sense to 
just use JPMML to import the object and then extract the model parameters. 
Granted, it is added effort on the user's side, but I still think it beats the 
complexity introduced by supporting imports directly. Furthermore, it would be 
bad design to have to reject valid PMML models just because a minor thing isn't 
supported in Flink. 
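
A minimal sketch of the approach described above, assuming the intercept and 
weights have already been extracted from a PMML document by an external library 
(that extraction step is not shown). `LinearModel` and `scoreStream` are 
hypothetical names, not FlinkML or JPMML API, and a plain `Iterator` stands in 
for a stream:

```scala
// Hypothetical container for parameters extracted from a PMML regression model.
final case class LinearModel(intercept: Double, weights: Vector[Double]) {
  // Prediction is the intercept plus the dot product of weights and features.
  def predict(features: Vector[Double]): Double = {
    require(features.size == weights.size,
      s"expected ${weights.size} features, got ${features.size}")
    intercept + weights.zip(features).map { case (w, x) => w * x }.sum
  }
}

object StreamingScoring {
  // An Iterator stands in for an unbounded stream of feature vectors.
  def scoreStream(model: LinearModel, records: Iterator[Vector[Double]]): Iterator[Double] =
    records.map(model.predict)
}
```

Because the model is reduced to plain parameters, nothing in the scoring path 
depends on how much of PMML the platform itself understands.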




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138593#comment-15138593
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181772422
  
That said, just for comparison, Spark has its own model export and import 
feature, along with PMML export. Hoping to fully support PMML import in a 
framework like Flink or Spark is next to impossible; it would require changes 
to the entire way our pipelines and datasets are represented. 




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138576#comment-15138576
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181771679
  
As the original author of this PR, I'd say this:
I tried implementing the import features but they aren't worth it. You have 
to discard most valid PMML models because they don't fit in with the Flink 
framework. 
Further, in my opinion, the use of Flink is to train the model. Once we 
export that model in PMML, you can use it pretty much anywhere, say R or 
MATLAB, which support complete PMML import and export functionality. The 
exported model is in most cases going to be used for testing, evaluation and 
prediction purposes, for which Flink isn't a good platform anyway. This 
can be accomplished anywhere. 




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138728#comment-15138728
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181799783
  
@sachingoel0101 I agree. Nonetheless, an easy way to store and move a model 
generated in batch to a streaming environment would be a really useful feature, 
which brings us back to what @chiwanpark was saying about a custom format 
internal to Flink. 




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138528#comment-15138528
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181757426
  
Well, that wouldn't be a problem for the export: you will create, and 
therefore export, only models that have `double` as the data type for 
parameters, so that's not an issue. 

This would be a problem for import, though, because PMML supports a wider 
set of data types and model types. You can't really achieve any satisfying 
degree of support for PMML in a platform like Flink, and that's why everyone 
uses JPMML for evaluation. You will only be able to import compatible models 
with compatible data fields. This would require a simple validation at runtime 
on the model type and on the fields' data types.
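
The runtime validation mentioned above could look roughly like this. It is only 
a sketch: `FieldSpec`, `ModelSpec` and `ImportValidation` are hypothetical 
stand-ins for whatever a PMML parser would hand over, and the set of supported 
model types is an assumption made for illustration.

```scala
// Hypothetical, simplified view of the parts of a PMML document that matter for import.
final case class FieldSpec(name: String, dataType: String)
final case class ModelSpec(modelType: String, fields: Seq[FieldSpec])

object ImportValidation {
  // Assumed set of model types with a native counterpart; illustrative only.
  val supportedModelTypes: Set[String] = Set("RegressionModel", "SupportVectorMachineModel")

  // Returns the spec unchanged if it can be imported, or the list of reasons it cannot.
  def validate(spec: ModelSpec): Either[List[String], ModelSpec] = {
    val typeErrors =
      if (supportedModelTypes.contains(spec.modelType)) Nil
      else List(s"unsupported model type: ${spec.modelType}")
    // Only fields declared as "double" are accepted, matching the export side.
    val fieldErrors = spec.fields.toList.collect {
      case FieldSpec(name, dataType) if dataType != "double" =>
        s"field '$name' has unsupported data type: $dataType"
    }
    val errors = typeErrors ++ fieldErrors
    if (errors.isEmpty) Right(spec) else Left(errors)
  }
}
```

Collecting all reasons, rather than failing on the first one, lets a user see 
at once everything that makes a given model incompatible.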




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138751#comment-15138751
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181803637
  
I'm all for that. Flink's models should be transferable at least across 
Flink. But that should be part of a separate PR, and should not block this one, 
which has been blocked for far too long. 
It should be pretty easy to accomplish.




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137142#comment-15137142
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181442715
  
Hello,

any news on this PR? 

@smarthi PMML is actually an industry standard and is widely used to support 
model portability in complex infrastructures. Assuming that it is not adopted 
is wrong, according to my knowledge and experience. There are certainly a 
lot of data scientists who never come into contact with this standard, and I 
had never heard of it before my first job on an ML architecture, but it's the 
best (and only) tool for this kind of job.




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138450#comment-15138450
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181739512
  
Hi @chobeat, thanks for leaving your comments.

About compatibility with other systems (such as R or MLlib), I meant that we 
cannot achieve compatibility with those systems even if we use PMML, because 
there are differences between FlinkML and the other systems. For example, 
FlinkML supports only `Double` as a data type. So we can achieve only partial 
support of PMML (especially for importing models from other systems). Is this 
sufficient for production use? If yes, we would go for this.




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137408#comment-15137408
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181509602
  
Hi @chobeat, thanks for pinging this issue. I forgot to send a discussion 
email to the mailing list. I think we have to discuss the following:

* What is the main purpose of supporting PMML? Is this feature only for model 
portability within FlinkML? If not, we have to support other systems such as R 
or Spark MLlib.
* What about a FlinkML-only format? I think that PMML's support for distributed 
systems is poor. An XML-based format is hard to parallelize.

I would like to create a general ML model importing/exporting framework. 
Then, we can easily add PMML support based on that framework.
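
One possible shape for such a framework, sketched under assumptions: the 
`ModelFormat` trait, the `LinearWeights` model and the comma-separated format 
are all hypothetical and exist only to illustrate the idea. A PMML exporter 
would then be one more implementation of the same trait.

```scala
// Hypothetical format abstraction: each concrete format writes and reads one model type.
trait ModelFormat[M] {
  def name: String
  def write(model: M): String
  def read(serialized: String): Either[String, M]
}

// Hypothetical model: an intercept plus a weight vector.
final case class LinearWeights(intercept: Double, weights: Vector[Double])

// A simple comma-separated format: the first value is the intercept, the rest are weights.
object CsvLinearFormat extends ModelFormat[LinearWeights] {
  val name: String = "csv"

  def write(model: LinearWeights): String =
    (model.intercept +: model.weights).mkString(",")

  def read(serialized: String): Either[String, LinearWeights] = {
    // Parse every token; a token that is not a number becomes None.
    val parsed = serialized.split(",").toVector.map(s => scala.util.Try(s.trim.toDouble).toOption)
    if (parsed.isEmpty || parsed.contains(None)) Left(s"not a valid $name model: $serialized")
    else {
      val values = parsed.flatten
      Right(LinearWeights(values.head, values.tail))
    }
  }
}
```

Returning `Either` from `read` keeps the "reject incompatible input" case 
explicit for every format, which matters most for a partial PMML import.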




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137745#comment-15137745
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181578375
  
Hi @chiwanpark,

> What is main purpose to support PMML? Is this feature for only model 
portability in FlinkML?

I've used PMML extensively in a previous project and saw many use 
cases other than my own. PMML export is necessary for external portability: 
you may need to create a model in Flink and use it on local data with a data 
mining tool, for example, or you could deploy it in a production pipeline 
developed with a totally different technology stack. 
PMML import is optional, though: you can use JPMML (the reference 
implementation of PMML) to read a PMML file and evaluate the model locally 
on the node. Import from PMML to the native implementation of 
FlinkML may be a plus in terms of usability and probably performance, but it's 
not really a blocking issue for a developer.

> If not, we have to support other systems such as R or Spark MLlib.

Support for R may be interesting by itself, but I can't understand what 
you mean. MLlib does support PMML export (even if it is somewhat buggy for a 
few models like Naive Bayes), so it is already possible to move models from 
MLlib to Flink.

>What about FlinkML only format? I think that support for distributed 
system in PMML is poor. XML-based format is hard to parallelize.

This could be interesting to guarantee the consistency of the models and to 
tune it to our needs. The complexity of PMML is due to the need for generality 
and consistency, but it's often overkill for describing simple models. Also, it 
has only partial support for many models that we may want to implement, e.g. 
any of the online learning algorithms implemented in SAMOA or other online 
learning frameworks. I know we still miss a few pieces before reaching that 
point, but still...






[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-10-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980911#comment-14980911
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user smarthi commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-152265442
  
Suggest that you see how PMML has been done in Oryx 2.0 (PMML in Spark 
followed Oryx 2.0).  PMML support was discussed various times on the Mahout 
project and was never implemented, in large part due to the lack of actual PMML 
usage by machine learning practitioners and data scientists. 

See this Mahout thread from last year, and more specifically Ted 
Dunning's comment in the thread - 
http://mail-archives.apache.org/mod_mbox/mahout-dev/201503.mbox/%3CCAJwFCa1%3DAw%2B3G54FgkYdTH%3DoNQBRqfeU-SS19iCFKMWbAfWzOQ%40mail.gmail.com%3E

Given that PMML models could get really huge, it's good practice to 
persist them in a compressed format. It would also be good to be able to specify 
which features/fields are categorical/numeric (via a config file maybe). 
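
As an illustration of the compression suggestion, here is a small sketch that 
gzips the XML text of a PMML document using only the JDK. The object and method 
names are made up for this example, and the choice of gzip is an assumption; 
the comment above does not name a specific compression format.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object PmmlCompression {
  // Compress the XML text of a PMML document with gzip.
  def compress(pmmlXml: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(buffer)
    try gzip.write(pmmlXml.getBytes(StandardCharsets.UTF_8))
    finally gzip.close() // closing writes the gzip trailer into the buffer
    buffer.toByteArray
  }

  // Restore the original XML text from gzip-compressed bytes.
  def decompress(bytes: Array[Byte]): String = {
    val gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try {
      val out = new ByteArrayOutputStream()
      val chunk = new Array[Byte](4096)
      var read = gzip.read(chunk)
      while (read != -1) {
        out.write(chunk, 0, read)
        read = gzip.read(chunk)
      }
      new String(out.toByteArray, StandardCharsets.UTF_8)
    } finally gzip.close()
  }
}
```

XML is highly repetitive, so large models with many fields tend to compress 
well; the actual ratio depends on the model and is not measured here.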











[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948428#comment-14948428
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41496687
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/MLUtils.scala ---
@@ -39,6 +40,10 @@ import org.apache.flink.ml.math.SparseVector
   */
 object MLUtils {
 
+  val flinkApp = new Application()
--- End diff --

`flinkApp` is ambiguous. I would like to use `pmmlApp`.




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948446#comment-14948446
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41497844
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala
 ---
@@ -18,15 +18,12 @@
 
 package org.apache.flink.ml.regression
 
-import org.apache.flink.api.scala.DataSet
+import org.apache.flink.api.scala.{DataSet, _}
--- End diff --

Unnecessary import statement change




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948444#comment-14948444
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41497773
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala
 ---
@@ -124,6 +121,52 @@ class MultipleLinearRegression extends Predictor[MultipleLinearRegression] {
     }
 
   }
+
+  override def toPMML(): PMML = {
+    weightsOption match {
+      case None => {
+        throw new RuntimeException("The MultipleLinearRegression has not been fitted to the " +
+          "data. This is necessary to learn the weight vector of the linear function.")
+      }
+      case Some(weights) => {
+        val model = weights.collect().head
+        val pmml = new PMML()
+        pmml.setHeader(new Header().setDescription("Multiple Linear Regression"))
+
+        // define the fields
+        val target = FieldName.create("prediction")
+        val fields = scala.Array.ofDim[FieldName](model.weights.size)
+        Range(0, model.weights.size).foreach(index =>
+          fields(index) = FieldName.create("field_" + index)
+        )
+
+        // define the data dictionary, mining schema and regression table
+        val dictionary = new DataDictionary()
+        val miningSchema = new MiningSchema()
+        val regressionTable = new RegressionTable().setIntercept(model.intercept)
+        Range(0, model.weights.size).foreach(index => {
+          miningSchema.addMiningFields(
+            new MiningField(fields(index)).setUsageType(FieldUsageType.ACTIVE)
+          )
+          regressionTable.addNumericPredictors(
+            new NumericPredictor(fields(index), model.weights(index))
+          )
+          dictionary.addDataFields(
+            new DataField(fields(index), OpType.CONTINUOUS, DataType.DOUBLE)
+          )
+        })
--- End diff --

We can simplify this using `zipWithIndex` method for `fields`.
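
One way the suggestion could look. Plain case classes stand in for the JPMML 
classes from the diff so that the sketch is self-contained; `Predictor`, 
`DataFieldSpec` and `build` are hypothetical names, not part of the PR.

```scala
// Stand-ins for the JPMML classes used in the diff; hypothetical and simplified.
final case class Predictor(field: String, coefficient: Double)
final case class DataFieldSpec(field: String, opType: String, dataType: String)

object ZipWithIndexSketch {
  // Build field names, numeric predictors and data fields without
  // pre-allocating an array or looping over an explicit index range.
  def build(weights: Vector[Double]): (Vector[Predictor], Vector[DataFieldSpec]) = {
    val fields = weights.indices.map(i => s"field_$i").toVector
    // zipWithIndex pairs each field with its position, used to look up the weight.
    val predictors = fields.zipWithIndex.map { case (field, i) => Predictor(field, weights(i)) }
    val dataFields = fields.map(field => DataFieldSpec(field, "continuous", "double"))
    (predictors, dataFields)
  }
}
```

In the real method the same traversal would call `addNumericPredictors`, 
`addMiningFields` and `addDataFields` instead of building these stand-ins.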




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948475#comment-14948475
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-146501909
  
The PMML model is quite extensive, and there isn't enough support in the ML 
library for utilizing most of the things [like FieldUsageType, DataTypes etc.]. 
I had actually written the import functions for both SVM and MLR but decided to 
drop them.
I mostly followed Spark's implementation for this, and it isn't supported 
there either.




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948447#comment-14948447
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41497880
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala
 ---
@@ -124,6 +121,52 @@ class MultipleLinearRegression extends Predictor[MultipleLinearRegression] {
     }
 
   }
+
+  override def toPMML(): PMML = {
+    weightsOption match {
+      case None => {
+        throw new RuntimeException("The MultipleLinearRegression has not been fitted to the " +
+          "data. This is necessary to learn the weight vector of the linear function.")
+      }
+      case Some(weights) => {
+        val model = weights.collect().head
+        val pmml = new PMML()
+        pmml.setHeader(new Header().setDescription("Multiple Linear Regression"))
+
+        // define the fields
+        val target = FieldName.create("prediction")
+        val fields = scala.Array.ofDim[FieldName](model.weights.size)
+        Range(0, model.weights.size).foreach(index =>
+          fields(index) = FieldName.create("field_" + index)
+        )
--- End diff --

We can make this more scalaesque:

```scala
val fields = (0 until model.weights.size).map(i => FieldName.create("field_" + i.toString))
```




[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948455#comment-14948455
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41498464
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/classification/SVM.scala
 ---
@@ -228,6 +225,56 @@ class SVM extends Predictor[SVM] {
 parameters.add(OutputDecisionFunction, outputDecisionFunction)
 this
   }
+
+  override def toPMML(): PMML = {
+    weightsOption match {
+      case None => {
+        throw new RuntimeException("The SVM model has not been trained. Call first fit" +
+          " before calling the export operation.")
+      }
+      case Some(weights) => {
+        val model = weights.collect().head
+        val pmml = new PMML()
+        pmml.setHeader(new Header().setDescription("Support Vector Machine"))
+
+        // define the fields
+        val target = FieldName.create("prediction")
+        val fields = scala.Array.ofDim[FieldName](model.size)
+        Range(0, model.size).foreach(index =>
+          fields(index) = FieldName.create("field_" + index)
+        )
+
+        // define the data dictionary, mining schema and model
+        val dictionary = new DataDictionary()
+        val miningSchema = new MiningSchema()
+        val coefficients = new Coefficients()
+        Range(0, model.size).foreach(index => {
+          miningSchema.addMiningFields(
+            new MiningField(fields(index)).setUsageType(FieldUsageType.ACTIVE)
+          )
+          coefficients.addCoefficients(new Coefficient().setValue(model.apply(index)))
+          dictionary.addDataFields(
+            new DataField(fields(index), OpType.CONTINUOUS, DataType.DOUBLE)
+          )
+        })
--- End diff --

Please use the `zipWithIndex` method here instead of indexing through 
`Range(0, model.size)`.
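
To illustrate the suggestion, here is a minimal sketch of the same kind of loop written with `zipWithIndex`. It is not the code from the pull request: `FieldName` and `DataField` below are simplified, hypothetical stand-ins for the JPMML classes so that the snippet compiles on its own, and `ZipWithIndexSketch` and `describe` are names made up for this example.

```scala
// Hypothetical stand-ins for the JPMML classes used in the diff; not the real API.
final case class FieldName(value: String)
final case class DataField(name: FieldName, opType: String, dataType: String)

object ZipWithIndexSketch {
  // Pairs every weight with its position, so no separate index range
  // or pre-allocated array of field names is needed.
  def describe(weights: Seq[Double]): Seq[(FieldName, Double, DataField)] =
    weights.zipWithIndex.map { case (weight, index) =>
      val name = FieldName("field_" + index)
      (name, weight, DataField(name, "continuous", "double"))
    }

  def main(args: Array[String]): Unit =
    describe(Seq(0.5, -1.25, 2.0)).foreach { case (name, weight, _) =>
      println(name.value + " -> " + weight)
    }
}
```

Each element carries its own index, so the field name, coefficient and data field can be built in a single pass over the weight vector.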


[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948453#comment-14948453
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1186#discussion_r41498327
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala ---
@@ -124,6 +121,52 @@ class MultipleLinearRegression extends Predictor[MultipleLinearRegression] {
     }
 
   }
+
+  override def toPMML(): PMML = {
+    weightsOption match {
+      case None => {
+        throw new RuntimeException("The MultipleLinearRegression has not been fitted to the " +
+          "data. This is necessary to learn the weight vector of the linear function.")
+      }
+      case Some(weights) => {
+        val model = weights.collect().head
+        val pmml = new PMML()
+        pmml.setHeader(new Header().setDescription("Multiple Linear Regression"))
+
+        // define the fields
+        val target = FieldName.create("prediction")
+        val fields = scala.Array.ofDim[FieldName](model.weights.size)
+        Range(0, model.weights.size).foreach(index =>
+          fields(index) = FieldName.create("field_" + index)
+        )
+
+        // define the data dictionary, mining schema and regression table
+        val dictionary = new DataDictionary()
+        val miningSchema = new MiningSchema()
+        val regressionTable = new RegressionTable().setIntercept(model.intercept)
+        Range(0, model.weights.size).foreach(index => {
+          miningSchema.addMiningFields(
+            new MiningField(fields(index)).setUsageType(FieldUsageType.ACTIVE)
+          )
+          regressionTable.addNumericPredictors(
+            new NumericPredictor(fields(index), model.weights(index))
+          )
+          dictionary.addDataFields(
+            new DataField(fields(index), OpType.CONTINUOUS, DataType.DOUBLE)
+          )
+        })
+        dictionary.addDataFields(new DataField(target, OpType.CONTINUOUS, DataType.DOUBLE))
+        miningSchema.addMiningFields(new MiningField(target).setUsageType(FieldUsageType.PREDICTED))
+
+        // define the model
+        val pmmlModel = new RegressionModel()
+          .setFunctionName(MiningFunctionType.REGRESSION)
--- End diff --

Maybe we should add 
`.setModelType(RegressionModel.ModelType.LINEAR_REGRESSION)` after this line 
to prepare for other regression models in the future.
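
A rough sketch of why an explicit model type could help: once more than one kind of regression model is exported, the type is what tells them apart in the resulting document. The classes below are hypothetical stand-ins written for this example, not the JPMML API, and only the opening tag is rendered. The attribute values `linearRegression` and `logisticRegression` are taken from the PMML `RegressionModel` element as I understand it.

```scala
// Hypothetical stand-ins; the pull request itself uses the JPMML model classes.
sealed abstract class ModelType(val pmmlName: String)
case object LinearRegression extends ModelType("linearRegression")
case object LogisticRegression extends ModelType("logisticRegression")

final case class RegressionModelSketch(functionName: String, modelType: Option[ModelType]) {
  // Renders only the opening tag; modelType is emitted when it was set explicitly.
  def openingTag: String = {
    val typeAttr = modelType.map(t => " modelType=\"" + t.pmmlName + "\"").getOrElse("")
    "<RegressionModel functionName=\"" + functionName + "\"" + typeAttr + ">"
  }
}

object ModelTypeSketch {
  def main(args: Array[String]): Unit = {
    println(RegressionModelSketch("regression", Some(LinearRegression)).openingTag)
    println(RegressionModelSketch("regression", None).openingTag)
  }
}
```

Without the explicit type, a linear and a logistic model exported this way would produce the same opening tag.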


[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948461#comment-14948461
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-146493431
  
Hi @sachingoel0101, thanks for opening the pull request. Great start! I have 
some comments on it.

1. Some of the implementation is not idiomatic Scala.
2. An interface for importing PMML is missing.

I would prefer to cover only the PMML interface (the `toPMML` and `fromPMML` 
methods) in this pull request. I think it would be better to cover the 
implementations of the PMML interface in separate issues.
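
To make the scope concrete, an interface-only change might look roughly like the sketch below. Everything in it is an assumption for illustration: `PMMLDoc` is a hypothetical stand-in for the JPMML `PMML` class, and the trait names `PMMLExportable` and `PMMLImportable` as well as the `LinearModel` example are invented here, not taken from the pull request.

```scala
// Hypothetical stand-in for the JPMML `PMML` document type.
final case class PMMLDoc(description: String, coefficients: Vector[Double])

// Export side: anything that can describe itself as a PMML document.
trait PMMLExportable {
  def toPMML(): PMMLDoc
}

// Import side: a factory that rebuilds a model of type M from a PMML document.
trait PMMLImportable[M] {
  def fromPMML(doc: PMMLDoc): M
}

// Toy model used only to show both directions of the interface.
final case class LinearModel(weights: Vector[Double]) extends PMMLExportable {
  override def toPMML(): PMMLDoc = PMMLDoc("Linear model", weights)
}

object LinearModel extends PMMLImportable[LinearModel] {
  override def fromPMML(doc: PMMLDoc): LinearModel = LinearModel(doc.coefficients)
}
```

With the two traits in place, each algorithm could add its own `toPMML` and `fromPMML` implementation in a follow-up issue without touching the interface again.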


[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948785#comment-14948785
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-146570848
  
Okay, we need some discussion on the mailing list about the ML model 
import/export feature. I think PMML support is one of the sub-issues of the 
broader ML model import/export issue.

I'll post the discussion thread in a few days.


[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2015-09-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933133#comment-14933133
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

GitHub user sachingoel0101 opened a pull request:

https://github.com/apache/flink/pull/1186

[FLINK-1966][ml]Add support for Predictive Model Markup Language

1. Adds an interface to allow exporting of models to PMML format.
2. Implements export methods for the existing SVM and Regression algorithms.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sachingoel0101/flink pmml

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1186


commit a71640edd83b6fd1085935496c1dd2553bd42caa
Author: Sachin Goel 
Date:   2015-09-27T13:04:17Z

[FLINK-1966][ml]Add support for Predictive Model Markup Language



