[jira] [Commented] (SPARK-5598) Model import/export for ALS

2015-02-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308836#comment-14308836
 ] 

Apache Spark commented on SPARK-5598:
-

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/4422

 Model import/export for ALS
 ---

 Key: SPARK-5598
 URL: https://issues.apache.org/jira/browse/SPARK-5598
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley
Assignee: Xiangrui Meng

 Please see parent JIRA for details on model import/export plans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5598) Model import/export for ALS

2015-02-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308692#comment-14308692
 ] 

Sean Owen commented on SPARK-5598:
--

[~mengxr] No, no other tool could usefully read such a PMML file. The only 
argument for it would be consistency: you probably need *some* file to hold 
metadata about the model, so you could just use PMML rather than inventing 
yet another format for that.

The actual data can't feasibly be serialized in PMML since it would be far too 
large as XML. I'm not suggesting that text-based serialization of the vectors 
should be used; I was pointing more to the PMML container idea.
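A rough way to see why the factor data itself does not belong in XML is to compare the encoded size of a single factor vector. The `<Vector>`/`<Value>` element names below are hypothetical (they are not PMML elements), and the helper names are illustrative. The sketch only contrasts the per-value overhead of XML text with raw 8-byte doubles.

```python
import struct


def xml_text_size(vector):
    # Hypothetical XML encoding (not PMML): one <Value> element per factor,
    # written with Python's shortest round-trip decimal form via repr().
    body = "".join(f"<Value>{x!r}</Value>" for x in vector)
    return len(f"<Vector>{body}</Vector>".encode("utf-8"))


def binary_size(vector):
    # Packed little-endian IEEE 754 doubles: exactly 8 bytes per factor.
    return len(struct.pack(f"<{len(vector)}d", *vector))


# A hypothetical rank-10 factor vector.
factors = [0.1234567 * (i + 1) for i in range(10)]
print(binary_size(factors), xml_text_size(factors))
```

Multiplying the per-vector difference by the number of user and item vectors gives a rough sense of how the gap grows for a large model.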

Yes, if this concerns only data that will be written and read by Spark, and 
is not intended for export, there isn't any value at all in PMML. I thought 
this might be covering model export, meaning some kind of external 
consumption. In that case there's no good answer, but at least reusing PMML 
for the container could have some small value.



[jira] [Commented] (SPARK-5598) Model import/export for ALS

2015-02-05 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308433#comment-14308433
 ] 

Xiangrui Meng commented on SPARK-5598:
--

[~srowen] Do you expect the produced PMML file to be consumed anywhere other 
than by calling `pmmlToMfModel` in Spark? PMML files are useful because they 
are portable. If the ALS model file can only be loaded in Spark, I don't see 
the advantage of using PMML, which doesn't provide any additional benefit over 
XML or JSON in this case. Also, I saw that a text format is used to store the 
features, which is a custom format that we would need to maintain over time. 
I'm not against supporting PMML, but I don't see its value for large 
distributed models.
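For comparison, a Spark-only format could keep the small metadata in plain JSON and store the factor matrices separately. The sketch below assumes that layout; the field names, paths, version string, and helper functions are hypothetical and do not describe what Spark actually writes.

```python
import json

SUPPORTED_VERSION = "1.0"  # hypothetical format version


def model_metadata(rank, user_path, product_path):
    # Hypothetical metadata record: only small scalars plus the locations
    # of the factor data, which is stored elsewhere in a binary format.
    return json.dumps({
        "class": "MatrixFactorizationModel",
        "version": SUPPORTED_VERSION,
        "rank": rank,
        "userFeatures": user_path,
        "productFeatures": product_path,
    }, sort_keys=True)


def load_metadata(text):
    meta = json.loads(text)
    # Refuse metadata written in a format version this reader doesn't know.
    if meta.get("version") != SUPPORTED_VERSION:
        raise ValueError(f"unsupported format version: {meta.get('version')}")
    return meta


meta = load_metadata(model_metadata(10, "model/data/user", "model/data/product"))
print(meta["rank"], meta["userFeatures"])
```

Because only Spark reads this file, the format choice (JSON here) matters less than keeping a version field so the loader can evolve.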



[jira] [Commented] (SPARK-5598) Model import/export for ALS

2015-02-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305978#comment-14305978
 ] 

Sean Owen commented on SPARK-5598:
--

For what it's worth, I completely made up a PMML-based serialization for ALS 
that involves recording the location of some serialization of the RDDs in a 
number of Extension elements. Literally just pointers. To the extent this is 
about any rational PMML serialization for ALS, well, there's a data point. 
https://github.com/OryxProject/oryx/blob/master/oryx-app-mllib/src/main/java/com/cloudera/oryx/app/mllib/als/ALSUpdate.java#L319
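A minimal sketch of the "PMML as container" idea: a PMML shell whose `Extension` elements carry only the locations of the serialized factor data, not the data itself. The extension names `X` and `Y`, the paths, and the helper functions are hypothetical; this is not the Oryx code linked above, and the output has not been validated against the PMML schema.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_2"
# Serialize PMML elements in the default namespace instead of "ns0:".
ET.register_namespace("", PMML_NS)


def q(tag):
    # Qualify a tag name with the PMML namespace.
    return f"{{{PMML_NS}}}{tag}"


def build_container(pointers):
    root = ET.Element(q("PMML"), {"version": "4.2"})
    ET.SubElement(root, q("Header"))
    ET.SubElement(root, q("DataDictionary"), {"numberOfFields": "0"})
    # One Extension per pointer: the value is a location, not the factors.
    for name, value in pointers.items():
        ET.SubElement(root, q("Extension"), {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")


def read_pointers(xml_text):
    root = ET.fromstring(xml_text)
    return {e.get("name"): e.get("value") for e in root.findall(q("Extension"))}


doc = build_container({"X": "model/X", "Y": "model/Y"})
print(read_pointers(doc))
```

The XML stays tiny regardless of model size; the factor matrices live wherever the pointers say, in whatever binary format the writer chose.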
