[jira] [Commented] (SPARK-5598) Model import/export for ALS
[ https://issues.apache.org/jira/browse/SPARK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308836#comment-14308836 ]

Apache Spark commented on SPARK-5598:
-------------------------------------

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/4422

Model import/export for ALS
---------------------------

Key: SPARK-5598
URL: https://issues.apache.org/jira/browse/SPARK-5598
Project: Spark
Issue Type: Sub-task
Components: MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley
Assignee: Xiangrui Meng

Please see parent JIRA for details on model import/export plans.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5598) Model import/export for ALS
[ https://issues.apache.org/jira/browse/SPARK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308692#comment-14308692 ]

Sean Owen commented on SPARK-5598:
----------------------------------

[~mengxr] No, no other tool could usefully read such a PMML file. The only argument for it would be consistency: you probably need *some* file to hold metadata about the model, so you could just use PMML rather than invent another format for that. The actual data can't feasibly be serialized in PMML, since it would be far too large as XML. I'm not suggesting that text-based serialization of the vectors should be used; I was pointing more to the PMML container idea.

Yes, if this concerns data that will only be written and read by Spark, and is not intended for export, there isn't any value at all in PMML. I thought this might be covering model export, meaning export for some kind of external consumption. In that case there's no good answer, but at least reusing PMML for the container could have some small value.
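To make the size argument concrete, here is a rough sketch that compares one factor vector packed as binary doubles against the same vector written as text inside a PMML-style `<Array>` element. The 50-dimensional vector and the helper names `binary_size` and `xml_size` are assumptions made up for illustration; nothing here comes from Spark or Oryx.

```python
import struct


def binary_size(vector):
    """Bytes needed to pack the vector as little-endian 64-bit floats."""
    return len(struct.pack("<%dd" % len(vector), *vector))


def xml_size(vector):
    """Bytes needed for the same vector as a PMML-style <Array> of text numbers."""
    body = " ".join(repr(x) for x in vector)
    element = '<Array n="%d" type="real">%s</Array>' % (len(vector), body)
    return len(element.encode("utf-8"))


# Hypothetical 50-dimensional factor vector; a real ALS model holds one
# such vector per user and per item.
factor = [i / 7.0 for i in range(50)]

print("binary bytes:", binary_size(factor))
print("XML bytes:   ", xml_size(factor))
```

The packed form is a fixed 8 bytes per value (400 bytes here), while the text form spends one byte per printed digit plus markup, and that overhead is repeated for every user and item vector in the model.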
[jira] [Commented] (SPARK-5598) Model import/export for ALS
[ https://issues.apache.org/jira/browse/SPARK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308433#comment-14308433 ]

Xiangrui Meng commented on SPARK-5598:
--------------------------------------

[~srowen] Do you expect the produced PMML file to be consumed anywhere other than by calling `pmmlToMfModel` in Spark? PMML files are useful because they are portable. If the ALS model file can only be loaded in Spark, I don't see the advantage of using PMML, which provides no additional benefit over plain XML or JSON in this case. Also, I saw that a text format is used to store the features; that is a custom format we would need to maintain over time. I'm not against supporting PMML, but I don't see its value for large distributed models.
[jira] [Commented] (SPARK-5598) Model import/export for ALS
[ https://issues.apache.org/jira/browse/SPARK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305978#comment-14305978 ]

Sean Owen commented on SPARK-5598:
----------------------------------

For what it's worth, I completely made up a PMML-based serialization for ALS that records the location of some serialization of the RDDs in a number of Extension elements. Literally just pointers. To the extent this is about any rational PMML serialization for ALS, well, there's a data point.

https://github.com/OryxProject/oryx/blob/master/oryx-app-mllib/src/main/java/com/cloudera/oryx/app/mllib/als/ALSUpdate.java#L319
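The "just pointers" idea can be sketched roughly as follows: a small PMML skeleton whose Extension elements carry only metadata and the storage locations of the factor data, never the vectors themselves. This is an illustration under assumptions, not the Oryx layout. The extension names (`rank`, `userFeatures`, `itemFeatures`), the example paths, and the helpers `build_pointer_pmml` and `read_pointers` are all hypothetical, and the output omits the PMML namespace and required elements such as DataDictionary, so it is not schema-valid.

```python
import xml.etree.ElementTree as ET


def build_pointer_pmml(rank, user_features_path, item_features_path):
    """Build a PMML-like skeleton that points at factor data stored elsewhere."""
    root = ET.Element("PMML", {"version": "4.2"})
    ET.SubElement(root, "Header", {"description": "ALS container (illustrative)"})
    # Each Extension holds one name/value pair. The names are hypothetical.
    pointers = [
        ("rank", str(rank)),
        ("userFeatures", user_features_path),
        ("itemFeatures", item_features_path),
    ]
    for name, value in pointers:
        ET.SubElement(root, "Extension", {"name": name, "value": value})
    return root


def read_pointers(root):
    """Collect the Extension name/value pairs back into a dict."""
    return {ext.get("name"): ext.get("value") for ext in root.findall("Extension")}


# Hypothetical locations of the serialized user and item factor RDDs.
doc = build_pointer_pmml(10, "hdfs:///models/als/X", "hdfs:///models/als/Y")
print(ET.tostring(doc, encoding="unicode"))
```

A reader would parse the container, call something like `read_pointers`, and then load the factor data from the referenced locations with whatever format was used to write it. The XML stays a few hundred bytes regardless of model size, which is the point of the container approach.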