[ https://issues.apache.org/jira/browse/SPARK-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045874#comment-14045874 ]
Sean Owen edited comment on SPARK-2293 at 6/27/14 8:53 PM:
-----------------------------------------------------------

I can make a PR for the example changes since I was already looking at this, unless you've already got it done. As for a new method -- kind of a toss-up between the small added convenience and adding another method to the API. For my part I found it clear to just write a map call.

https://github.com/apache/spark/pull/1250


was (Author: srowen):
I can make a PR for the example changes since I was already looking at this, unless you've already got it done. As for a new method -- kind of a toss-up between the small added convenience and adding another method to the API. For my part I found it clear to just write a map call.

> Replace RDD.zip usage by map with predict inside.
> -------------------------------------------------
>
>                 Key: SPARK-2293
>                 URL: https://issues.apache.org/jira/browse/SPARK-2293
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Priority: Minor
>
> In our guide, we use
> {code}
> val prediction = model.predict(test.map(_.features))
> val predictionAndLabel = prediction.zip(test.map(_.label))
> {code}
> This is not efficient because test will be computed twice. We should change it to
> {code}
> val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
> {code}
> It is also nice to add a `predictWith` method to predictive models.
> {code}
> def predictWith[V](RDD[(Vector, V)]): RDD[(Double, V)]
> {code}
> But I'm not sure whether this is a good name. `predictWithValue`?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
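
To illustrate the "computed twice" point from the issue description, here is a minimal sketch in plain Scala with no Spark dependency. It is only a stand-in, not MLlib code: `Example`, `test()`, `predict`, and the `loads` counter are hypothetical names, and an `Iterator` rebuilt on every call is assumed to approximate an uncached RDD whose lineage is re-evaluated each time it is traversed.

```scala
// Stand-in for a labeled data point (hypothetical, not MLlib's LabeledPoint).
final case class Example(label: Double, features: Seq[Double])

object ZipVsMap {
  // Counts how many times the test set is (re)computed.
  var loads = 0

  // Hypothetical uncached dataset: every call rebuilds it from scratch.
  def test(): Iterator[Example] = {
    loads += 1
    Iterator(Example(1.0, Seq(1.0, 2.0)), Example(0.0, Seq(0.5, 0.1)))
  }

  // Hypothetical model: predicts the sum of the features.
  def predict(features: Seq[Double]): Double = features.sum

  // Pattern from the guide: predict on the features, then zip with the labels.
  // The test set is traversed once for features and once for labels.
  def viaZip(): List[(Double, Double)] = {
    val prediction = test().map(p => predict(p.features))
    prediction.zip(test().map(_.label)).toList
  }

  // Proposed pattern: a single map that predicts and keeps the label.
  def viaMap(): List[(Double, Double)] =
    test().map(p => (predict(p.features), p.label)).toList

  def main(args: Array[String]): Unit = {
    loads = 0; val zipped = viaZip(); val zipLoads = loads
    loads = 0; val mapped = viaMap(); val mapLoads = loads
    println(s"zip loads: $zipLoads, map loads: $mapLoads, same result: ${zipped == mapped}")
  }
}
```

Both variants produce the same (prediction, label) pairs; the difference is only in how many times the input is rebuilt. On a real RDD, caching `test` would also avoid the second computation, at the cost of memory.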