[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-09 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 As for the API - I'm OK with having the "user-facing" version differ from the `transform` version, though it may lead to some confusion. In this case, it's probably best to have `transform` only

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-09 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 Good point for copying some detail to JIRA, will do that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 Thanks @MLnick for the explanation. This is what I'd understood from your similar description on the JIRA, but definitely more in-depth. (It might be good to copy to JIRA, or even a design doc

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-06 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 I commented further on the [JIRA](https://issues.apache.org/jira/browse/SPARK-14409?focusedCommentId=15898855&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15898855).

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @MLnick OK I think I misunderstood some of your comments above then. I see the proposal in SPARK-14409 differs from this PR, so I agree it'd be nice to resolve it. We can make changes to this

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-06 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 @jkbradley I've put my updated proposal for ranking evaluation [on SPARK-14409

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 I'll merge this with master now. Thanks @sueann and @MLnick for the feedback. I'll prioritize helping with your work on transform, metrics, and tuning for ALS next.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 LGTM. Any other comments before we merge?

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73866/

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Merged build finished. Test PASSed.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73866/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73866/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73787/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Merged build finished. Test PASSed.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73787/

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73787/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 It's a good point about making an implicit decision. We could deprecate these methods in favor of transform-based ones in the future---we have done this in the past---but it does push the

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-01 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 It could also be I'm overthinking things - and we can mould the `RankingEvaluator` to accept both types of input - the array version: `(Array(predictions), Array(labels))` or the "exploded" version:
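
To make the two input shapes concrete, here is a minimal sketch in plain Python (not Spark code). The snippet above cuts off before the exploded schema is spelled out, so the `(user, item, prediction, label)` row layout, the `to_array_form` name, and the `relevance_threshold` parameter are all assumptions made for illustration; none of them is taken from the `RankingEvaluator` proposal itself.

```python
from collections import defaultdict


def to_array_form(rows, relevance_threshold=0.0):
    """Group exploded rows into one (predictions, labels) pair per user.

    rows: iterable of (user, item, prediction, label) tuples (assumed layout).
    Returns {user: (items ranked by prediction, items counted as relevant)}.
    """
    by_user = defaultdict(list)
    for user, item, prediction, label in rows:
        by_user[user].append((item, prediction, label))

    result = {}
    for user, triples in by_user.items():
        # Rank items from highest to lowest predicted score.
        ranked = [item for item, _, _ in
                  sorted(triples, key=lambda t: t[1], reverse=True)]
        # Treat an item as ground truth when its label exceeds the threshold.
        relevant = [item for item, _, label in triples
                    if label > relevance_threshold]
        result[user] = (ranked, relevant)
    return result


exploded = [
    (1, 10, 0.9, 1.0),
    (1, 20, 0.2, 0.0),
    (1, 30, 0.5, 1.0),
    (2, 10, 0.1, 1.0),
]
arrays = to_array_form(exploded)
```

Under these assumptions the array form can always be derived from the exploded form by grouping, which is one reason an evaluator could plausibly accept either.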

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-01 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 Isn't deciding on the output schema for these methods essentially the same as deciding on transform semantics in #12574 (apart from the issue of how, or if, to have transform generate the "ground

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73628/

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Merged build finished. Test PASSed.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73628/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Merged build finished. Test PASSed.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73624/

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73623/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73624/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73623/

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Merged build finished. Test PASSed.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73628/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73624/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73623/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread sueann
Github user sueann commented on the issue: https://github.com/apache/spark/pull/17090 The output in https://github.com/apache/spark/pull/12574/ looks like a DataFrame with Row(srcCol: Int, "recommendations": Array[(Int, Float)]) so I think this PR as is matches the output type -
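
As a rough illustration of that output shape, the sketch below builds one record per user holding a list of `(item, score)` pairs, mirroring `Row(srcCol: Int, "recommendations": Array[(Int, Float)])`. It is plain Python rather than Spark code, and the `recommend_for_all_users` name, the dict-based factor inputs, and the brute-force dot-product scoring are assumptions for illustration only, not the implementation in this PR.

```python
import heapq


def recommend_for_all_users(user_factors, item_factors, num):
    """Return {user_id: [(item_id, score), ...]} with the top `num` items per user.

    user_factors / item_factors: {id: list of floats} (hypothetical inputs).
    """
    recommendations = {}
    for user_id, user_vec in user_factors.items():
        # Score every item for this user with a dot product of the factors.
        scored = (
            (item_id, sum(a * b for a, b in zip(user_vec, item_vec)))
            for item_id, item_vec in item_factors.items()
        )
        # Keep only the `num` highest-scoring items, best first.
        recommendations[user_id] = heapq.nlargest(
            num, scored, key=lambda pair: pair[1]
        )
    return recommendations


user_factors = {1: [1.0, 0.0], 2: [0.0, 1.0]}
item_factors = {10: [2.0, 0.0], 20: [0.0, 3.0], 30: [1.0, 1.0]}
top2 = recommend_for_all_users(user_factors, item_factors, num=2)
```

Each value in `top2` plays the role of the `recommendations` array column, keyed by what would be the `srcCol` value.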

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @MLnick Thanks *a lot* for the detailed tests! I really appreciate it. In this case, are you OK with the approach in the current PR (pending reviews)? One thing we should confirm is

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 Finally, I've done some work related to [SPARK-11968](https://issues.apache.org/jira/browse/SPARK-11968) and have a potential solution that seems to be pretty good. In this case it should be more

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 I should note that I've found the performance of "recommend all" to be very dependent on number of partitions since it controls the memory consumption per task (which can easily explode in the
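
A back-of-the-envelope way to see why the partition count matters: assuming (this is an assumption, not something stated in the thread) that each task scores one block of users against one block of items and holds those scores in memory, the work per task shrinks as the number of blocks grows. The `pairs_per_task` helper and its parameters are hypothetical names used only for this sketch.

```python
import math


def pairs_per_task(num_users, num_items, user_blocks, item_blocks):
    """Upper bound on user-item pairs scored by a single task.

    Assumes users and items are split evenly into blocks and that one task
    handles exactly one (user block, item block) combination.
    """
    users_per_block = math.ceil(num_users / user_blocks)
    items_per_block = math.ceil(num_items / item_blocks)
    return users_per_block * items_per_block


# Doubling the number of user blocks halves the pairs each task must hold.
few_blocks = pairs_per_task(1000, 100, user_blocks=10, item_blocks=1)
more_blocks = pairs_per_task(1000, 100, user_blocks=20, item_blocks=1)
```

Under this simplified model, too few partitions means each task materializes a large block of scores at once, which is consistent with the memory sensitivity described above.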

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 The performance of #12574 is not better than the existing `mllib` recommend-all - since it wraps the functionality it's roughly on par.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 Fitting into the CV / evaluator is actually fairly straightforward. It's just that the semantics of `transform` for top-k recommendation must fit into whatever we decide on for `RankingEvaluator`,

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @MLnick Thanks for showing those comparison numbers. If your implementation is faster, then I'm happy going with it. I do wonder if we might hit scalability issues with RDDs which we would not

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 @jkbradley Do we propose to add further methods to support recommending for all users (or items) in an input DF, like `recommendForAllUsers(dataset: DataFrame, num: Int)`?

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 For performance tests, I've been using the MovieLens `ml-latest` dataset [here](https://grouplens.org/datasets/movielens/). It has `24,404,096` ratings with `259,137` users and `39,443` movies.
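
For a sense of scale, the figures quoted above imply roughly 10.2 billion user-item pairs for a full "recommend all" pass. The short calculation below uses only those three numbers; the variable names are illustrative.

```python
# Dataset sizes as quoted for MovieLens `ml-latest` in the comment above.
num_ratings = 24_404_096
num_users = 259_137
num_items = 39_443

# Scoring every user against every item touches this many pairs.
total_pairs = num_users * num_items

# Fraction of the user-item matrix that actually has an observed rating.
density = num_ratings / total_pairs

print(f"user-item pairs to score: {total_pairs:,}")
print(f"rating matrix density:    {density:.4%}")
```

The observed ratings cover well under one percent of the matrix, so nearly all of the cost of recommending for all users comes from scoring pairs that were never rated.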

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 I'd been following the long discussions about a transform-based solution, but those had not seemed to have converged to a clear design. If you feel they have in your PR, then I'll spend some

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17090 #12574 is a comprehensive solution that also intends to support cross-validation as well as recommending for a subset (or any arbitrary set) of users/items. So it solves

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @hhbyyh This is different from https://github.com/apache/spark/pull/12574 since it sidesteps the ongoing design discussions about input and output schema. Eventually, I'd like us to proceed

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17090 Is this the same as https://github.com/apache/spark/pull/12574?

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17090 cc @MLnick

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73553/

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73553/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Merged build finished. Test PASSed.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73553/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Merged build finished. Test PASSed.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73543/

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73543/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73543/testReport)** for PR 17090 at commit

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17090 **[Test build #73540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73540/testReport)** for PR 17090 at commit