Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
As for the API - I'm ok with having the "user-facing" version differ from
the `transform` version. Though it may lead to some confusion. In this case,
it's probably best to have `transform` only
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
Good point for copying some detail to JIRA, will do that.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
Thanks @MLnick for the explanation. This is what I'd understood from your
similar description on the JIRA, but definitely more in-depth. (It might be
good to copy to JIRA, or even a design doc
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
I commented further on the
[JIRA](https://issues.apache.org/jira/browse/SPARK-14409?focusedCommentId=15898855&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15898855).
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
@MLnick OK I think I misunderstood some of your comments above then. I
see the proposal in SPARK-14409 differs from this PR, so I agree it'd be nice
to resolve it. We can make changes to this
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
@jkbradley I've put my updated proposal for ranking evaluation [on
SPARK-14409
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
I'll merge this with master now
Thanks @sueann and @MLnick for feedback. I'll prioritize helping with
your work on transform, metrics, and tuning for ALS next.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
LGTM
Any other comments before we merge?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73866/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73866 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73866/testReport)**
for PR 17090 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73866 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73866/testReport)**
for PR 17090 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73787 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73787/testReport)**
for PR 17090 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73787/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73787 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73787/testReport)**
for PR 17090 at commit
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
It's a good point about making an implicit decision. We could deprecate
these methods in favor of transform-based ones in the future---we have done
this in the past---but it does push the
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
It could also be I'm overthinking things - and we can mould the
`RankingEvaluator` to accept both types of input - the array version:
`(Array(predictions), Array(labels))` or the "exploded" version:
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
Isn't deciding on the output schema for these methods essentially the same
as deciding on transform semantics in #12574 (apart from the issue of how, or
if, to have transform generate the "ground
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73628/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73628 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73628/testReport)**
for PR 17090 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73624/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73623 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73623/testReport)**
for PR 17090 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73624 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73624/testReport)**
for PR 17090 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73623/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73628 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73628/testReport)**
for PR 17090 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73624 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73624/testReport)**
for PR 17090 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73623 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73623/testReport)**
for PR 17090 at commit
Github user sueann commented on the issue:
https://github.com/apache/spark/pull/17090
The output in https://github.com/apache/spark/pull/12574/ looks like a
DataFrame with Row(srcCol: Int, "recommendations": Array[(Int, Float)]) so I
think this PR as is matches the output type -
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
@MLnick Thanks *a lot* for the detailed tests! I really appreciate it. In
this case, are you OK with the approach in the current PR (pending reviews)?
One thing we should confirm is
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
Finally, I've done some work related to
[SPARK-11968](https://issues.apache.org/jira/browse/SPARK-11968) and have a
potential solution that seems to be pretty good. In this case it should be more
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
I should note that I've found the performance of "recommend all" to be very
dependent on number of partitions since it controls the memory consumption per
task (which can easily explode in the
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
The performance of #12574 is not better than the existing `mllib`
recommend-all - since it wraps the functionality it's roughly on par.
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
Fitting into the CV / evaluator is actually fairly straightforward. It's
just that the semantics of `transform` for top-k recommendation must fit into
whatever we decide on for `RankingEvaluator`,
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
@MLnick Thanks for showing those comparison numbers. If your
implementation is faster, then I'm happy going with it. I do wonder if we
might hit scalability issues with RDDs which we would not
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
@jkbradley do we propose to add further methods to support recommending for
all users (or items) in an input DF? like `recommendForAllUsers(dataset:
DataFrame, num: Int)`?
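
To make the shape of such a method concrete, here is a minimal plain-Python sketch of top-`num` recommendation for every user. It is not Spark's implementation: the function name, the dictionary-based factor representation, and the toy factors are all hypothetical, and scores are assumed to be plain dot products of user and item factors.

```python
import heapq


def recommend_for_all_users(user_factors, item_factors, num):
    """Return {user_id: [(item_id, score), ...]} with the top `num` items per user.

    Hypothetical helper: factors are plain dicts of id -> list of floats,
    and a score is the dot product of a user vector and an item vector.
    """
    recs = {}
    for user_id, u in user_factors.items():
        # Score every item for this user (the "recommend all" cross product).
        scored = (
            (item_id, sum(a * b for a, b in zip(u, v)))
            for item_id, v in item_factors.items()
        )
        # Keep only the `num` highest-scoring items, best first.
        recs[user_id] = heapq.nlargest(num, scored, key=lambda pair: pair[1])
    return recs


# Toy factors, invented purely for illustration.
user_factors = {1: [1.0, 0.0], 2: [0.0, 1.0]}
item_factors = {10: [0.9, 0.1], 20: [0.2, 0.8], 30: [0.5, 0.5]}

recs = recommend_for_all_users(user_factors, item_factors, 2)
for user_id, items in recs.items():
    print(user_id, items)
```

Each result row pairs a user id with an array of (item id, score) tuples, which is the same general shape as the `recommendations` column discussed elsewhere in this thread.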
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
For performance tests, I've been using the MovieLens `ml-latest` dataset
[here](https://grouplens.org/datasets/movielens/). It has `24,404,096` ratings
with `259,137` users and `39,443` movies.
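
For a sense of scale, the following back-of-the-envelope sketch works out how many user-item scores a full "recommend all" pass implies for a dataset of this size (roughly 10.2 billion pairs). The 4-byte score size is an assumption made only for this illustration; actual memory use depends on the implementation and on partitioning.

```python
# Dataset sizes quoted in the comment above.
NUM_RATINGS = 24_404_096
NUM_USERS = 259_137
NUM_ITEMS = 39_443

# Assumption for illustration only: each score is held as a 4-byte float.
BYTES_PER_SCORE = 4

# "Recommend all" scores every user against every item.
pairs = NUM_USERS * NUM_ITEMS

# Fraction of user-item pairs that actually have an observed rating.
density = NUM_RATINGS / pairs

# Size of the full score matrix if it were ever materialized at once.
score_gib = pairs * BYTES_PER_SCORE / 2**30

print(f"user-item pairs: {pairs:,}")
print(f"rating density:  {density:.4%}")
print(f"full score matrix at {BYTES_PER_SCORE} bytes/score: {score_gib:.1f} GiB")
```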
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
I'd been following the long discussions about a transform-based solution,
but those had not seemed to have converged to a clear design. If you feel they
have in your PR, then I'll spend some
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17090
#12574 is a comprehensive solution that also intends to support
cross-validation as well as recommending for a subset (or any arbitrary set) of
users/items. So it solves
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
@hhbyyh This is different from https://github.com/apache/spark/pull/12574
since it sidesteps the ongoing design discussions about input and output
schema. Eventually, I'd like us to proceed
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17090
the same as https://github.com/apache/spark/pull/12574 ?
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17090
cc @MLnick
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73553/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73553 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73553/testReport)**
for PR 17090 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73553 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73553/testReport)**
for PR 17090 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17090
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73543/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73543 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73543/testReport)**
for PR 17090 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73543 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73543/testReport)**
for PR 17090 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17090
**[Test build #73540 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73540/testReport)**
for PR 17090 at commit