Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
Thanks @MLnick for the explanation. It matches what I'd understood from your
similar description on the JIRA, but is definitely more in-depth. (It might be
good to copy it to the JIRA, or even into a design doc at this point.)
As I'd said, I haven't done a literature review on this, so it's hard for
me to judge what schema evaluators should be given. I see some implicit
decisions, such as evaluators using implicit ratings (that is, using rows that
are missing either a label or a prediction) and our not computing predictions
for every (user, item) pair that has a label.
However, assuming the schema you've selected is best for evaluation, I
think this highlights two distinct needs for top K: (a) a user-friendly API
(this PR) and (b) an evaluator-friendly API (your design). For (a), many users
have requested recommendForAll* methods matching the RDD-based equivalents, and
the schema in this PR provides top-K recommendations in an analogous, friendly
form. If the evaluator needs a less user-friendly schema, that's OK, but then I
think it should be considered an internal/dev schema which can differ from the
user-friendly version.
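
To make the distinction between the two shapes concrete, here is a minimal
sketch in plain Python (not Spark code). It assumes, purely for illustration,
that the user-friendly form is one record per user holding that user's top-K
(item, rating) pairs, and that an internal form could be flat
(user, item, rating) rows. The function names, field layout, and sample values
are all hypothetical and are not taken from this PR or from the evaluator
design.

```python
from collections import defaultdict

# Hypothetical flat rows: (user, item, predicted_rating). Values are made up.
flat_predictions = [
    (1, 10, 4.5),
    (1, 11, 3.0),
    (1, 12, 4.9),
    (2, 10, 2.0),
    (2, 13, 3.5),
]


def top_k_nested(rows, k):
    """Group flat (user, item, rating) rows into one record per user that
    holds the user's k highest-rated items, best first."""
    by_user = defaultdict(list)
    for user, item, rating in rows:
        by_user[user].append((item, rating))
    return {
        user: sorted(items, key=lambda pair: pair[1], reverse=True)[:k]
        for user, items in by_user.items()
    }


def flatten(nested):
    """Explode the nested per-user form back into flat (user, item, rating) rows."""
    return [
        (user, item, rating)
        for user, recs in nested.items()
        for item, rating in recs
    ]


nested = top_k_nested(flat_predictions, k=2)
flat_again = flatten(nested)
```

Under these assumptions, converting between the two shapes is mechanical, so
keeping them as separate user-facing and internal schemas would not force any
duplicated computation.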
What do you think?