Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17090
  
    Thanks @MLnick for the explanation.  This matches what I'd understood from your 
similar description on the JIRA, but it is definitely more in-depth.  (It might be 
good to copy it to the JIRA, or even to a design doc at this point.)
    
    As I'd said, I haven't done a literature review on this, so it's hard for 
me to judge what schema evaluators should be given.  I see some implicit 
decisions, such as evaluators using implicit ratings (using rows missing either 
a label or a prediction) and our not computing predictions for all (user, item) 
pairs with labels.
    
    However, assuming the schema you've selected is best for evaluation, then I 
think this highlights 2 distinct needs for top K: (a) a user-friendly API (this 
PR) and (b) an evaluator-friendly API (your design).  For (a), many users have 
requested recommendForAll* methods matching the RDD-based equivalents, and this 
schema provides top K recommendations in an analogous and friendly schema.  If 
the evaluator needs a less user-friendly schema, that's OK, but then I think it 
should be considered an internal/dev schema which can differ from the 
user-friendly version.
    
    What do you think?

