Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1687#issuecomment-50775834
Thanks @mengxr, I agree with all of that and will update the PR. `Rating` is
a good solution; there's a redundant field, but very few objects are returned
anyway. Sorry if I'm being dense, but which RDD should be set to `MEMORY_AND_DISK`?
The `scored` RDD in my PR? And how would you set the partitions?
Yes, if there were a `topByKey`, it would be natural to expose a small batch
recommend feature here. There are other possible operations, like
`mostSimilar`, but we can leave those for another PR after discussing what the
metric should be (cosine similarity, etc.).
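
To make the top-by-key idea concrete, here is a rough sketch in plain Python rather than Spark. The `top_by_key` helper, its signature, and the sample data are hypothetical and only illustrate the semantics; this is not an existing Spark API or the code in the PR. It keeps a bounded min-heap per key, so only `k` entries per key are held at any time.

```python
import heapq
from collections import defaultdict


def top_by_key(triples, k):
    """Return the k highest-scored (score, item) pairs for each key.

    `triples` is an iterable of (key, item, score). This is a local,
    single-machine illustration, not a distributed implementation.
    """
    heaps = defaultdict(list)  # key -> min-heap of (score, item)
    for key, item, score in triples:
        heap = heaps[key]
        if len(heap) < k:
            heapq.heappush(heap, (score, item))
        elif (score, item) > heap[0]:
            # New entry beats the current smallest one, so swap it in.
            heapq.heapreplace(heap, (score, item))
    # Sort each key's survivors from highest to lowest score.
    return {key: sorted(heap, reverse=True) for key, heap in heaps.items()}


# Hypothetical (user, product, score) data standing in for a scored RDD.
scored = [
    ("user1", "a", 0.9), ("user1", "b", 0.4), ("user1", "c", 0.7),
    ("user2", "a", 0.1), ("user2", "d", 0.8),
]
top = top_by_key(scored, 2)
```

In a distributed setting the same bounded-heap merge would presumably run per partition and then be combined per key, which is why the per-key state is kept at size `k`.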