GitHub user MLnick commented on the issue:
https://github.com/apache/spark/pull/18748
**Note 1** This implementation must perform a `distinct` on the id column of
the input data frame to guarantee correct results. Otherwise, multiple "copies"
of the same recommendations would be generated for duplicate ids, and the
resulting recommendations would contain duplicates. Alternatively, this could
be left to the user to handle, on the assumption that the input data frame
contains no duplicates. For now I've opted for the safest option, even though
the extra `distinct` is an inefficiency.
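
To illustrate the duplicate-id concern, here is a minimal plain-Python sketch. It is not the Spark implementation: the `top_k` dictionary and the `recommend_for_subset` helper are hypothetical stand-ins for the model's per-id recommendations and for the subset method, and de-duplicating the id list plays the role of `distinct`.

```python
# Hypothetical toy data: precomputed (item, score) recommendations per user id.
top_k = {
    1: [(10, 0.9), (11, 0.8)],
    2: [(12, 0.7)],
}

def recommend_for_subset(ids, dedupe=True):
    """Return one (id, recommendations) row per input id found in top_k."""
    if dedupe:
        # dict.fromkeys drops repeated ids while keeping first-seen order;
        # this stands in for `distinct` on the id column.
        ids = list(dict.fromkeys(ids))
    return [(i, top_k[i]) for i in ids if i in top_k]

# Without de-duplication, id 1 yields two identical rows.
with_dupes = recommend_for_subset([1, 1, 2], dedupe=False)

# With de-duplication, each id appears exactly once.
deduped = recommend_for_subset([1, 1, 2])
```

In this toy model the de-duplication costs one extra pass over the ids; in a distributed setting a `distinct` generally implies a shuffle, which is the inefficiency mentioned above.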
**Note 2** This does not support `coldStartStrategy`. Therefore, no
recommendations will be returned for ids in the input data frame that are not
contained in the model (this is analogous to `coldStartStrategy=drop` for
`transform`). I believe this makes the most sense, since supporting something
like the `nan` option would be a bit involved and would not add much value.
However, it could be done (but it would need to return `null` rows in the
`recommendations` column for these cases). Later, if other cold start
strategies are supported (e.g. average factor vectors), this method could
return recommendations even for ids that are not contained in the model.
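
The two behaviours discussed in this note can be contrasted with a small plain-Python sketch. It is only a model of the semantics, not Spark code: `model_recs`, `recommend_drop`, and `recommend_with_nulls` are hypothetical names, and `None` stands in for a `null` row.

```python
from typing import Dict, List, Optional, Tuple

Recs = List[Tuple[int, float]]  # (item id, predicted score) pairs

# Hypothetical model contents: only ids 1 and 2 were seen in training.
model_recs: Dict[int, Recs] = {
    1: [(10, 0.9)],
    2: [(12, 0.7)],
}

def recommend_drop(ids: List[int]) -> List[Tuple[int, Recs]]:
    """Drop-like behaviour: ids unknown to the model produce no row."""
    return [(i, model_recs[i]) for i in ids if i in model_recs]

def recommend_with_nulls(ids: List[int]) -> List[Tuple[int, Optional[Recs]]]:
    """Alternative behaviour: unknown ids keep a row with a null value."""
    return [(i, model_recs.get(i)) for i in ids]

dropped = recommend_drop([1, 3])
with_nulls = recommend_with_nulls([1, 3])
```

Here id 3 is absent from `dropped` but present in `with_nulls` with a `None` value, so callers of the second variant would have to handle null rows themselves.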
cc @srowen @jkbradley @yanboliang @mpjlu @sethah