GitHub user MLnick opened a pull request:
https://github.com/apache/spark/pull/18748
[SPARK-20679][ML] Support recommending for a subset of users/items in ALSModel
This PR adds methods `recommendForUserSubset` and `recommendForItemSubset`
to `ALSModel`. These allow recommending for a specified set of user / item ids
rather than for every user / item (as in the `recommendForAllX` methods).
The subset methods take a `DataFrame` as input, containing ids in the
column specified by the param `userCol` or `itemCol` respectively. The model generates
recommendations for each _unique_ id in this input `DataFrame`.
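The subset semantics described above can be illustrated without Spark. The following is a minimal plain-Scala sketch, not the PR's implementation. It assumes user and item factors are held in in-memory `Map`s keyed by id, scores every item by the dot product of its factor vector with the user's (as ALS predictions do), and keeps the top `k` items for each unique requested id. The names `SubsetRecommendSketch` and `recommendForSubset` are hypothetical. Silently skipping requested ids that have no learned factors is also an assumption about how unknown ids are treated.

```scala
// Plain-Scala sketch of "recommend for a subset of ids" semantics.
// Not Spark's implementation: factors live in in-memory maps (assumption).
object SubsetRecommendSketch {
  // id -> latent factor vector (all vectors are assumed to share one rank)
  type Factors = Map[Int, Array[Double]]

  // Predicted score = dot product of a user vector and an item vector.
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  /** Hypothetical helper: top-`k` (itemId, score) pairs per unique requested user.
    * Requested ids with no learned factor vector are skipped (assumption). */
  def recommendForSubset(
      requested: Seq[Int],
      userFactors: Factors,
      itemFactors: Factors,
      k: Int): Map[Int, Seq[(Int, Double)]] =
    requested.distinct // duplicate ids in the input yield a single result
      .flatMap(u => userFactors.get(u).map(f => u -> f))
      .map { case (u, uf) =>
        val scored = itemFactors.toSeq.map { case (i, vf) => (i, dot(uf, vf)) }
        // Highest score first; ties broken by smaller item id for determinism.
        val ranked =
          scored.sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
        u -> ranked.take(k)
      }
      .toMap

  def main(args: Array[String]): Unit = {
    val users: Factors = Map(1 -> Array(1.0, 0.0), 2 -> Array(0.0, 1.0))
    val items: Factors =
      Map(10 -> Array(0.9, 0.1), 20 -> Array(0.2, 0.8), 30 -> Array(0.5, 0.5))
    // User 1 is requested twice; user 3 has no factors and is dropped.
    println(recommendForSubset(Seq(1, 1, 3), users, items, k = 2))
  }
}
```

Calling `distinct` before scoring is what ensures a user listed twice in the input receives one set of recommendations rather than two, matching the "each _unique_ id" behavior described above.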
## How was this patch tested?
New unit tests in `ALSSuite`
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MLnick/spark als-recommend-df
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18748.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18748
----
commit 860bc2ce5f290a042756d2569eb215eee6a1fdad
Author: Nick Pentreath
Date: 2017-03-16T07:07:06Z
wip
commit 8cd9edd5e2440da15f677828cda5207e6a40be31
Author: Nick Pentreath
Date: 2017-05-04T07:40:22Z
further wip
commit 76fb332aa5e8483590ebb4305901f5c3e5c73c15
Author: Nick Pentreath
Date: 2017-05-09T09:19:38Z
Update doc
commit 6539d294c5dac499d106f7346f496dac8fee24e8
Author: Nick Pentreath
Date: 2017-05-09T09:20:55Z
Update doc
commit c723dff8a9f125ce4d69574f47c74aaf0df7a9da
Author: Nick Pentreath
Date: 2017-05-09T09:23:20Z
Update doc
commit 0004d1c9ea5074965d234fa7833450de3ffa871b
Author: Nick Pentreath
Date: 2017-05-10T09:15:35Z
wip on tests
commit 53229a1abc860aa8fb3c0d933fdbcef4d47f0508
Author: Nick Pentreath
Date: 2017-05-12T10:42:23Z
Clean up docs and further tests
commit 5a8c4216ce636dea3ba67baa9b169db7486f37f2
Author: Nick Pentreath
Date: 2017-07-27T08:28:11Z
Explicitly handle duplicate ids with distinct. Update tests
----