[GitHub] spark issue #18748: [SPARK-20679][ML] Support recommending for a subset of u...

MLnick Thu, 27 Jul 2017 02:00:48 -0700

Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18748
  
    **Note 1** This implementation performs a `distinct` on the id column of 
the input dataframe to guarantee correct results. Otherwise, duplicate ids 
would generate multiple "copies" of the same recommendations, and the 
resulting recommendations would contain duplicates. Alternatively, 
deduplication could be left to the user, with the method assuming that the 
input dataframe contains no duplicates. For now I've opted for the safest 
option, even though it adds the cost of the `distinct`.
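
    To illustrate why duplicate ids matter, here is a toy sketch in plain 
Python. It is not the Spark code in this PR: the `SCORES` table, the function 
name, and the "score every item, then keep the top k per id" structure are 
assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: precomputed user-item scores standing in for
# the dot products of ALS factor vectors.
SCORES = {
    1: {"a": 0.9, "b": 0.7, "c": 0.1},
    2: {"a": 0.2, "b": 0.8, "c": 0.6},
}


def recommend_for_subset(ids, k, dedupe):
    """Return {id: top-k items} for the requested ids (toy model)."""
    if dedupe:
        # Plays the role of `distinct` on the id column; dict.fromkeys
        # drops repeats while keeping first-seen order.
        ids = list(dict.fromkeys(ids))

    # "Join" each requested id row with every item: a duplicate id
    # contributes a second copy of every candidate row.
    candidates = defaultdict(list)
    for user_id in ids:
        for item, score in SCORES.get(user_id, {}).items():
            candidates[user_id].append((score, item))

    # Group by id and keep the k highest-scoring candidate rows.
    return {
        user_id: [item for _, item in sorted(rows, reverse=True)[:k]]
        for user_id, rows in candidates.items()
    }


requested = [1, 1, 2]  # id 1 appears twice
without_distinct = recommend_for_subset(requested, k=2, dedupe=False)
with_distinct = recommend_for_subset(requested, k=2, dedupe=True)
```

    In this toy model, the duplicated id ends up with the same item twice in 
its top-k list when deduplication is skipped, while the deduplicated call 
returns two different items per id.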
    
    **Note 2** This does not support `coldStartStrategy`. No recommendations 
are returned for ids in the input dataframe that are not contained in the 
model (this is analogous to `coldStartStrategy=drop` for `transform`). I 
believe this makes the most sense, since supporting something like the `nan` 
option would be a bit involved and would not add much value. It could be done, 
but it would need to return `null` rows in the `recommendations` column for 
these cases. Later, if other cold start strategies are supported (e.g. average 
factor vectors), this method could return recommendations even for ids that 
are not contained in the model.
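
    The difference between the two behaviours can be sketched in plain 
Python. This is a toy model, not the Spark implementation: `KNOWN_RECS` and 
both function names are hypothetical, and `None` stands in for a `null` entry.

```python
# Hypothetical toy model: ids seen during training, with their top items.
KNOWN_RECS = {1: ["a", "b"], 2: ["b", "c"]}


def recommend_drop(ids):
    """Behaviour described in the note: unknown ids produce no row."""
    return [(i, KNOWN_RECS[i]) for i in ids if i in KNOWN_RECS]


def recommend_keep_null(ids):
    """Alternative that is not implemented: unknown ids keep a row
    whose recommendations entry is null (None here)."""
    return [(i, KNOWN_RECS.get(i)) for i in ids]


requested = [1, 3]  # id 3 is not in the model
dropped = recommend_drop(requested)
kept = recommend_keep_null(requested)
```

    With the drop behaviour the output can have fewer rows than the input, 
so callers cannot assume one output row per requested id.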
    
    cc @srowen @jkbradley @yanboliang @mpjlu @sethah


