Thanks for the comments - all useful. As always, it seems a bit of
experimentation is in order to compare view-vs-purchase filtering, heuristic
post-reordering, and potentially some metadata-based approach.
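To make the view-vs-purchase option concrete, here is a rough Python sketch of
the idea Dominik suggests further down the thread: drop candidates that are
similar to a purchased item in "view space" but not in "purchase space".
Everything in it is hypothetical (the function names, the two dicts of item
vectors, the threshold values). It assumes item vectors from two separately
built models, one on views and one on purchases, and is not tied to any
particular library.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def filter_view_similar(candidates, purchased, view_vecs, purchase_vecs,
                        view_threshold=0.8, purchase_threshold=0.5):
    """Keep candidates that do not look like substitutes for a purchased item.

    A candidate counts as a substitute when it is highly similar to some
    purchased item by views (people compare them) but not by purchases
    (people do not buy them together). Both thresholds are made-up defaults
    that would need tuning.
    """
    kept = []
    for item in candidates:
        is_substitute = any(
            cosine(view_vecs[item], view_vecs[p]) >= view_threshold
            and cosine(purchase_vecs[item], purchase_vecs[p]) < purchase_threshold
            for p in purchased
        )
        if not is_substitute:
            kept.append(item)
    return kept


# Toy data: two laptops that are viewed together, and a bag bought with laptop_a.
view_vecs = {"laptop_a": [1.0, 0.0], "laptop_b": [0.9, 0.1], "bag": [0.1, 1.0]}
purchase_vecs = {"laptop_a": [1.0, 0.0], "laptop_b": [0.0, 1.0], "bag": [0.9, 0.2]}

print(filter_view_similar(["laptop_b", "bag"], ["laptop_a"],
                          view_vecs, purchase_vecs))  # ['bag']
```

The second laptop is dropped because it is close to the purchased one by views
only, while the bag survives.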
One of our challenges is that we are trying to generalise as much as
possible, since we have a "recommender as a service" type offering. So
catering to edge cases is indeed not the way to go. But potentially a
heuristic-style approach that can be somewhat learned from data/recommender
performance, via split testing and offline testing, might be the way to go.
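For the heuristic route, a minimal Python sketch of the penalty-and-re-sort
pass Ted describes in the quoted thread could look like the following. The
names (`rerank_with_penalty`, `too_similar`, `penalty`) are my own
placeholders, the multiplicative penalty is just one possible choice, and it
assumes positive scores. The `penalty` value and the definition of "too
similar" are the knobs one would try to tune from offline and split tests.

```python
def rerank_with_penalty(scored, too_similar, penalty=0.5):
    """Penalise near-duplicates of higher-ranked items, then re-sort.

    scored      -- list of (item, score) pairs, scores assumed positive
    too_similar -- hypothetical predicate: too_similar(a, b) -> bool
    penalty     -- factor applied to the score of a too-similar item
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    representatives = []  # unpenalised items seen so far
    adjusted = []
    for item, score in ranked:
        if any(too_similar(item, earlier) for earlier in representatives):
            score *= penalty  # duplicative item: push it down the list
        else:
            representatives.append(item)
        adjusted.append((item, score))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


def same_product_type(a, b):
    """Toy "too similar" rule: items share the suffix after the last underscore."""
    return a.split("_")[-1] == b.split("_")[-1]


recs = [("red_dress", 0.9), ("blue_dress", 0.85),
        ("green_dress", 0.8), ("belt", 0.6)]
reranked = rerank_with_penalty(recs, same_product_type)
print([item for item, _ in reranked])
# ['red_dress', 'belt', 'blue_dress', 'green_dress']
```

The colour variants fall below the belt after the re-sort, which is the
"duplicative content falls to the bottom" effect.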
On Thu, Sep 5, 2013 at 8:53 PM, Dmitriy Lyubimov <[email protected]>
wrote:
> FWIW our marketing people call it "cross-sell" and "upsell", i.e.
> selling stuff from different categories vs. offering more behaviorally
> similar items to the currently browsed category, optimized to a specific
> target (revenue, sales event, etc.). In either case, preexisting (or
> inferred from side data via clustering) labelling helps to discern
> between "upsell" and "cross-sell" scores.
> On Thu, Sep 5, 2013 at 11:22 AM, Dominik Hübner <[email protected]> wrote:
>>> As far as implementation is concerned, I think that it is very important to
>>> not distort the basic recommendation algorithm with business rules like
>>> this. It is much better to post-process the results to impose your will
>>> directly. One exception to this is that I think it is reasonable to use
>>> ordered cooccurrence and also repeated cooccurrence for some hints
>>> here. This lets you determine likely accessories (purchased after the main
>>> item, mostly) and also find razor-blades (highly repetitive purchases).
>>> You still have the problem of flooding with similar items.
>>
>> +1 for keeping business rules out of your recommendations. I think
>> integrating too many edge cases will never generalize for all users and
>> debugging becomes nothing but a pain.
>>
>>> My approach in the past was to define heuristics for "too
>>> similar" and do a pass over the sorted recommendation results giving each
>>> item that passes the too-similar criterion a penalty score. When done with
>>> this, I re-sort the results and the duplicative content falls to the bottom
>>> of the recommendations.
>>>
>>
>> I was recently working on some recommendations for a fashion brand.
>> Filtering out too-similar items was indeed crucial. I observed a common
>> pattern of users viewing products that vary only in their color or other
>> "minor" features. I think it ultimately depends on the environment in which
>> you are displaying your recommendations. If you are actually trying to show
>> related products, those really similar items (like color variations) might
>> not be the worst thing. A product mash-up of some sort should probably be
>> more diverse, just like Ted mentioned with flooding the first few pages. But
>> ... there they are again, those edge cases I mentioned. Pre-sale
>> recommendations might be less diverse than after-purchase recommendations.
>> It just depends on the domain you are working in, I guess.
>>
>> On Sep 5, 2013, at 7:38 PM, Ted Dunning <[email protected]> wrote:
>>
>>> I think that Dominik's comments are exactly on target.
>>>
>>> As far as implementation is concerned, I think that it is very important to
>>> not distort the basic recommendation algorithm with business rules like
>>> this. It is much better to post-process the results to impose your will
>>> directly. One exception to this is that I think it is reasonable to use
>>> ordered cooccurrence and also repeated cooccurrence for some hints
>>> here. This lets you determine likely accessories (purchased after the main
>>> item, mostly) and also find razor-blades (highly repetitive purchases).
>>> You still have the problem of flooding with similar items.
>>>
>>> The diversity that you are talking about is a critical quality in
>>> recommendation results. The basic intuition is that recommendation results
>>> are not individual recommendations, but are included in a portfolio of
>>> recommendations. You need the diversity in this portfolio because if you
>>> are wrong about an item, the likelihood of being wrong about very similar
>>> items is high. If you flood the first and second pages with these similar
>>> items, then you don't have room for the alternative items that might well
>>> be correct.
>>>
>>> My approach in the past was to define heuristics for "too
>>> similar" and do a pass over the sorted recommendation results giving each
>>> item that passes the too-similar criterion a penalty score. When done with
>>> this, I re-sort the results and the duplicative content falls to the bottom
>>> of the recommendations.
>>>
>>>
>>>
>>> On Thu, Sep 5, 2013 at 1:15 AM, Dominik Hübner <[email protected]> wrote:
>>>
>>>> Just a quick assumption; maybe I have not thought this through enough:
>>>>
>>>> 1. Users probably tend to compare products => similar VIEWS
>>>> 2. Users might also tend to PURCHASE accessory products, like the laptop
>>>> bag you mentioned
>>>>
>>>> Maybe you could filter out products whose similarity is computed from
>>>> product views, but leave those that are similar based on purchases in
>>>> your recommendation set?
>>>>
>>>> Nevertheless, I guess this will depend strongly on the domain the
>>>> data comes from.
>>>>
>>>>
>>>> On Sep 5, 2013, at 10:07 AM, Nick Pentreath <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all
>>>>>
>>>>> Say I have a set of ecommerce data (views, purchases etc). I've built my
>>>>> model using implicit feedback ALS. Now, I want to add a little bit of
>>>>> "smart filtering".
>>>>>
>>>>> Filtering based on not recommending something that has been purchased is
>>>>> straightforward, but I'd like to also filter so as not to recommend "highly
>>>>> similar" items to someone who has purchased an item.
>>>>>
>>>>> In other words, if someone has just purchased a laptop, then I'd like to
>>>>> not recommend other laptops. Ideally while still recommending "related"
>>>>> items such as laptop bags, mouse etc etc. (this is just an example).
>>>>>
>>>>> Now, I could filter based on metadata tags like "category", but assuming I
>>>>> don't always have that data, then simplistically I have the option of
>>>>> filtering out products based on those that have high cosine similarity to
>>>>> the purchased products. However, this risks filtering out "good" similar
>>>>> products (like the laptop bags) as well as the "bad" similar products.
>>>>>
>>>>> I'm experimenting with building a second variant of the model that
>>>>> effectively downweights "views" to near zero, hence leaving something sort
>>>>> of like a "purchased together" model variant. Then recommendations can be
>>>>> made using this model when a user purchases an item (or perhaps a re-scorer
>>>>> that is a weighted variant of model A and model B but that tends to weight
>>>>> model B - the purchased together model - higher)
>>>>>
>>>>> Are there other mechanisms to tweak the ALS model such that it tends
>>>>> towards recommending "related products" (but not "highly similar of the
>>>>> exact same narrow product type")?
>>>>>
>>>>> Any other ideas about how best to go about this?
>>>>>
>>>>> Many thanks
>>>>> Nick
>>>>
>>>>
>>