I've been looking at examples of recommenders with an eye to reverse 
engineering what's good and bad. Hard to say with any certainty, of course.

Netflix: has a bunch of different recommendation lists, some personalized, some 
based on different forms of popularity or item similarity. The one consistent 
thing is a rich set of categories that they derive algorithmically (this I 
heard in a Netflix preso). Focusing recs on appropriate categories makes them 
seem much more relevant. When recs are spread across all genres, as they 
currently are in my demo, they seem somewhat random. Similarity of taste often 
crosses genres, so recs can too if no metadata is taken into account. Netflix 
is also experimenting with a form of online "pick a few--get recs" in the Max 
project. I don't have a Sony PS so I haven't actually used it. 

Amazon: They also break up recs by category--using their catalog for 
categories. This works well in my case, where my wife uses my Amazon Prime 
account to buy things that I wouldn't. Separating by category means, in 
practical terms, that they make recs for her and me separately. They do a good 
job of differentiating between modes of taste. They may be separating 
preference data by category (though I suspect not), or they may be filtering 
recs by category. If you put items into your wishlist you will see instant 
recs based on them. Not sure whether these are personalized or come from 
merging the similar-item lists of the things in the wishlist. The former would 
have to be computed online, the latter mostly pre-calculated. Using a wishlist 
for recs is very effective shopping assistance and most closely matches what 
would be nice to do in the demo's session-based browsing recommender.
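
Not a claim about Amazon's internals--just a minimal sketch of the 
"merge similar-item lists" variant, with SIMILAR_ITEMS standing in for a 
hypothetical precomputed item-similarity table:

from collections import defaultdict

# Hypothetical precomputed map: item -> [(similar item, score), ...]
SIMILAR_ITEMS = {
    "book-a": [("book-b", 0.9), ("book-c", 0.4)],
    "book-d": [("book-b", 0.7), ("book-e", 0.6)],
}

def wishlist_recs(wishlist, top_n=10):
    """Merge the similarity lists of all wishlist items, summing scores
    for items that recur, and drop items already on the wishlist."""
    scores = defaultdict(float)
    for item in wishlist:
        for similar, score in SIMILAR_ITEMS.get(item, []):
            if similar not in wishlist:
                scores[similar] += score
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]

print(wishlist_recs(["book-a", "book-d"]))  # book-b first with 1.6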

Both cases make strong use of predicting the categories you will be most 
interested in, which gets back to what you were saying about interacting with 
metadata. They are in effect recommending categories, then recommending items 
within those categories. Since the categories seem human-understandable, I 
wonder whether they have much to do with clustering or factor extraction. 
Netflix's algorithmically derived categories seem to incorporate sentiment 
analysis, since they are often built from words like "Witty" or "Gritty" that 
don't appear in the normal genre lists.
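
In toy form, the two-stage idea looks like this--score categories from the 
user's history, then rank unseen items only within the winning categories. 
ITEM_CATEGORY and ITEM_SCORES are made-up stand-ins for catalog metadata and 
CF scores:

from collections import Counter

ITEM_CATEGORY = {"m1": "comedy", "m2": "comedy", "m3": "drama", "m4": "drama"}
ITEM_SCORES = {"m1": 0.9, "m2": 0.8, "m3": 0.95, "m4": 0.5}

def two_stage_recs(history, n_categories=1, n_items=2):
    # Stage 1: recommend categories by frequency in the user's history.
    cats = Counter(ITEM_CATEGORY[i] for i in history if i in ITEM_CATEGORY)
    top_cats = [c for c, _ in cats.most_common(n_categories)]
    # Stage 2: rank unseen items restricted to those categories.
    candidates = [i for i, c in ITEM_CATEGORY.items()
                  if c in top_cats and i not in history]
    return sorted(candidates, key=lambda i: -ITEM_SCORES[i])[:n_items]

print(two_stage_recs(["m1"]))  # stays inside comedy: ['m2']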

Another thing they both use is context, which seems to act as a proxy for 
intent. The fact that I'm looking at a comedy often means the item-item 
similarities or behavior-based recs are skewed toward comedy. Hard to say how 
much they use context in other places.
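
One cheap way to act on that signal--a guess at the mechanism, not a 
description of theirs--is to boost generic candidate recs that share the 
currently viewed item's category:

ITEM_CATEGORY = {"c1": "comedy", "c2": "comedy", "d1": "drama"}

def contextualize(candidates, current_item, boost=2.0):
    """candidates is [(item, score)]; multiply the scores of items in
    the currently viewed item's category by an arbitrary boost."""
    context_cat = ITEM_CATEGORY.get(current_item)
    return sorted(
        ((item, score * (boost if ITEM_CATEGORY.get(item) == context_cat
                         else 1.0))
         for item, score in candidates),
        key=lambda kv: -kv[1])

# Viewing a comedy skews the list toward comedies:
print(contextualize([("d1", 0.9), ("c2", 0.6)], current_item="c1"))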

Other recommenders that don't use categories or context seem weak and random 
by comparison--at least by my inscrutable eyeball test.


On Sep 7, 2013, at 3:07 PM, Ted Dunning <[email protected]> wrote:

On Sat, Sep 7, 2013 at 2:35 PM, Pat Ferrel <[email protected]> wrote:

> ...
>> 
>> Clustering can be done by doing SVD or ALS on the user x thing matrix first
>> or by directly clustering the columns of the user x thing matrix after some
>> kind of IDF weighting.  I think that only the streaming k-means currently
>> does well on sparse vectors.
>> 
> 
> Was thinking about filtering out all but the top x% of items to get things
> the user is likely to have heard about if not seen. Do this before any
> factorizing or clustering.
> 

Hmm...

My reflex would be to trim *after* clustering so that clustering has the
benefit of the long-tail.
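
In toy form (scikit-learn's batch k-means standing in for Mahout's streaming 
k-means, random data standing in for real behavior): IDF-weight the user x 
item matrix, cluster the item columns over the full long tail, and only then 
trim each cluster to its most popular items.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
A = (rng.random((50, 20)) < 0.2).astype(float)  # user x item interactions

# IDF weighting so rare (long-tail) items still influence the clustering.
df = A.sum(axis=0) + 1.0
X = (A * np.log(A.shape[0] / df)).T             # items as rows

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

popularity = A.sum(axis=0)
for k in range(4):
    items = np.flatnonzero(labels == k)
    head = items[np.argsort(-popularity[items])][:3]  # trim *after* clustering
    print(f"cluster {k}: head items {head.tolist()}")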


> ...
>> 
>> For #2, I think that this is a great example of multi-modal
>> recommendations.  You have browsing behavior and your tomatoes-reviews
>> behavior.  Combining that allows you to recommend for people who have only
>> one kind of behavior.  Of course, our viewing behavior will be very sparse
>> to start.
> 
> Yes, that's why I'm not convinced it will be useful, but it's an
> interesting experiment now that we have the online Solr recommender. Soon
> we'll have category and description metadata from the crawler. We can
> experiment with things like category boosting if a category trend emerges
> during the browsing session--and I suspect it often does--maybe release
> date etc. The ease of mixing metadata with behavior is another thing worth
> experimenting with.
> 

Cool.

And remember, meta-data becomes behavior when you interact with an item,
since you have just interacted with the meta-data as well.
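
In sketch form, with made-up field names--each (user, item) event also gets 
logged as user-metadata events that the same cooccurrence machinery can 
consume:

ITEM_META = {
    "m1": {"category": "comedy", "release_year": "1999"},
}

def expand_interaction(user, item):
    """Turn one (user, item) event into behavior rows over the item
    and over each of its metadata values."""
    rows = [(user, "item", item)]
    for field, value in ITEM_META.get(item, {}).items():
        rows.append((user, field, value))
    return rows

print(expand_interaction("u1", "m1"))
# [('u1', 'item', 'm1'), ('u1', 'category', 'comedy'),
#  ('u1', 'release_year', '1999')]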

Btw... I am spinning up a team internally and a team at a partner site to
help with the Mahout demo.  I am trying to generate realistic music
consumption data this weekend as well.
