We’ve developed the idea that multiple indicators can be blended at recommendation time.
Imagine a catalog of items, where each item has several indicators of similarity. One indicator might be "similar items as measured by users who preferred this item" -- the classical cooccurrence/collaborative-filtering indicator. But if you have a near-realtime similarity engine (a search engine), you can mix in all sorts of indicators and boost each to suit your purpose.

The user is the query, in terms of whichever indicators apply to them -- which items they prefer, in the simple collaborative-filtering case -- but indicators come in many flavors. For instance, you could track the content of the articles a user reads, then periodically calculate, for each item in your catalog, which other items have similar content. A user's articles-read history then becomes a content-based indicator of preference, and using the content-similar items you can recommend new articles based on content.

Back to the catalog. Imagine it has indicators for views (a CF-type indicator), category, "hotness", and content (similar articles based on important terms). Create a user from the items they have viewed and the category they are currently looking at (or prefer, depending on your application). Now use this as a query against the similarity engine. Each field of the query maps to a different indicator: the user's views map to the CF indicator _and_ the content indicator, perhaps with different boosts, and the category maps to the category field of the items. Perform the query and order by "hotness", or put hotness into the query itself and boost that part to favor hot articles.

There are also techniques for using user attributes or actions that may seem unrelated to the action you want to recommend (views, in your case?). For instance, if you want to recommend views but also have a "thumbs-down" action, you can use something like spark-itemsimilarity to create cross-action indicators.
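To make the "user is the query" idea concrete, here is a minimal in-memory sketch of the blending. In a real deployment you would hand this to a search engine (Solr, Elasticsearch, etc.) as a multi-field OR query with per-field boosts; this toy version only shows the scoring arithmetic. All item names, field names, and boost values are made up for illustration.

```python
# Toy catalog: each item carries several indicator fields. "cf" holds items
# co-preferred by the same users, "content" holds content-similar items.
catalog = {
    "article-1": {"cf": {"article-2", "article-3"},
                  "content": {"article-4"},
                  "category": {"sports"},
                  "hotness": 0.9},
    "article-2": {"cf": {"article-1"},
                  "content": {"article-3"},
                  "category": {"news"},
                  "hotness": 0.2},
}

def recommend(user_views, user_category, boosts, catalog):
    """Blend CF, content, and category matches; break score ties by hotness."""
    scored = []
    for item_id, fields in catalog.items():
        if item_id in user_views:  # don't recommend what was already viewed
            continue
        score = 0.0
        # The user's views query BOTH the CF and content indicator fields,
        # with different boosts, just as described above.
        score += boosts["cf"] * len(user_views & fields["cf"])
        score += boosts["content"] * len(user_views & fields["content"])
        score += boosts["category"] * len(user_category & fields["category"])
        scored.append((score, fields["hotness"], item_id))
    scored.sort(reverse=True)  # highest blended score first, hotness as tiebreak
    return [item_id for _, _, item_id in scored]

print(recommend({"article-2"}, {"sports"},
                {"cf": 2.0, "content": 1.0, "category": 0.5}, catalog))
```

The boosts are where you tune the blend: raising the "cf" boost favors behavioral similarity, raising "content" favors topical similarity, and a small "category" boost keeps recommendations near what the user is currently browsing.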
Put them in your catalog in a separate field, and when you create the user (aka the similarity-engine query), make sure to include their "thumbs-down" history mapped to that same cross-action indicator field. This won't always help if there is no correlation, but it's safe to include because the tools you use from Mahout will generally discover non-correlation.

What you know about the user may be spotty or even empty, but the blended CF, metadata, and content recommender described above will be able to make recommendations even for new users and new content. In this degenerate case you'll get hot articles in the category the user is viewing. The more you know about the user, the more personalized the recommendations become.

Currently we have spark-itemsimilarity and spark-rowsimilarity to create CF-type indicators, cross-indicators, and content indicators. The metadata can be taken as-is for indicators in your catalog (categories, color, hotness, etc.). We're starting to document these techniques and make them easier to use. See Ted's book here: https://www.mapr.com/practical-machine-learning and some related blog posts here: http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/

On Sep 13, 2014, at 4:25 PM, Peter Wolf <[email protected]> wrote:

Awesome! Thank you very much Ted. I'll try that.

As I am just getting started with Mahout, can you recommend any good example code that does something similar?

On Sat, Sep 13, 2014 at 3:38 PM, Ted Dunning <[email protected]> wrote:

> Rebuilding every day works very well in practice, but it captures a moving
> average, not a good estimate of the current popularity of items.
>
> A simple hack is to implement a search-based recommender and simply put an
> empirically scaled boost on items which are rising rapidly in popularity.
> Of course you should also have specialized pages that show popular items
> and another that shows rapidly rising items.
> The simplest approach to marking rapidly rising items that I know is to use
> the log of recent plays over less recent plays, offsetting both counts in a
> manner similar to Laplace correction. The philosophy behind the score is that
> for power-law play counts, log play count is proportional to -log rank. Then,
> the thought is that something rising from 2000th rank to 1000th rank is
> rising as significantly as something going from 100th to 50th.
>
> On Sat, Sep 13, 2014 at 11:25 AM, Peter Wolf <[email protected]> wrote:
>
>> Thanks Dmitriy,
>>
>> Is anyone working on an open source version of RLFM?
>>
>> For the moment, I have few enough classes of users that I can just build
>> multiple recommenders. For example, one for men and one for women.
>>
>> What about adaptive on-line algorithms? Just like Agarwal's Yahoo research,
>> my items may rise and fall in popularity over time. In fact, time may be
>> more important than user preferences in my application.
>>
>> Do I just rebuild every day with a window of recent data, or does Mahout
>> have something better?
>>
>> On Sat, Sep 13, 2014 at 12:26 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> Afaik Mahout doesn't have these algorithms. Agarwal's RLFM is one of the
>>> more promising while still simple enough things to implement at scale
>>> that does that.
>>>
>>> On Sep 13, 2014 9:07 AM, "Peter Wolf" <[email protected]> wrote:
>>>
>>>> Hello, I am new to Mahout but not ML in general.
>>>>
>>>> I want to create a Recommender that combines things I know about Users
>>>> with their Ratings.
>>>>
>>>> For example, perhaps I know the sex, age and nationality of my users.
>>>> I'd like to use that information to improve the recommendations.
>>>>
>>>> How is this information represented in the Mahout API? I have not been
>>>> able to find any documentation or examples about this.
>>>>
>>>> Thanks,
>>>> Peter
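For what it's worth, Ted's rapidly-rising score quoted above can be sketched in a few lines. This is just one reading of his description, not code from Mahout; the offset value is an assumed placeholder playing the role of the Laplace-style correction and would need empirical tuning.

```python
import math

def rising_score(recent_plays, earlier_plays, offset=10.0):
    """Log-ratio of recent to earlier play counts, with an additive offset
    on both counts (similar in spirit to a Laplace correction) so items
    with tiny counts don't produce wild scores.

    For power-law play counts, log(plays) is roughly proportional to
    -log(rank), so this score treats a move from rank 2000 to rank 1000
    much like a move from rank 100 to rank 50: both roughly double plays.
    """
    return math.log((recent_plays + offset) / (earlier_plays + offset))

# An item that doubles its plays gets roughly the same score at any scale:
print(round(rising_score(2000, 1000), 3))  # large item doubling
print(round(rising_score(200, 100), 3))    # smaller item doubling, similar score
print(round(rising_score(100, 100), 3))    # flat item scores zero
```

In the search-based recommender described earlier in the thread, this score could be stored as an item field and used as an empirically scaled boost, alongside or instead of the static "hotness" field.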
