We’ve developed the idea that multiple indicators can be blended at 
recommendation time. 

Imagine a catalog of items. Each item has many indicators of similarity. One 
indicator might be the set of similar items as measured by which users preferred 
them. These are the classical cooccurrence/collaborative filtering indicators. 
However, if you have a near-realtime similarity engine (a search engine) you can 
mix all sorts of indicators and boost each to suit your purpose. The user is the 
query, expressed in terms of whichever indicators apply to them: for the simple 
collaborative filtering case that's the items they prefer, but indicators can 
come in many flavors.

For instance you could use the content of the articles a user reads. Periodically 
calculate, for each item, which other items have similar content and add those as 
a new indicator field in your catalog. A user's viewed articles then also serve 
as a content-based preference indicator, and using the content-similar items you 
can recommend new articles based on content alone.

Let’s create the catalog with indicators for views (a CF-type indicator: similar 
articles by who viewed them), category, “hotness”, and content (similar articles 
by content). The user has indicators like the items they have viewed and the 
category they are currently looking at (or prefer, depending on your 
application). Now use this as a query to the similarity engine. Each field of 
the query maps to one or more indicators. The user’s views map to the CF 
indicators _and_ the content indicators, perhaps with different boosts. The 
category maps to the category field of the articles. Perform the query and 
order by “hotness”, or put hotness in the query and boost that part of the 
query to favor hot articles.
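Here's roughly what that blended query could look like against an Elasticsearch 
index. The index name, field names, and boost values are all made up for 
illustration; tune them for your app:

    # Hypothetical "user as query" sketch against a search engine. The index
    # name, field names, and boost values are all made up for illustration.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    user_views = ["article-12", "article-99", "article-7"]  # user's view history
    user_category = "sports"                                # what they're browsing now

    query = {
        "query": {
            "bool": {
                "should": [
                    # view history against the CF indicator field, boosted highest
                    {"terms": {"view_indicator": user_views, "boost": 2.0}},
                    # the same history against the content indicator field
                    {"terms": {"content_indicator": user_views, "boost": 1.0}},
                    # the category the user is currently looking at
                    {"term": {"category": {"value": user_category, "boost": 0.5}}},
                ]
            }
        },
        # either sort by hotness, or fold hotness into the query as another boost
        "sort": [{"hotness": "desc"}, "_score"],
    }

    results = es.search(index="articles", body=query)

The relative boosts here are the dials I mention below for favoring CF over 
content or the other way around.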

There are also techniques to use user attributes or actions that may seem 
unrelated to the action you want to recommend (views in your case?). For 
instance, suppose you want to recommend views but you also have a “thumbs-down” 
action. You can use something like spark-itemsimilarity to create cross-action 
indicators. Put them in your catalog in a separate field, and when you create 
the user (aka the similarity engine query) make sure to include their 
“thumbs-down” history mapped to the thumbs-down cross-indicator field. This 
won’t always help if there is no correlation, but it’s OK to include it because 
the tools you use from Mahout will generally discover non-correlation.
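Continuing the made-up query above, the cross-action part is just one more 
clause matched against its own field:

    # Hypothetical: add the thumbs-down history as one more clause, matched
    # against the cross-action indicator field built by spark-itemsimilarity
    # (the field name here is made up).
    user_thumbs_down = ["article-3", "article-41"]

    query["query"]["bool"]["should"].append(
        {"terms": {"thumbs_down_cross_indicator": user_thumbs_down, "boost": 0.5}}
    )
    results = es.search(index="articles", body=query)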

What you know about the user may be spotty or even empty, but the blended CF, 
metadata, and content recommender described above will be able to make 
recommendations even for new users and new content. In this degenerate case 
you’ll get hot articles in the category the user is viewing. The more you know 
about the user, the more personalized the recommendations become. Boosting the 
importance of CF indicators over content indicators means you are favoring CF, 
but this is not a filter, so depending on how large the boost is, some 
content-based recs may get in if they are strong enough. Flip the boosts to 
favor content over CF.
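And the degenerate case collapses to the non-personalized parts of the same 
(still hypothetical) query:

    # Hypothetical cold-start fallback: no user history, so just the category
    # they are on, ordered by hotness.
    cold_start_query = {
        "query": {"term": {"category": "sports"}},
        "sort": [{"hotness": "desc"}],
    }
    results = es.search(index="articles", body=cold_start_query)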

Currently we have spark-itemsimilarity and spark-rowsimilarity to create CF-type 
indicators, cross-indicators, and content indicators. The metadata can be taken 
as-is for indicators in your catalog (categories, color, tags, hotness, etc.).
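If it helps, here's a small hedged sketch of pulling spark-itemsimilarity's text 
output into something you can index as an indicator field; it assumes the 
default tab-delimited layout, so check the docs for your version:

    # Hedged sketch: read spark-itemsimilarity's text output into a dict keyed
    # by item id, ready to be indexed as an indicator field. Assumes the default
    #   itemID<TAB>similarItem1:strength similarItem2:strength ...
    # layout; the path below is just an example.
    def load_indicator(path):
        indicator = {}
        with open(path) as f:
            for line in f:
                item_id, _, rest = line.rstrip("\n").partition("\t")
                # keep only the similar-item ids; the search engine's own
                # relevance scoring usually stands in for the strengths
                indicator[item_id] = [p.split(":")[0] for p in rest.split() if p]
        return indicator

    view_indicator = load_indicator("indicator-matrix/part-00000")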

We’re starting to document and make these techniques easier to use. See Ted’s 
book here: https://www.mapr.com/practical-machine-learning and some related 
blog posts here: 
http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/

There are a lot of moving parts, so I’d take each of the things you want to 
influence the recs and think about how to create an indicator from it. 

We also have a way to create a multi-armed bandit effect, but I’ve already 
blathered on for long enough.

On Sep 13, 2014, at 4:25 PM, Peter Wolf <[email protected]> wrote:

Awesome!  Thank you very much Ted.  I'll try that

As I am just getting started with Mahout, can you recommend any good
example code that does something similar?

On Sat, Sep 13, 2014 at 3:38 PM, Ted Dunning <[email protected]> wrote:

> Rebuilding every day works very well in practice, but it captures a moving
> average, not a good estimate of the current popularity of items.
> 
> A simple hack is to implement a search based recommender and simply put an
> empirically scaled boost on items which are rising rapidly in popularity.
> Of course you should also have specialized pages that show popular items
> and another that shows rapidly rising items.
> 
> The simplest approach to marking rapidly rising items that I know is to use
> the log of recent plays over less recent plays, offsetting both counts in a manner
> similar to Laplace correction.  The philosophy behind the score is that for
> power law play counts, log play count is proportional to -log rank.  Then,
> the thought is something that rises from 2000-th rank to 1000-th rank is
> rising as significantly as something going from 100-th to 50-th.
> 
> 
> On Sat, Sep 13, 2014 at 11:25 AM, Peter Wolf <[email protected]> wrote:
> 
>> Thanks Dmitriy,
>> 
>> Is anyone working on an open source version of RLFM?
>> 
>> For the moment, I have few enough classes of users that I can just build
>> multiple recommenders.  For example, one for men and one for women.
>> 
>> What about adaptive on-line algorithms?  Just like Agarwal's Yahoo research
>> my items may rise and fall in popularity over time.  In fact, time may be
>> more important than user preferences in my application.
>> 
>> Do I just rebuild every day with a window of recent data, or does Mahout
>> have something better?
>> 
>> On Sat, Sep 13, 2014 at 12:26 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>> 
>>> Afaik mahout doesn't have these algorithms. Agarwal's RLFM is one of the more
>>> promising while still simple enough things to implement at scale that does
>>> that.
>>> On Sep 13, 2014 9:07 AM, "Peter Wolf" <[email protected]> wrote:
>>> 
>>>> Hello, I am new to Mahout but not ML in general
>>>> 
>>>> I want to create a Recommender that combines things I know about Users
>>>> with their Ratings.
>>>> 
>>>> For example, perhaps I know the sex, age and nationality of my users.
>>>> I'd like to use that information to improve the recommendations.
>>>> 
>>>> How is this information represented in the Mahout API?  I have not been
>>>> able to find any documentation or examples about this.
>>>> 
>>>> Thanks
>>>> Peter
>>>> 
>>> 
>> 
> 
