We’ve developed the idea that multiple indicators can be blended at recommendation time.
Imagine a catalog of items, where each item has several indicators of similarity. One indicator might be "similar items as measured by users who preferred this item" -- the classical cooccurrence/collaborative-filtering indicator. But if you have a near-realtime similarity engine (a search engine), you can mix in all sorts of indicators and boost each to suit your purpose.

The user is the query, in terms of whichever indicators apply to them -- which items they prefer, in the simple collaborative-filtering case -- but indicators come in many flavors. For instance, you could track the content of the articles a user reads, then periodically calculate, for each item in your catalog, which other items have similar content. A user's articles-read history then becomes a content-based indicator of preference, and using the content-similar items you can recommend new articles based on content.

Back to the catalog. Imagine it has indicators for views (a CF-type indicator), category, "hotness", and content (similar articles based on important terms). Create a user from the items they have viewed and the category they are currently looking at (or prefer, depending on your application). Now use this as a query against the similarity engine. Each field of the query maps to a different indicator: the user's views map to the CF indicator _and_ the content indicator, perhaps with different boosts, and the category maps to the category field of the items. Perform the query and order by "hotness", or put hotness into the query itself and boost that part to favor hot articles.

There are also techniques for using user attributes or actions that may seem unrelated to the action you want to recommend (views, in your case?). For instance, if you want to recommend views but also have a "thumbs-down" action, you can use something like spark-itemsimilarity to create cross-action indicators.
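To make the "user is the query" idea concrete, here is a minimal in-memory sketch of the blending. In a real deployment you would hand this to a search engine (Solr, Elasticsearch, etc.) as a multi-field OR query with per-field boosts; this toy version only shows the scoring arithmetic. All item names, field names, and boost values are made up for illustration.

```python
# Toy catalog: each item carries several indicator fields. "cf" holds items
# co-preferred by the same users, "content" holds content-similar items.
catalog = {
    "article-1": {"cf": {"article-2", "article-3"},
                  "content": {"article-4"},
                  "category": {"sports"},
                  "hotness": 0.9},
    "article-2": {"cf": {"article-1"},
                  "content": {"article-3"},
                  "category": {"news"},
                  "hotness": 0.2},
}

def recommend(user_views, user_category, boosts, catalog):
    """Blend CF, content, and category matches; break score ties by hotness."""
    scored = []
    for item_id, fields in catalog.items():
        if item_id in user_views:  # don't recommend what was already viewed
            continue
        score = 0.0
        # The user's views query BOTH the CF and content indicator fields,
        # with different boosts, just as described above.
        score += boosts["cf"] * len(user_views & fields["cf"])
        score += boosts["content"] * len(user_views & fields["content"])
        score += boosts["category"] * len(user_category & fields["category"])
        scored.append((score, fields["hotness"], item_id))
    scored.sort(reverse=True)  # highest blended score first, hotness as tiebreak
    return [item_id for _, _, item_id in scored]

print(recommend({"article-2"}, {"sports"},
                {"cf": 2.0, "content": 1.0, "category": 0.5}, catalog))
```

The boosts are where you tune the blend: raising the "cf" boost favors behavioral similarity, raising "content" favors topical similarity, and a small "category" boost keeps recommendations near what the user is currently browsing.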
Put them in your catalog in a separate field, and when you create the user (aka the similarity-engine query), make sure to include their "thumbs-down" history mapped to that same cross-action indicator field. This won't always help if there is no correlation, but it's safe to include because the tools you use from Mahout will generally discover non-correlation.

What you know about the user may be spotty or even empty, but the blended CF, metadata, and content recommender described above will be able to make recommendations even for new users and new content. In this degenerate case you'll get hot articles in the category the user is viewing. The more you know about the user, the more personalized the recommendations become.

Currently we have spark-itemsimilarity and spark-rowsimilarity to create CF-type indicators, cross-indicators, and content indicators. The metadata can be taken as-is for indicators in your catalog (categories, color, hotness, etc.). We're starting to document these techniques and make them easier to use. See Ted's book here: https://www.mapr.com/practical-machine-learning and some related blog posts here: http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/

On Sep 13, 2014, at 4:25 PM, Peter Wolf <[email protected]> wrote:

Awesome! Thank you very much Ted. I'll try that.

As I am just getting started with Mahout, can you recommend any good example code that does something similar?

On Sat, Sep 13, 2014 at 3:38 PM, Ted Dunning <[email protected]> wrote:

> Rebuilding every day works very well in practice, but it captures a moving
> average, not a good estimate of the current popularity of items.
>
> A simple hack is to implement a search-based recommender and simply put an
> empirically scaled boost on items which are rising rapidly in popularity.
> Of course you should also have specialized pages that show popular items
> and another that shows rapidly rising items.
> The simplest approach to marking rapidly rising items that I know is to use
> the log of recent plays over less recent plays, offsetting both counts in a
> manner similar to Laplace correction. The philosophy behind the score is that
> for power-law play counts, log play count is proportional to -log rank. Then,
> the thought is that something rising from 2000th rank to 1000th rank is
> rising as significantly as something going from 100th to 50th.
>
> On Sat, Sep 13, 2014 at 11:25 AM, Peter Wolf <[email protected]> wrote:
>
>> Thanks Dmitriy,
>>
>> Is anyone working on an open source version of RLFM?
>>
>> For the moment, I have few enough classes of users that I can just build
>> multiple recommenders. For example, one for men and one for women.
>>
>> What about adaptive on-line algorithms? Just like Agarwal's Yahoo research,
>> my items may rise and fall in popularity over time. In fact, time may be
>> more important than user preferences in my application.
>>
>> Do I just rebuild every day with a window of recent data, or does Mahout
>> have something better?
>>
>> On Sat, Sep 13, 2014 at 12:26 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> Afaik Mahout doesn't have these algorithms. Agarwal's RLFM is one of the
>>> more promising while still simple enough things to implement at scale
>>> that does that.
>>>
>>> On Sep 13, 2014 9:07 AM, "Peter Wolf" <[email protected]> wrote:
>>>
>>>> Hello, I am new to Mahout but not ML in general.
>>>>
>>>> I want to create a Recommender that combines things I know about Users
>>>> with their Ratings.
>>>>
>>>> For example, perhaps I know the sex, age and nationality of my users.
>>>> I'd like to use that information to improve the recommendations.
>>>>
>>>> How is this information represented in the Mahout API? I have not been
>>>> able to find any documentation or examples about this.
>>>>
>>>> Thanks,
>>>> Peter
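For what it's worth, Ted's rapidly-rising score quoted above can be sketched in a few lines. This is just one reading of his description, not code from Mahout; the offset value is an assumed placeholder playing the role of the Laplace-style correction and would need empirical tuning.

```python
import math

def rising_score(recent_plays, earlier_plays, offset=10.0):
    """Log-ratio of recent to earlier play counts, with an additive offset
    on both counts (similar in spirit to a Laplace correction) so items
    with tiny counts don't produce wild scores.

    For power-law play counts, log(plays) is roughly proportional to
    -log(rank), so this score treats a move from rank 2000 to rank 1000
    much like a move from rank 100 to rank 50: both roughly double plays.
    """
    return math.log((recent_plays + offset) / (earlier_plays + offset))

# An item that doubles its plays gets roughly the same score at any scale:
print(round(rising_score(2000, 1000), 3))  # large item doubling
print(round(rising_score(200, 100), 3))    # smaller item doubling, similar score
print(round(rising_score(100, 100), 3))    # flat item scores zero
```

In the search-based recommender described earlier in the thread, this score could be stored as an item field and used as an empirically scaled boost, alongside or instead of the static "hotness" field.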
