Wow! What a great community :-D Working through all this will take me a little while. Thank you and I'll probably have more questions once I'm up to speed.
Thanks again
Peter

On Sun, Sep 14, 2014 at 11:22 AM, Pat Ferrel <[email protected]> wrote:

> We’ve developed the idea that multiple indicators can be blended at
> recommendation time.
>
> Imagine a catalog of items. Each item has many indicators of similarity.
> One indicator might be similar items measured by users who preferred the
> item. These are the classical cooccurrence/collaborative filtering
> indicators. However, if you have a near-realtime similarity engine (search
> engine) you can mix all sorts of indicators and boost each to your purpose.
> The user is the query, expressed in terms of which indicators apply to the
> user: the items they prefer in the simple collaborative filtering case,
> but indicators can be of many flavors.
>
> For instance, you could use the content of the articles a user reads.
> Calculate (often) for each item which other items have similar content and
> add those as a new indicator field in your catalog. So a user’s articles
> viewed can be a content-based indicator or preference, and using the
> content-based similar items you can recommend new articles based on
> content alone.
>
> Let’s create the catalog with indicators for view (CF type indicator,
> similar articles by who viewed them), category, “hotness”, and content
> (similar articles by content). The user has indicators like the items they
> have viewed and the category they are currently looking at (or prefer,
> depending on your application). Now use this as a query to the similarity
> engine. Each field of the query maps to one or more indicators. The user’s
> views map to the CF indicators _and_ content indicators, perhaps with
> different boosts. The category maps to the category field of the articles.
> Perform the query and order by “hotness”. Or put hotness in the query and
> boost that part of the query to favor hot articles.
>
> There are also techniques to use user attributes or actions that may seem
> unrelated to the action you want to recommend (views in your case?). For
> instance, if you want to recommend views but you also have “thumbs-down”,
> you can use something like spark-itemsimilarity to create cross-action
> indicators. Put them in your catalog in a separate field and when you
> create the user (aka similarity engine query) make sure to include their
> “thumbs-down” history mapped to the thumbs-down cross-indicator. This
> won’t always work if there is no correlation, but it’s OK to include it
> because the tools you use from Mahout will generally discover
> non-correlation.
>
> What you know about the user may be spotty or even empty. But the blended
> CF, metadata, and content recommender described above will be able to make
> recommendations even for new users and new content. In this degenerate case
> you’ll get hot articles in the category the user is viewing. The more you
> know about the user, the more personalized the recommendations will be.
> Boosting the importance of CF indicators over content indicators means you
> are favoring CF, but this is not a filter, so depending on how large the
> boost is, some content-based recs may get in if they are strong enough.
> Flip this to favor content over CF.
>
> Currently we have spark-itemsimilarity and spark-rowsimilarity to create
> CF type indicators, cross-indicators, and content indicators. The metadata
> can be taken as-is for indicators in your catalog (categories, color, tags,
> hotness, etc.)
>
> We’re starting to document and make these techniques easier to use. See
> Ted’s book here: https://www.mapr.com/practical-machine-learning and some
> related blog posts here:
> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>
> There are a lot of moving parts so I’d take each of the things you want to
> affect the recs and think about how to create an indicator with them.
>
> We also have a way to create a multi-armed bandit effect but I’ve already
> blathered on for long enough.
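
A rough illustration of the blending Pat describes, written as a toy
in-memory scorer rather than a real search engine. Everything here is an
assumption made for the example: the field names (view_indicators,
content_indicators, category), the boost values, the tiny catalog, and the
scoring rule (boost times number of matching terms, plus an optional
hotness term). It is not Mahout's or any search engine's API; in practice
the indicator fields would be produced by tools such as
spark-itemsimilarity and spark-rowsimilarity and queried through a search
engine.

```python
from typing import Dict, List, Optional, Set

Fields = Dict[str, Set[str]]  # field name -> set of terms (item ids, categories, ...)


def recommend(catalog: Dict[str, Fields],
              query: Fields,
              boosts: Dict[str, float],
              exclude: Optional[Set[str]] = None,
              hotness: Optional[Dict[str, float]] = None,
              hot_boost: float = 0.0,
              k: int = 10) -> List[str]:
    """Rank catalog items by boosted overlap between query and indicator fields.

    Toy stand-in for a search-engine query: each query field is matched
    against the item field of the same name, and matches are weighted by
    that field's boost. Hotness is added as one more boosted term.
    """
    exclude = exclude or set()
    hotness = hotness or {}
    scores = {}
    for item_id, fields in catalog.items():
        if item_id in exclude:
            continue
        score = 0.0
        for field, terms in query.items():
            overlap = len(fields.get(field, set()) & terms)
            score += boosts.get(field, 1.0) * overlap
        score += hot_boost * hotness.get(item_id, 0.0)
        scores[item_id] = score
    # Highest score first; ties broken by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical catalog: every article carries its indicator fields.
CATALOG: Dict[str, Fields] = {
    "a1": {"view_indicators": {"a2", "a3"}, "content_indicators": {"a4"}, "category": {"sports"}},
    "a2": {"view_indicators": {"a1"}, "content_indicators": {"a3"}, "category": {"sports"}},
    "a3": {"view_indicators": {"a1", "a2"}, "content_indicators": {"a2"}, "category": {"tech"}},
    "a4": {"view_indicators": set(), "content_indicators": {"a1"}, "category": {"sports"}},
}
HOTNESS = {"a1": 0.2, "a2": 0.1, "a3": 0.9, "a4": 0.5}
BOOSTS = {"view_indicators": 2.0, "content_indicators": 1.0, "category": 0.5}

# A user who viewed a1 while browsing sports: the view history is mapped to
# both the CF field and the content field, with different boosts.
viewed = {"a1"}
known_user = {"view_indicators": viewed, "content_indicators": viewed, "category": {"sports"}}
known_user_recs = recommend(CATALOG, known_user, BOOSTS, exclude=viewed)

# A brand-new user: only the current category is known, so hotness decides.
new_user = {"category": {"sports"}}
new_user_recs = recommend(CATALOG, new_user, BOOSTS, hotness=HOTNESS, hot_boost=1.0)
```

With these made-up numbers the CF boost dominates for the known user, but
a4 still gets in through its content indicator alone. Swapping the two
boosts favors content over CF instead.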
>
> On Sep 13, 2014, at 4:25 PM, Peter Wolf <[email protected]> wrote:
>
> Awesome! Thank you very much Ted. I'll try that.
>
> As I am just getting started with Mahout, can you recommend any good
> example code that does something similar?
>
> On Sat, Sep 13, 2014 at 3:38 PM, Ted Dunning <[email protected]> wrote:
>
> > Rebuilding every day works very well in practice, but it captures a
> > moving average, not a good estimate of the current popularity of items.
> >
> > A simple hack is to implement a search-based recommender and simply put
> > an empirically scaled boost on items which are rising rapidly in
> > popularity. Of course you should also have specialized pages, one that
> > shows popular items and another that shows rapidly rising items.
> >
> > The simplest approach to marking rapidly rising items that I know is to
> > use the log of recent plays over less recent plays, offsetting both
> > counts in a manner similar to Laplace correction. The philosophy behind
> > the score is that for power-law play counts, log play count is
> > proportional to -log rank. Then, the thought is that something that
> > rises from 2000th rank to 1000th rank is rising as significantly as
> > something going from 100th to 50th.
> >
> > On Sat, Sep 13, 2014 at 11:25 AM, Peter Wolf <[email protected]> wrote:
> >
> >> Thanks Dmitriy,
> >>
> >> Is anyone working on an open source version of RLFM?
> >>
> >> For the moment, I have few enough classes of users that I can just build
> >> multiple recommenders. For example, one for men and one for women.
> >>
> >> What about adaptive on-line algorithms? Just like in Agarwal's Yahoo
> >> research, my items may rise and fall in popularity over time. In fact,
> >> time may be more important than user preferences in my application.
> >>
> >> Do I just rebuild every day with a window of recent data, or does Mahout
> >> have something better?
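
A minimal sketch of the rising-item score Ted describes in the quoted
message above, assuming the two counts come from time windows of equal
length. The function name, the default offset of 1.0, and the example play
counts are assumptions for illustration; the offset in particular would be
tuned empirically, as would any boost derived from the score.

```python
import math


def rising_score(recent_plays: float, older_plays: float, offset: float = 1.0) -> float:
    """Log ratio of recent to older plays, with both counts offset.

    The offset plays the role of a Laplace-style correction: it keeps the
    ratio finite when a count is zero and damps the score for items with
    very few plays. Positive means rising, negative means falling.
    """
    if offset <= 0:
        raise ValueError("offset must be positive")
    return math.log((recent_plays + offset) / (older_plays + offset))


# Hypothetical power-law play counts: plays = 1,000,000 / rank.
def plays_at_rank(rank: int) -> float:
    return 1_000_000 / rank


# An item moving from rank 2000 to rank 1000 ...
tail_item = rising_score(plays_at_rank(1000), plays_at_rank(2000))
# ... scores almost the same as one moving from rank 100 to rank 50.
head_item = rising_score(plays_at_rank(50), plays_at_rank(100))
```

Because the score depends only on the ratio of the counts, halving the rank
gives nearly the same value at the head and in the tail of the
distribution. The small difference between the two comes from the offset.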
> >>
> >> On Sat, Sep 13, 2014 at 12:26 PM, Dmitriy Lyubimov <[email protected]>
> >> wrote:
> >>
> >>> AFAIK Mahout doesn't have these algorithms. Agarwal's RLFM is one of
> >>> the more promising approaches that does that while still being simple
> >>> enough to implement at scale.
> >>> On Sep 13, 2014 9:07 AM, "Peter Wolf" <[email protected]> wrote:
> >>>
> >>>> Hello, I am new to Mahout but not ML in general.
> >>>>
> >>>> I want to create a Recommender that combines things I know about
> >>>> Users with their Ratings.
> >>>>
> >>>> For example, perhaps I know the sex, age and nationality of my users.
> >>>> I'd like to use that information to improve the recommendations.
> >>>>
> >>>> How is this information represented in the Mahout API? I have not
> >>>> been able to find any documentation or examples about this.
> >>>>
> >>>> Thanks
> >>>> Peter
