As many of you know, Mahout-Samsara includes an interesting and important extension to cooccurrence similarity that supports cross-cooccurrence and log-likelihood downsampling. Combined with a search engine, this gives us a multimodal recommender. Some of us integrated Mahout with a DB and search engine to create what we call (humbly) the Universal Recommender.
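For anyone who has not seen the log-likelihood test mentioned above, here is a minimal Python sketch of the log-likelihood ratio (LLR) on a 2x2 cooccurrence table. It is only an illustration of the general technique, not Mahout's code; the function names and the example counts are made up for this post.

```python
import math


def x_log_x(x: float) -> float:
    """Return x * ln(x), treating 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: float) -> float:
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11: float, k12: float, k21: float, k22: float) -> float:
    """LLR for a 2x2 contingency table.

    k11: users who did both A and B
    k12: users who did A but not B
    k21: users who did B but not A
    k22: users who did neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to guard against tiny negative values from rounding.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


if __name__ == "__main__":
    # Hypothetical counts: independent events score near 0,
    # strongly associated events score high.
    print(log_likelihood_ratio(10, 10, 10, 10))
    print(log_likelihood_ratio(10, 0, 0, 10))
```

A high score means the two events cooccur more (or less) often than chance would predict, so pairs with low scores can be dropped as uninformative.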
We just completed a tool that measures the effects of what we call secondary events or indicators using the Universal Recommender. It calculates a ranking-based precision metric called mean average precision (MAP@k). We took a dataset of “fresh” and “rotten” reviews from the Rotten Tomatoes web site and combined it with data about the genres, casts, directors, and writers of the various video items. This gave us the indicators below:

like, video-id <== primary indicator
dislike, video-id
like-genre, genre-id
dislike-genre, genre-id
like-director, director-id
dislike-director, director-id
like-writer, writer-id
dislike-writer, writer-id
like-cast, cast-member-id
dislike-cast, cast-member-id

These aren’t necessarily what we would have chosen if we were designing something from scratch, but they are possible to gather from public data. We have only ~5000 mostly professional reviewers with ~250k video items in this dataset but have a larger one we are integrating. We are also writing a white paper and blog post with some deeper analysis. There are several tidbits of insight when you look deeper.

The bottom line is that using most of the above indicators we were able to get a 26% increase in MAP@1 over using only “like”. This is important because the vast majority of recommenders can only really ingest one type of indicator.

http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
https://github.com/actionml/template-scala-parallel-universal-recommendation
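For reference, here is a minimal Python sketch of how MAP@k is commonly computed. It is not the code from our tool; the function names are hypothetical, and it assumes one common normalization (dividing by min(k, number of relevant items)) and that a recommendation list contains no duplicate items. Other normalizations exist and give slightly different numbers.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision_at_k(
    recommended: Sequence[Hashable], relevant: Set[Hashable], k: int
) -> float:
    """Average precision over the top-k recommendations for one user."""
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(k, len(relevant))


def mean_average_precision_at_k(
    queries: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]], k: int
) -> float:
    """Mean of AP@k over (recommended, relevant) pairs, one pair per user."""
    queries = list(queries)
    if not queries:
        return 0.0
    total = sum(average_precision_at_k(rec, rel, k) for rec, rel in queries)
    return total / len(queries)


if __name__ == "__main__":
    # Made-up held-out data for two users.
    held_out = [
        (["a", "b"], {"a"}),  # hit at rank 1
        (["x", "y"], {"y"}),  # hit at rank 2
    ]
    print(mean_average_precision_at_k(held_out, k=1))  # 0.5
    print(mean_average_precision_at_k(held_out, k=2))  # 0.75
```

Note that under this definition MAP@1 reduces to the fraction of users whose top recommendation is a held-out item.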
