On Sun, May 19, 2013 at 6:26 PM, Pat Ferrel <[email protected]> wrote:
> Using a Hadoop version of a Mahout recommender will create some number of
> recs for all users as its output. Sean is talking about Myrrix I think,
> which uses factorization to get much smaller models and so can calculate
> the recs at runtime for fairly large user sets.

The Mahout recommender can also produce a model in the form of item-item
matrices that can be used to produce recommendations on the fly from a
memory-based model.

> However if you are using Mahout and Hadoop the question is how to store and
> lookup recommendations in the quickest scalable way. You will have a user
> ID and perhaps an item ID as a key to the list of recommendations. The
> fastest thing to do is have a hashmap in memory, perhaps read in from HDFS.

Or just use Solr and create the recommendations on the fly.

> Remember that Mahout will output the recommendations with internal Mahout
> IDs so you will have to replace these in the data with your actual user
> and item IDs.

This can be repaired at index time using a search engine as well.

> I use a NoSQL DB, either MongoDB or Cassandra but others are fine too,
> even MySQL if you can scale it to meet your needs. I end up with two
> tables: one has my user ID as a key and recommendations with my item IDs,
> either ordered or with strengths. The second table has my item ID as the
> key with a list of similar items (again sorted or with strengths). At
> runtime I may have both a user ID and an item ID context, so I get a list
> from both tables and combine them at runtime.

MapR has a large bank as a client who used this approach. Exporting recs
took 8 hours. Switching to Solr to compute the recommendations decreased
export time to under 3 minutes.
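For the in-memory lookup Pat describes, a minimal sketch might look like the
following. All class, method, and index names are illustrative assumptions,
not a Mahout API; it just shows keeping an ID-translated hashmap of
precomputed recs in memory and serving lookups from it.

```java
// Sketch only: serve precomputed recommendations from an in-memory map,
// translating internal Mahout IDs back to application IDs at load time.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecLookup {
    // userId -> ordered list of recommended item IDs (application IDs)
    private final Map<String, List<String>> recsByUser = new HashMap<>();

    // Translate internal Mahout IDs to application IDs while loading,
    // e.g. as the recommender output is read back from HDFS.
    public void load(long mahoutUserId, long[] mahoutItemIds,
                     Map<Long, String> userIdIndex,
                     Map<Long, String> itemIdIndex) {
        List<String> items = new ArrayList<>();
        for (long id : mahoutItemIds) {
            items.add(itemIdIndex.get(id));
        }
        recsByUser.put(userIdIndex.get(mahoutUserId), items);
    }

    public List<String> recommend(String userId) {
        return recsByUser.getOrDefault(userId, List.of());
    }
}
```

The same shape works whether the map is populated from HDFS, a NoSQL table,
or a flat export; only the loading code changes.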
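The runtime combination of the two tables (user recs plus items similar to
the current item) could be sketched as below. The simple additive blend and
all names are assumptions for illustration; in practice you would pick a
weighting that suits your data.

```java
// Sketch only: blend a user's recommendation list with the similar-items
// list for the item currently in context, summing strengths and re-sorting.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecBlender {
    // Sum strengths for items appearing in either list, sort descending.
    public static List<String> blend(Map<String, Double> userRecs,
                                     Map<String, Double> similarToItem) {
        Map<String, Double> combined = new HashMap<>(userRecs);
        similarToItem.forEach((item, s) -> combined.merge(item, s, Double::sum));
        List<Map.Entry<String, Double>> entries =
                new ArrayList<>(combined.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : entries) {
            result.add(e.getKey());
        }
        return result;
    }
}
```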
