On Sun, May 19, 2013 at 6:26 PM, Pat Ferrel <[email protected]> wrote:
> Using a Hadoop version of a Mahout recommender will create some number of
> recs for all users as its output. Sean is talking about Myrrix I think,
> which uses factorization to get much smaller models and so can calculate
> the recs at runtime for fairly large user sets.

The Mahout recommender can also produce a model in the form of item-item
matrices that can be used to produce recommendations on the fly from a
memory-based model.

> However if you are using Mahout and Hadoop the question is how to store and
> lookup recommendations in the quickest scalable way. You will have a user
> ID and perhaps an item ID as a key to the list of recommendations. The
> fastest thing to do is have a hashmap in memory, perhaps read in from HDFS.

Or just use Solr and create the recommendations on the fly.

> Remember that Mahout will output the recommendations with internal Mahout
> IDs so you will have to replace these in the data with your actual user
> and item IDs.

This can be repaired at index time using a search engine as well.

> I use a NoSQL DB, either MongoDB or Cassandra but others are fine too,
> even MySQL if you can scale it to meet your needs. I end up with two
> tables: one has my user ID as a key and recommendations with my item IDs,
> either ordered or with strengths. The second table has my item ID as the
> key with a list of similar items (again sorted or with strengths). At
> runtime I may have both a user ID and an item ID context, so I get a list
> from both tables and combine them at runtime.

MapR has a large bank as a client who used this approach. Exporting recs
took 8 hours. Switching to Solr to compute the recommendations decreased
export time to under 3 minutes.
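For the in-memory lookup Pat describes, a minimal sketch might look like the
following. All class, method, and index names are illustrative assumptions,
not a Mahout API; it just shows keeping an ID-translated hashmap of
precomputed recs in memory and serving lookups from it.

```java
// Sketch only: serve precomputed recommendations from an in-memory map,
// translating internal Mahout IDs back to application IDs at load time.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecLookup {
    // userId -> ordered list of recommended item IDs (application IDs)
    private final Map<String, List<String>> recsByUser = new HashMap<>();

    // Translate internal Mahout IDs to application IDs while loading,
    // e.g. as the recommender output is read back from HDFS.
    public void load(long mahoutUserId, long[] mahoutItemIds,
                     Map<Long, String> userIdIndex,
                     Map<Long, String> itemIdIndex) {
        List<String> items = new ArrayList<>();
        for (long id : mahoutItemIds) {
            items.add(itemIdIndex.get(id));
        }
        recsByUser.put(userIdIndex.get(mahoutUserId), items);
    }

    public List<String> recommend(String userId) {
        return recsByUser.getOrDefault(userId, List.of());
    }
}
```

The same shape works whether the map is populated from HDFS, a NoSQL table,
or a flat export; only the loading code changes.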
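The runtime combination of the two tables (user recs plus items similar to
the current item) could be sketched as below. The simple additive blend and
all names are assumptions for illustration; in practice you would pick a
weighting that suits your data.

```java
// Sketch only: blend a user's recommendation list with the similar-items
// list for the item currently in context, summing strengths and re-sorting.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecBlender {
    // Sum strengths for items appearing in either list, sort descending.
    public static List<String> blend(Map<String, Double> userRecs,
                                     Map<String, Double> similarToItem) {
        Map<String, Double> combined = new HashMap<>(userRecs);
        similarToItem.forEach((item, s) -> combined.merge(item, s, Double::sum));
        List<Map.Entry<String, Double>> entries =
                new ArrayList<>(combined.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : entries) {
            result.add(e.getKey());
        }
        return result;
    }
}
```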
