Content-based item similarity is a fine thing to include in a separate field.
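Here is a minimal sketch of keeping content-based similarity in its own field. It assumes each item carries a set of descriptive tags and uses Jaccard overlap as the content similarity. The tag data, the function names and the per-item field value are all hypothetical and not part of any Mahout or Solr API.

```python
from typing import Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two tag sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def content_similar_items(tags: Dict[str, Set[str]], k: int = 2) -> Dict[str, List[str]]:
    """For every item, up to k other items with the most tag overlap, best first."""
    result: Dict[str, List[str]] = {}
    for item, item_tags in tags.items():
        scored = [(jaccard(item_tags, other_tags), other)
                  for other, other_tags in tags.items() if other != item]
        # Highest similarity first; ties broken by item ID so output is deterministic.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        # Drop neighbors with no overlap at all.
        result[item] = [other for score, other in scored[:k] if score > 0.0]
    return result

# Hypothetical restaurant tags, for illustration only.
tags = {
    "r1": {"french", "elegant", "wine"},
    "r2": {"french", "elegant"},
    "r3": {"diner", "home-style"},
    "r4": {"diner", "home-style", "pie"},
}
similar = content_similar_items(tags)
# One space-separated text value per item, to go into a separate (hypothetical) field.
content_field = {item: " ".join(ids) for item, ids in similar.items()}
print(content_field)
```

Because this field is separate from the behavior-based indicator fields, it can be queried or weighted on its own.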
In addition, it is reasonable to describe a person's history in terms of the meta-data on the items they have interacted with. That allows you to build a set of socially driven meta-data indicators as well. This can be useful in the restaurant example, where you might find that "elegant" or "home-style" are good indicators for different restaurants even if those terms don't appear in the restaurant's description.

On Jul 23, 2013, at 18:26, Pat Ferrel <[email protected]> wrote:

> Honestly not trying to make this more complicated but…
>
> In the purely Mahout cross-recommender we got a ranked list of similar items
> for any item so we could combine personal history-based recs with
> non-personalized item similarity-based recs wherever we had an item context.
> In a past e-commerce case the item similarity recs were quite useful when a user
> was looking at an item already. In that case even if the user was unknown we
> could make item similarity-based recs.
>
> How about if we order the items in the doc by rank in the existing fields
> since they are just text? Then we would do user-history-based queries on the
> fields for recs and docs[itemID].field to get the ordered list of items out
> of any doc. Doing an ensemble would require weights though, unless someone
> knows a rank-based method for combining results. I guess you could vote or
> add rank numbers of like items or the log thereof...
>
> I assume the combination of results from [B'B] and [B'A] will be a query over
> both fields with some boost or other to handle ensemble weighting. But if you
> want to add item similarity recs another method must be employed, no?
>
> From past experience I strongly suspect item similarity rank is not something
> we want to lose, so unless someone has a better idea I'll just order the IDs
> in the fields and call it good for now.
>
>
> On Jul 23, 2013, at 12:03 PM, Pat Ferrel <[email protected]> wrote:
>
> Will do.
>
> For what it's worth…
>
> The project I'm working on is an online recommender for video content. You go
> to a site I'm creating, make some picks and get recommendations immediately
> online. The training data comes from mining Rotten Tomatoes for critics'
> reviews. There are two actions, rotten & fresh. Was planning to toss the
> 'rotten' except for filtering them out of any recs but maybe they would work
> as A with an ensemble weight of -1? New thumbs up or down data would be put
> into the training set periodically (not online) using the process outlined
> below.
>
> On Jul 23, 2013, at 10:37 AM, Ted Dunning <[email protected]> wrote:
>
>
> This sounds great. Go for it. Put a comment on the design doc with a
> pointer to text that I should import.
>
>
>
>
> On Tue, Jul 23, 2013 at 9:39 AM, Pat Ferrel <[email protected]> wrote:
> I can supply:
>
> 1) a Maven-based project in a public GitHub repo as a baseline that creates
> the following
> 2) ingest and split actions, in-memory, single process, from text file, one
> line per preference
> 3) create DistributedRowMatrixes, one per action (max of 3), with unified item
> and user space
> 4) create the 'similarity matrix' for [B'B] using LLR and [B'A] using matrix
> multiply/cooccurrence.
> 5) can take a stab at loading Solr. I know the Mahout side and the internal
> to external ID translation. The Solr side sounds pretty simple for this case.
>
> This pipeline lacks downsampling since I had to replace
> PreparePreferenceMatrixJob and potentially LLR for [B'A]. I assume Sebastian
> is the person to talk to about these bits?
>
> The job this creates uses the hadoop script to launch. Each job extends
> AbstractJob so it runs locally or using HDFS or MapReduce (at least for the
> Mahout parts).
>
> I have some obligations coming up so if you want this I'll need to get
> moving. I can have the project ready on GitHub in a day or two. It may take
> longer to do the Solr integration, and if someone has a passion for taking
> that bit on, great. This work is in my personal plans for the next couple
> weeks as it happens anyway.
>
> Let me know if you want me to proceed.
>
> On Jul 22, 2013, at 3:42 PM, Ted Dunning <[email protected]> wrote:
>
> On Mon, Jul 22, 2013 at 12:40 PM, Pat Ferrel <[email protected]> wrote:
>
>> Yes. And the combined recommender would query on both at the same time.
>>
>> Pat-- doesn't it need ensemble type weighting for each recommender
>> component? Probably a wishlist item for later?
>
>
> Yes. Weighting different fields differently is a very nice (and very easy)
> feature.
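The idea of describing a person's history by the meta-data on the items they touched, and turning that into socially driven indicators, can be sketched like this. It is a minimal illustration only: the tag data and function names are hypothetical, and raw term counts stand in for whatever scoring (LLR, for instance) a real pipeline might use.

```python
from collections import Counter
from typing import Dict, List, Set

def history_profile(history: List[str], tags: Dict[str, Set[str]]) -> Counter:
    """Describe one user's history by counting the meta-data terms on items they touched."""
    profile: Counter = Counter()
    for item in history:
        profile.update(tags.get(item, set()))  # each tag on the item counts once
    return profile

def social_tag_indicators(histories: Dict[str, List[str]],
                          tags: Dict[str, Set[str]],
                          top_n: int = 3) -> Dict[str, List[str]]:
    """For each item, the terms most common in the profiles of users who interacted with it.

    Raw counts are used for simplicity. An item's own tags are included because
    they are part of each of its users' profiles.
    """
    counts: Dict[str, Counter] = {}
    for history in histories.values():
        profile = history_profile(history, tags)
        for item in set(history):
            counts.setdefault(item, Counter()).update(profile)
    return {item: [term for term, _ in c.most_common(top_n)] for item, c in counts.items()}

# Hypothetical data: restaurant "c" has no descriptive tags of its own.
tags = {"a": {"elegant"}, "b": {"elegant", "wine"}, "c": set()}
histories = {"u1": ["a", "c"], "u2": ["b", "c"]}
social = social_tag_indicators(histories, tags)
print(social["c"])
```

Restaurant "c" picks up "elegant" purely from the company it keeps, which is the effect described for the restaurant example.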
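Pat's proposal to order the IDs in each indicator field by rank, combined with the internal-to-external ID translation from step 5, might look like the sketch below. It assumes the field is stored, so the text value comes back unchanged and docs[itemID].field still yields the ranked list. As far as I know, term order inside the field does not affect how an ordinary term query scores the document. The field name `b_b_indicators` and both helpers are hypothetical.

```python
from typing import Dict, List

def build_item_doc(item_id: str,
                   neighbor_scores: Dict[int, float],
                   internal_to_external: Dict[int, str],
                   field: str = "b_b_indicators") -> Dict[str, str]:
    """Build one search document whose indicator field lists similar items, best first."""
    # Highest similarity first; ties broken by internal ID for a stable order.
    ranked = sorted(neighbor_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Translate internal integer IDs back to external item IDs.
    ids = [internal_to_external[internal] for internal, _ in ranked]
    return {"id": item_id, field: " ".join(ids)}

def ranked_similar_items(doc: Dict[str, str], field: str = "b_b_indicators") -> List[str]:
    """Recover the ordered similar-item list from the stored field value."""
    return doc.get(field, "").split()

lookup = {0: "m1", 1: "m2", 2: "m3"}
doc = build_item_doc("m1", {1: 3.5, 2: 7.2}, lookup)
print(doc, ranked_similar_items(doc))
```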
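On the question of a rank-based method for combining results: one established option, not something proposed in the thread, is reciprocal rank fusion. Each list contributes weight / (k + rank) for every item it contains, so items missing from a list simply get nothing from it. k = 60 is the value commonly used. The optional per-list weights let it double as an ensemble. The function below is a hypothetical helper, not a Mahout API.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

def reciprocal_rank_fusion(ranked_lists: Sequence[List[str]],
                           weights: Optional[Sequence[float]] = None,
                           k: int = 60) -> List[str]:
    """Merge ranked lists; an item at 1-based rank r in a list adds weight / (k + r)."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores: Dict[str, float] = defaultdict(float)
    for weight, ranked in zip(weights, ranked_lists):
        for rank, item in enumerate(ranked, start=1):
            scores[item] += weight / (k + rank)
    # Highest fused score first; ties broken by item ID.
    return sorted(scores, key=lambda item: (-scores[item], item))

personal = ["a", "b", "c"]   # history-based recs
similar = ["b", "d"]         # item-similarity recs for the current item
fused = reciprocal_rank_fusion([personal, similar])
print(fused)
```

Unlike adding raw rank numbers, this needs no penalty rank for items that appear in only one list.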
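Treating the [B'B] and [B'A] combination as one query over both fields, with a boost per field as the ensemble weight, can be sketched as a query-string builder. It uses the classic Lucene query-parser syntax for field grouping, field:(...), and boosting, ^n. With the default OR operator each field clause is optional. The field names are hypothetical, IDs are assumed to contain no query-syntax characters, and whether the engine accepts a negative boost is not assumed here.

```python
from typing import Dict, List

def ensemble_query(history: List[str], field_boosts: Dict[str, float]) -> str:
    """One query that matches the user's history against every indicator field,
    giving each field its own boost (the ensemble weight)."""
    terms = " ".join(history)  # assumes IDs contain no query-syntax characters
    return " ".join(f"{field}:({terms})^{boost:g}" for field, boost in field_boosts.items())

q = ensemble_query(["m1", "m7"], {"b_b_indicators": 1.0, "b_a_indicators": 0.5})
print(q)
```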
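Pat's idea of using 'rotten' as A with an ensemble weight of -1 can be tried on the client side, before deciding whether the search engine should do it. This is a minimal sketch assuming each component (fresh-based, rotten-based) has already produced a score per item. It also drops items the user already marked rotten, as Pat describes. All names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

def combine_weighted(component_scores: Dict[str, Dict[str, float]],
                     weights: Dict[str, float],
                     exclude: Optional[Set[str]] = None) -> List[Tuple[str, float]]:
    """Weighted sum of per-component recommendation scores, best first."""
    exclude = exclude or set()
    total: Dict[str, float] = defaultdict(float)
    for name, scores in component_scores.items():
        weight = weights.get(name, 0.0)  # components without a weight are ignored
        for item, score in scores.items():
            if item not in exclude:
                total[item] += weight * score
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))

ranked = combine_weighted(
    {"fresh": {"m1": 2.0, "m2": 1.5, "m4": 3.0}, "rotten": {"m1": 1.0, "m3": 0.5}},
    {"fresh": 1.0, "rotten": -1.0},
    exclude={"m4"},  # e.g. already rated rotten by this user
)
print(ranked)
```

The negative weight pushes items that resemble the user's rotten picks down the list instead of discarding that signal outright.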
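Steps 2 and 3 (ingest and split actions from a text file, one line per preference, into one matrix per action over a unified user and item space) can be sketched in memory as follows. The user,item,action line format, the action names and the function are assumptions for illustration. Plain dicts of sets stand in for DistributedRowMatrix rows.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def ingest(lines: Iterable[str], actions: Tuple[str, ...] = ("fresh", "rotten")):
    """Split 'user,item,action' lines into one sparse user x item matrix per action,
    all sharing one user-ID dictionary and one item-ID dictionary."""
    user_ids: Dict[str, int] = {}
    item_ids: Dict[str, int] = {}
    matrices: Dict[str, Dict[int, Set[int]]] = {a: defaultdict(set) for a in actions}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, item, action = (field.strip() for field in line.split(","))
        if action not in matrices:
            continue  # unknown actions are ignored and get no IDs
        # setdefault hands out the next dense integer the first time an ID is seen.
        u = user_ids.setdefault(user, len(user_ids))
        i = item_ids.setdefault(item, len(item_ids))
        matrices[action][u].add(i)
    return user_ids, item_ids, matrices

lines = ["u1,m1,fresh", "u2,m1,rotten", "u1,m2,fresh", "u3,m9,unknown", ""]
user_ids, item_ids, matrices = ingest(lines)
# Reverse dictionary for internal-to-external ID translation later on.
item_external = {internal: external for external, internal in item_ids.items()}
print(user_ids, item_ids, dict(matrices["fresh"]))
```

Because both actions share the same dictionaries, row u and column i mean the same user and item in every matrix, which is what the cross-action product needs.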
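For the [B'B] part of step 4, the LLR score for an item pair comes from a 2x2 contingency table of user counts. The sketch below is written to mirror the entropy-based formulation in Mahout's LogLikelihood class, using unnormalized entropies. The pair helper and its user-to-items input are hypothetical. Only users present in the mapping are counted toward k22.

```python
import math
from typing import Dict, Set

def x_log_x(x: int) -> float:
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts: int) -> float:
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    if row_entropy + column_entropy < matrix_entropy:
        return 0.0  # guard against floating-point round-off
    return 2.0 * (row_entropy + column_entropy - matrix_entropy)

def item_pair_llr(histories: Dict[int, Set[int]], a: int, b: int) -> float:
    """LLR for items a and b from a user -> items mapping for one action (B)."""
    with_a = {u for u, items in histories.items() if a in items}
    with_b = {u for u, items in histories.items() if b in items}
    k11 = len(with_a & with_b)              # users with both items
    k12 = len(with_a - with_b)              # a but not b
    k21 = len(with_b - with_a)              # b but not a
    k22 = len(histories) - k11 - k12 - k21  # neither
    return log_likelihood_ratio(k11, k12, k21, k22)

histories = {0: {0, 1}, 1: {0, 1}, 2: {2}, 3: {2}}
print(item_pair_llr(histories, 0, 1), item_pair_llr(histories, 0, 2))
```

A pair of items that always co-occur scores high, and a table with no association scores zero. In practice only pairs above some LLR threshold, or the top few per item, would be kept as indicators.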
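The [B'A] cooccurrence in step 4 is a plain matrix product: (B'A)[i][j] = sum over users u of B[u][i] * A[u][j]. With 0/1 entries, this counts the users who did action B on item i and action A on item j. Here is a sparse, in-memory sketch, assuming each matrix is a user -> set-of-items mapping over a shared ID space. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

def cooccurrence(b_rows: Dict[int, Set[int]],
                 a_rows: Dict[int, Set[int]]) -> Dict[Tuple[int, int], int]:
    """Sparse B'A: for each (i, j), the number of users with B on item i and A on item j."""
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for user, b_items in b_rows.items():
        a_items = a_rows.get(user, set())  # users with no A actions contribute nothing
        for i in b_items:
            for j in a_items:
                counts[(i, j)] += 1
    return dict(counts)

b = {0: {0, 1}, 1: {0}}   # user -> items with action B
a = {0: {2}, 1: {2, 3}}   # user -> items with action A
print(cooccurrence(b, a))
```

Iterating per user touches only the nonzero cells of each row, which is the same reason the distributed version works row by row.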
