Re: Setting up a recommender

Ted Dunning Wed, 24 Jul 2013 19:21:18 -0700

Content based item similarity is a fine thing to include in a separate field.


In addition, it is reasonable to describe a person's history in terms of the 
meta-data on the items they have interacted with.  That allows you to build a 
set of socially driven meta-data indicators as well.  This can be useful in the 
restaurant example where you might find that "elegant" or "home-style" might be 
good indicators for different restaurants even if those terms don't appear in a 
restaurant description.  
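
A minimal sketch of that idea, assuming toy data: the restaurant names, tags, and the helpers `tag_history` and `tag_indicators` are all hypothetical and only illustrate the counting step. Each user's history is described by the tags on the items they touched, and those tags are then counted per item. A real system would presumably filter the counts with a significance test (the LLR mentioned further down the thread is one option) rather than use raw counts.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: item -> descriptive tags, user -> items interacted with.
item_tags = {
    "bistro": {"elegant", "french"},
    "diner": {"home-style"},
    "trattoria": set(),  # nothing useful in its own description
    "steakhouse": {"elegant"},
}
histories = {
    "u1": ["bistro", "trattoria", "steakhouse"],
    "u2": ["bistro", "trattoria"],
    "u3": ["diner"],
}


def tag_history(items, item_tags):
    """Describe a user's history as the set of tags on the items they touched."""
    tags = set()
    for item in items:
        tags |= item_tags.get(item, set())
    return tags


def tag_indicators(histories, item_tags):
    """For each item, count how many of its users carry each tag in their history."""
    counts = defaultdict(Counter)
    for items in histories.values():
        tags = tag_history(items, item_tags)
        for item in set(items):
            counts[item].update(tags)
    return counts


indicators = tag_indicators(histories, item_tags)
```

Here "trattoria" picks up "elegant" as a candidate indicator purely from the people who went there, even though the term never appears in its own tags.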


On Jul 23, 2013, at 18:26, Pat Ferrel <[email protected]> wrote:

> Honestly not trying to make this more complicated but…
> 
> In the purely Mahout cross-recommender we got a ranked list of similar items 
> for any item so we could combine personal history-based recs with 
> non-personalized item similarity-based recs wherever we had an item context. 
> In a past ecom case the item similarity recs were quite useful when a user 
> was looking at an item already. In that case even if the user was unknown we 
> could make item similarity-based recs.
> 
> How about if we order the items in the doc by rank in the existing fields 
> since they are just text? Then we would do user-history-based queries on the 
> fields for recs and docs[itemID].field to get the ordered list of items out 
> of any doc. Doing an ensemble would require weights though. Unless someone 
> knows a rank based method for combining results. I guess you could vote or 
> add rank numbers of like items or the log thereof...
> 
> I assume the combination of results from [B'B] and [B'A] will be a query over 
> both fields with some boost or other to handle ensemble weighting. But if you 
> want to add item similarity recs another method must be employed, no?
> 
> From past experience I strongly suspect item similarity rank is not something 
> we want to lose so unless someone has a better idea I'll just order the IDs 
> in the fields and call it good for now.
> 
> 
> On Jul 23, 2013, at 12:03 PM, Pat Ferrel <[email protected]> wrote:
> 
> Will do.
> 
> For what it's worth…
> 
> The project I'm working on is an online recommender for video content. You go 
> to a site I'm creating, make some picks and get recommendations immediately 
> online. The training data comes from mining Rotten Tomatoes for critics' 
> reviews. There are two actions, rotten & fresh. Was planning to toss the 
> 'rotten' except for filtering them out of any recs but maybe they would work 
> as A with an ensemble weight of -1? New thumbs up or down data would be put 
> into the training set periodically--not online--using the process outlined 
> below.
> 
> On Jul 23, 2013, at 10:37 AM, Ted Dunning <[email protected]> wrote:
> 
> 
> This sounds great.  Go for it.  Put a comment on the design doc with a 
> pointer to text that I should import.
> 
> 
> 
> 
> On Tue, Jul 23, 2013 at 9:39 AM, Pat Ferrel <[email protected]> wrote:
> I can supply:
> 
> 1) a Maven based project in a public github repo as a baseline that creates 
> the following
> 2) ingest and split actions, in-memory, single process, from text file, one 
> line per preference
> 3) create one DistributedRowMatrix per action (max of 3) with a unified item 
> and user space
> 4) create the 'similarity matrix' for [B'B] using LLR and [B'A] using matrix 
> multiply/cooccurrence.
> 5) can take a stab at loading Solr.  I know the Mahout side and the internal 
> to external ID translation. The Solr side sounds pretty simple for this case.
> 
> This pipeline lacks downsampling since I had to replace 
> PreparePreferenceMatrixJob and potentially LLR for [B'A]. I assume Sebastian 
> is the person to talk to about these bits?
> 
> The job this creates uses the hadoop script to launch. Each job extends 
> AbstractJob, so it runs locally or using HDFS or MapReduce (at least for the 
> Mahout parts).
> 
> I have some obligations coming up so if you want this I'll need to get 
> moving. I can have the project ready on github in a day or two. May take 
> longer to do the Solr integration and if someone has a passion for taking 
> that bit on--great. This work is in my personal plans for the next couple 
> weeks as it happens anyway.
> 
> Let me know if you want me to proceed.
> 
> On Jul 22, 2013, at 3:42 PM, Ted Dunning <[email protected]> wrote:
> 
> On Mon, Jul 22, 2013 at 12:40 PM, Pat Ferrel <[email protected]> wrote:
> 
>> Yes.  And the combined recommender would query on both at the same time.
>> 
>> Pat-- doesn't it need ensemble type weighting for each recommender
>> component? Probably a wishlist item for later?
> 
> 
> Yes.  Weighting different fields differently is a very nice (and very easy)
> feature.
> 
> 
> 
>
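
The rank-based combination raised in the quoted thread (adding the rank numbers of like items, or the log thereof) could be sketched roughly as below. The function name `combine_by_rank`, the example lists, and the penalty for items missing from a list are assumptions made for illustration, not something settled in the discussion.

```python
import math


def combine_by_rank(ranked_lists, use_log=False):
    """Merge ranked lists by summing each item's rank (1 = best) across lists.

    An item missing from a list is charged that list's length + 1. That
    penalty is an assumption of this sketch, chosen so absent items rank
    below every item the list actually contains.
    """
    items = {item for ranked in ranked_lists for item in ranked}
    scores = dict.fromkeys(items, 0.0)
    for ranked in ranked_lists:
        position = {item: i + 1 for i, item in enumerate(ranked)}
        missing = len(ranked) + 1
        for item in items:
            rank = position.get(item, missing)
            scores[item] += math.log(rank) if use_log else rank
    # Lower combined score is better; ties are broken by item ID.
    return sorted(items, key=lambda item: (scores[item], item))


# Hypothetical inputs: history-based recs and item-similarity recs.
history_recs = ["a", "b", "c"]
similarity_recs = ["b", "d", "a"]
combined = combine_by_rank([history_recs, similarity_recs])
```

Summing raw ranks treats every position step equally, while summing log ranks makes differences near the top of each list count for more than differences far down it. Neither variant needs tuned ensemble weights.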

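For the LLR step mentioned in item 4 of the quoted pipeline, a minimal sketch of the log-likelihood ratio test on a 2x2 cooccurrence table is shown below. It is a plain-Python illustration and not Mahout's code; the argument names `k11`, `k12`, `k21`, `k22` are assumed labels for the four cell counts (both items together, first only, second only, neither).

```python
import math


def x_log_x(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 contingency table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))
```

A table whose rows and columns are independent scores close to zero, and the score grows as the cooccurrence becomes more surprising, which is why it is useful for keeping only the anomalous pairs as indicators.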