Replies inline.

BTW, if there is an LLR cross-similarity job (replacing [B'A]), it would be
easy to integrate.


On Jul 22, 2013, at 12:09 PM, Ted Dunning <[email protected]> wrote:

On Mon, Jul 22, 2013 at 9:20 AM, Pat Ferrel <[email protected]> wrote:

> +10
> 
> Love the academics but I agree with this. Recently saw a VP from Netflix
> plead with the audience (mostly academics) to move past RMSE--focus on
> maximizing correct ranking, not rating prediction.
> 
> Anyway I have a pipeline that does *[ingest, prepare, row-similarity, not
> in m/r]*
> 

Is this available?

Pat-- It can quickly be. It's on GitHub; I'd have to clean it up a bit.


> 1. replaces PreparePreferenceMatrixJob to create n matrices, depending on the
> number of actions you are splitting out. This job also creates external <->
> internal item and user ID BiHashMaps for going back and forth between the
> log's IDs and Mahout's internal IDs. It guarantees a uniform item and user ID
> space (and therefore consistent sparse matrix ranks) by creating one ID space
> from all actions. Not completely scalable since it is not done in m/r, though
> it uses HDFS--I have a plan to m/r the process and get rid of the hashmap.
> 

Frankly, doing it outside of map-reduce is good for a start and should be
preserved for later.  It makes on-boarding new folks much easier.

Pat-- It uses the Hadoop versions of the matrix multiply and RowSimilarityJob
in later steps, but they work without a cluster in local mode.
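
For concreteness, the external <-> internal mapping above is just a
bidirectional dictionary. A minimal sketch using Guava's HashBiMap--the class
name and dense-ID scheme are illustrative, not the actual pipeline code:

    import com.google.common.collect.BiMap;
    import com.google.common.collect.HashBiMap;

    public class IdDictionary {
      // external (log) ID <-> internal Mahout ID; inverse() is the reverse view
      private final BiMap<String, Integer> ids = HashBiMap.create();

      // assign the next dense internal ID the first time an external ID is seen
      public int toInternal(String externalId) {
        Integer internal = ids.get(externalId);
        if (internal == null) {
          internal = ids.size();  // dense IDs: 0, 1, 2, ...
          ids.put(externalId, internal);
        }
        return internal;
      }

      public String toExternal(int internalId) {
        return ids.inverse().get(internalId);
      }
    }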


> 2. performs the RowSimilarityJob on the primary matrix "B" and does B'A to
> create a cooccurrence matrix for primary and secondary actions.
> 

What code do you use for B'A?

Pat-- the matrix transpose and multiply from Mahout.
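
A sketch of that step with the Hadoop DistributedRowMatrix--paths and
dimensions are placeholders. Note that the Hadoop implementation's
times(other) is documented as this.transpose().times(other), so B.times(A)
yields B'A directly:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.mahout.math.hadoop.DistributedRowMatrix;

    public class CrossCooccurrence {
      public static void main(String[] args) throws Exception {
        int numUsers = 100000;  // placeholder dimensions
        int numItems = 50000;
        Configuration conf = new Configuration();

        // rows are users, columns are items; one matrix per action type
        DistributedRowMatrix b = new DistributedRowMatrix(
            new Path("/tmp/B"), new Path("/tmp/drm-tmp"), numUsers, numItems);
        b.setConf(conf);
        DistributedRowMatrix a = new DistributedRowMatrix(
            new Path("/tmp/A"), new Path("/tmp/drm-tmp"), numUsers, numItems);
        a.setConf(conf);

        // the Hadoop times() computes this' * other, so this is B'A
        DistributedRowMatrix btA = b.times(a);
      }
    }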

> 3. Stores all recs from all models in a NoSQL DB.
> 

I recommend not doing this for the demo, but rather storing rows of B'A and
B'B as fields in Solr.

Pat-- yes, just explaining for completeness
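
When it comes to Solr, turning a row of B'B or B'A into an indexable field
value could be as simple as this sketch--ids is the BiHashMap from earlier;
keeping every nonzero is illustrative, in practice you'd likely keep only the
top-k entries:

    import java.util.Iterator;
    import com.google.common.collect.BiMap;
    import org.apache.mahout.math.Vector;

    public class RowSerializer {
      // serialize one row as a bag of external item-ID terms for a Solr field
      static String rowToTerms(Vector row, BiMap<String, Integer> ids) {
        StringBuilder terms = new StringBuilder();
        for (Iterator<Vector.Element> it = row.iterateNonZero(); it.hasNext();) {
          Vector.Element e = it.next();
          terms.append(ids.inverse().get(e.index())).append(' ');
        }
        return terms.toString().trim();
      }
    }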

> 4. At rec request time it does a linear combination of rec and cross-rec to
> return the highest scored ones.


Should be integrated into the query.

Pat-- yes, just explaining for completeness

> Does 1-3 fit the first part of 'offline to Solr'? The IDs can be written
> to Solr as the original external IDs from the log files, which were
> strings. This allows them to be treated as terms by Solr.
> 

Yes.  These early steps are very much what I was aiming for.

Pat-- OK, happy to contribute. If possible, let me know who to coordinate with.

> My understanding of the Solr proposal puts B's row similarity matrix in a
> vector per item.


For a particular item document, the corresponding row of B'A and the
corresponding row of B'B go into separate fields.  I think you mean B'B
when you say "B's row similarity matrix".  Just checking.

Pat-- yes, exactly

> That means each row is turned into "terms" = external IDs--not sure how
> the weights of each term are encoded.


Again, I just use native Solr weighting.

Pat-- good, that makes this fairly simple, I expect. Just fields with bags of
term strings.
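
Indexing would then look roughly like this with SolrJ--field and core names
are made up, and the indicator fields assume a whitespace-tokenized field type:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexItem {
      public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/items");

        // one document per item; rows of B'B and B'A become bags of terms
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "item-42");                            // external item ID
        doc.addField("indicators_bb", "item-7 item-13 item-99");  // row of B'B
        doc.addField("indicators_ba", "item-3 item-55");          // row of B'A
        solr.add(doc);
        solr.commit();
      }
    }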


> So the cross-recommender would just put the cross-action similarity matrix
> in other field(s) on the same itemID/docID, right?
> 

Yes.  Exactly.


> 
> Then the straight out recommender queries on the B'B field(s) and the
> cross-recommender queries on the B'A field(s). I suppose to keep it simple
> the cross-action similarity matrix could be put in a separate index.  Is
> this about right?
> 

Yes.  And the combined recommender would query on both at the same time.

Pat-- doesn't it need ensemble-type weighting for each recommender component?
Probably a wishlist item for later?
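
Solr's ordinary query-time boosts might serve as a first cut at that
weighting--a sketch, with illustrative field names, user history, and boost
values:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CombinedQuery {
      public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/items");

        // primary-action history queries the B'B field, secondary-action
        // history queries the B'A field; the ^boosts weight the components
        SolrQuery query = new SolrQuery(
            "indicators_bb:(item-7 item-13)^1.0 indicators_ba:(item-3 item-55)^0.5");
        query.setRows(10);
        QueryResponse recs = solr.query(query);
        System.out.println(recs.getResults());
      }
    }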
