This sounds great. Go for it. Put a comment on the design doc with a pointer to text that I should import.
On Tue, Jul 23, 2013 at 9:39 AM, Pat Ferrel <[email protected]> wrote:

> I can supply:
>
> 1) a Maven based project in a public github repo as a baseline that
>    creates the following
> 2) ingest and split actions, in-memory, single process, from text file,
>    one line per preference
> 3) create DistributedRowMatrixes one per action (max of 3) with unified
>    item and user space
> 4) create the 'similarity matrix' for [B'B] using LLR and [B'A] using
>    matrix multiply/cooccurrence.
> 5) can take a stab at loading Solr. I know the Mahout side and the
>    internal to external ID translation. The Solr side sounds pretty
>    simple for this case.
>
> This pipeline lacks downsampling since I had to replace
> PreparePreferenceMatrixJob and potentially LLR for [B'A]. I assume
> Sebastian is the person to talk to about these bits?
>
> The job this creates uses the hadoop script to launch. Each job extends
> AbstractJob so runs locally or using HDFS or mapreduce (at least for the
> Mahout parts).
>
> I have some obligations coming up so if you want this I'll need to get
> moving. I can have the project ready on github in a day or two. May take
> longer to do the Solr integration and if someone has a passion for taking
> that bit on--great. This work is in my personal plans for the next couple
> weeks as it happens anyway.
>
> Let me know if you want me to proceed.
>
> On Jul 22, 2013, at 3:42 PM, Ted Dunning <[email protected]> wrote:
>
> On Mon, Jul 22, 2013 at 12:40 PM, Pat Ferrel <[email protected]>
> wrote:
>
> > Yes. And the combined recommender would query on both at the same time.
> >
> > Pat-- doesn't it need ensemble type weighting for each recommender
> > component? Probably a wishlist item for later?
>
> Yes. Weighting different fields differently is a very nice (and very easy
> feature).
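
For anyone following the thread, steps 2 through 4 of the proposed pipeline can be sketched roughly as below. This is only an illustration under stated assumptions, not Pat's code and not the Mahout API: the `user,action,item` line format, the function names, and the sample data are all hypothetical, plain Python dicts stand in for DistributedRowMatrix, and there is no downsampling. The LLR arithmetic follows the usual entropy-based log-likelihood ratio over a 2x2 contingency table.

```python
import math
from collections import defaultdict


def split_by_action(lines):
    """Parse 'user,action,item' lines (assumed format) into
    {action: {user: set(items)}}; users are shared across actions."""
    by_action = defaultdict(lambda: defaultdict(set))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, action, item = line.split(",")
        by_action[action][user].add(item)
    return by_action


def cooccurrence(b, a):
    """Entry (i, j) of B'A: number of users with item i in B and item j in A."""
    counts = defaultdict(int)
    for user, b_items in b.items():
        for i in b_items:
            for j in a.get(user, ()):
                counts[(i, j)] += 1
    return dict(counts)


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(b, n_users):
    """LLR score for every distinct item pair that cooccurs in B (the [B'B] step)."""
    item_users = defaultdict(int)
    for items in b.values():
        for i in items:
            item_users[i] += 1
    scores = {}
    for (i, j), k11 in cooccurrence(b, b).items():
        if i == j:
            continue
        k12 = item_users[i] - k11          # users with i but not j
        k21 = item_users[j] - k11          # users with j but not i
        k22 = n_users - k11 - k12 - k21    # users with neither
        scores[(i, j)] = llr(k11, k12, k21, k22)
    return scores


# Hypothetical sample input: one preference per line.
sample = [
    "u1,purchase,a", "u1,purchase,b",
    "u2,purchase,a", "u2,purchase,b",
    "u3,purchase,c",
    "u1,view,x", "u2,view,x", "u3,view,y",
]
actions = split_by_action(sample)
B, A = actions["purchase"], actions["view"]
all_users = set(B) | set(A)                # unified user space
btb = llr_similarity(B, len(all_users))    # [B'B] scored with LLR
bta = cooccurrence(B, A)                   # [B'A] as raw cooccurrence counts
```

In this sketch `btb` maps item pairs from the primary action to LLR scores and `bta` maps (primary item, secondary item) pairs to shared-user counts; applying LLR to [B'A] as well would reuse `llr` with counts taken across the two actions.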
