Precisely; that's the one, thanks!
On Fri, Feb 14, 2014 at 12:45 PM, Pat Ferrel <[email protected]> wrote:

> Not sure if this is what you are looking for. I assume you are talking
> about Ted's paper describing a Solr based recommender pipeline?
>
> Much of the paper was implemented in the solr-recommender referenced
> below, which has a fairly flexible parallel version of a logfile reader
> that uses Cascading for mapreduce. It picks out columns in delimited text
> files. You can choose a constant string for your action id, like "purchase"
> or "thumbs-up". Then specify the field index for user, item, and action. It
> assumes strings for all these inputs and creates
> string-id->Mahout-Integer-id->string-id bidirectional hashmaps as
> dictionary and reverse dictionary. Everything is scalable except the
> BiHashmaps, which are in-memory. They aren't usually too big for that.
> There is also a pattern for the input log file names and they are searched
> for recursively from some root directory.
>
> Caveat emptor: not all the options are implemented or tested. One person
> has already implemented a scaffolded option and their pull request was
> merged so feel free to contribute.
>
> It is an example of how to digest logfiles, build Mahout data, and run the
> recommender. It creates Solr indexing data too but the output of the
> recommender is up to you to implement. It is a Solr query or a lookup in
> the Mahout recommender DRM output.
>
> https://github.com/pferrel/solr-recommender
>
>
> On Feb 14, 2014, at 12:39 PM, Ted Dunning <[email protected]> wrote:
>
> Yes!
>
> But it is very hard to find the time.
>
>
> On Fri, Feb 14, 2014 at 11:51 AM, Andrew Musselman <[email protected]> wrote:
>
> > I'd like to see cross-recommendations added too.
> >
> > But I also want some automation of the steps required to build a simple
> > recommender like the solr/mahout example Ted and Ellen have in their
> > pamphlet.
> >
> > Lowering the barrier to entry by providing a sample pipeline would help a
> > lot of folks get started and hopefully would keep them interested. Perhaps
> > in examples/bin?
> >
> >
> > On Fri, Feb 14, 2014 at 10:56 AM, Pat Ferrel <[email protected]> wrote:
> >
> >> There's been work done on the cross-recommender. There is a Mahout-style
> >> XRecommenderJob that has two preference models for two actions or
> >> preference types. It uses matrix multiply to get a cooccurrence type
> >> similarity matrix. If we had a cross-row-similarity-job, it could pretty
> >> easily be integrated and I'd volunteer to integrate it. The XRSJ is
> >> probably beyond me right now so if we can scare up someone to do that we'd
> >> be a long way down the road.
> >>
> >> I'll put a feature request into Jira and take this to the dev list.
> >>
> >> BTW this is already integrated with the solr-recommender.
> >>
> >> On Feb 8, 2014, at 7:19 PM, Ted Dunning <[email protected]> wrote:
> >>
> >> I have different opinions about each piece.
> >>
> >> I think that cross recommendation is as core as RowSimilarityJob and should
> >> be a parallel implementation or integrated. Parallel is probably easier.
> >> It is even plausible to have a version of RowSimilarityJob that doesn't
> >> support all the different distance measures but does support multiple cross
> >> and direct processing using LLR or related cooccurrence based measures. It
> >> would be very cool if a single pass over the data could do many kinds of co
> >> or cross occurrence operations.
> >>
> >> For dithering, it really is post processing. That said, it is also the
> >> single largest improvement that anybody typically gets when testing
> >> different options so it is a bit goofy to not have good support for some
> >> kinds of dithering.
> >>
> >> For Thompson sampled recommenders, I am not sure where to start hacking on
> >> our current code.
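
For anyone following along, here is a rough sketch of the cross-cooccurrence idea discussed above: two preference models (one per action type) combined by what amounts to a matrix multiply, A^T B, over binary user-by-item matrices. This is not the XRecommenderJob code. The function name, the dict-of-sets input format, and the sample data are all assumptions made up for illustration, and it produces raw counts only, with no LLR or other similarity weighting.

```python
from collections import defaultdict


def cross_cooccurrence(primary, secondary):
    """Count users who did the primary action on item i and the secondary on item j.

    `primary` and `secondary` map user id -> set of item ids, one dict per
    action type (hypothetical input format). The result holds the nonzero
    entries of A^T B, where A and B are the binary user-by-item matrices
    for the two actions.
    """
    counts = defaultdict(int)
    for user, primary_items in primary.items():
        # Users with no secondary actions contribute nothing.
        for i in primary_items:
            for j in secondary.get(user, ()):
                counts[(i, j)] += 1
    return dict(counts)


# Made-up sample data: "purchase" is the primary action, "view" the secondary.
purchases = {"u1": {"ipad"}, "u2": {"ipad", "nexus"}, "u3": {"nexus"}}
views = {"u1": {"ipad", "case"}, "u2": {"case", "nexus"}, "u3": {"nexus"}}

xcounts = cross_cooccurrence(purchases, views)
for pair in sorted(xcounts):
    print(pair, xcounts[pair])
```

Passing the same model for both arguments gives ordinary cooccurrence (A^T A), which is one way to see how a single pass over the data could serve both the direct and the cross case.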
> >>
> >>
> >> On Sat, Feb 8, 2014 at 4:53 PM, Pat Ferrel <[email protected]> wrote:
> >>
> >>> That was by no means to criticize effort level, which has been impressive
> >>> especially during the release.
> >>>
> >>> It was more a question about the best place to add these things and
> >>> whether they are important. Whether people see these things as custom post
> >>> processing or core.
> >>>
> >>> On Feb 8, 2014, at 12:13 PM, Ted Dunning <[email protected]> wrote:
> >>>
> >>> ...
> >>>
> >>> The reason that we aren't adding this like cross-rec and other things is
> >>> that "we" have full-time jobs, mostly. Suneel is full-time on Mahout, but
> >>> the rest are not. You seem more active than most.
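
As a rough illustration of the log reader Pat describes at the top of the thread (pick columns out of delimited text, keep rows whose action matches a constant string, and map string ids to integer ids with an in-memory dictionary and reverse dictionary), here is a minimal single-machine sketch. It is not the solr-recommender code and does not use Cascading. The `BiDict` class, the `read_actions` function, the column layout, and the sample log lines are all hypothetical.

```python
class BiDict:
    """In-memory string-id <-> integer-id dictionary (hypothetical helper)."""

    def __init__(self):
        self.to_int = {}   # string id -> integer id
        self.to_str = []   # integer id -> string id (reverse dictionary)

    def id_for(self, key):
        """Return the integer id for `key`, assigning the next free one if new."""
        if key not in self.to_int:
            self.to_int[key] = len(self.to_str)
            self.to_str.append(key)
        return self.to_int[key]


def read_actions(lines, action, user_col, item_col, action_col, delimiter="\t"):
    """Extract (user int id, item int id) pairs for rows matching `action`."""
    users, items = BiDict(), BiDict()
    prefs = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if max(user_col, item_col, action_col) >= len(fields):
            continue  # skip rows that are too short to hold all three columns
        if fields[action_col] != action:
            continue  # keep only the chosen action, e.g. "purchase"
        prefs.append((users.id_for(fields[user_col]), items.id_for(fields[item_col])))
    return prefs, users, items


# Made-up log lines: timestamp, user, action, item (tab-delimited).
log = [
    "2014-02-14\tu1\tpurchase\tipad",
    "2014-02-14\tu2\tview\tipad",
    "2014-02-14\tu2\tpurchase\tnexus",
    "bad line",
]
prefs, users, items = read_actions(log, "purchase", user_col=1, item_col=3, action_col=2)
print(prefs)
print(users.to_str, items.to_str)
```

The reverse list is what lets recommender output, which comes back as integer ids, be translated to the original string ids. As noted in the thread, this dictionary is the one piece that has to fit in memory.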
