Mike, Thanks for the vote of confidence!
On Wed, Oct 9, 2013 at 6:13 AM, Michael Sokolov < [email protected]> wrote:

> Just to add a note of encouragement for the idea of better integration
> between Mahout and Solr:
>
> On safariflow.com, we've recently converted our recommender, which
> computes similarity scores w/Mahout, from storing scores and running
> queries w/Postgres, to doing all that in Solr. It's been a big
> improvement, both in terms of indexing speed, and more importantly, the
> flexibility of the queries we can write. I believe that having scoring
> built in to the query engine is a key feature for recommendations. More
> and more I am coming to believe that recommendation should just be
> considered as another facet of search: as one among many variables the
> system may take into account when presenting relevant information to the
> user. In our system, we still clearly separate search from
> recommendations, and we probably will always do that to some extent, but I
> think we will start to blend the queries more so that there will be
> essentially a continuum of query options including more or less "user
> preference" data.
>
> I think what I'm talking about may be a bit different than what Pat is
> describing (in implementation terms), since we do LLR calculations off-line
> in Mahout and then bulk load them into Solr. We took one of Ted's earlier
> suggestions to heart, and simply ignored the actual numeric scores: we
> index the top N similar items for each item. Later we may incorporate
> numeric scores in Solr as term weights. If people are looking for things
> to do :) I think that would be a great software contribution that could
> spur this effort onward since it's difficult to accomplish right now given
> the Solr/Lucene indexing interfaces, but is already supported by the
> underlying data model and query engine.
>
> -Mike
>
> On 10/2/13 12:19 PM, Pat Ferrel wrote:
>
>> Excellent.
>> From Ellen's description the first Music use may be an
>> implicit preference based recommender using synthetic data? I'm quickly
>> discovering how flexible Solr use is in many of these cases.
>>
>> Here's another use you may have thought of:
>>
>> Shopping cart recommenders, as goes the intuition, are best modeled as
>> recommending from similar item-sets. If you store all shopping carts as
>> your training data (play lists, watch lists etc.) then as a user adds
>> things to their cart you query for the most similar past carts. Combine the
>> results intelligently and you'll have an item set recommender. Solr is
>> built to do this item-set similarity. We tried to do this for an ecom site
>> with pure Mahout but the similarity calc in real time stymied us. We knew
>> we'd need Solr but couldn't devote the resources to spin it up.
>>
>> On the Con-side Solr has a lot of stuff you have to work around. It also
>> does not have the ideal similarity measure for many uses (cosine is ok but
>> llr would probably be better). You don't want stop word filtering,
>> stemming, white space based tokenizing or n-grams. You would like explicit
>> weighting. A good thing about Solr is how well it integrates with virtually
>> any doc store independent of the indexing and query. A bit of an oval peg
>> for a round hole.
>>
>> It looks like the similarity code is replaceable if not pluggable. Much
>> of the rest could be trimmed away by config or adherence to conventions I
>> suspect. In the demo site I'm working on I've had to adopt some slightly
>> hacky conventions that I'll describe some day.
>>
>> On Oct 1, 2013, at 10:38 PM, Ted Dunning <[email protected]> wrote:
>>
>> Pat,
>>
>> Ellen and some folks in Britain have been working with some data I
>> produced from synthetic music fans.
>>
>> On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <[email protected]> wrote:
>> Hi Ellen,
>>
>> On Oct 1, 2013, at 12:38 PM, Ted Dunning <[email protected]> wrote:
>>
>> As requested,
>>
>> Pat, meet Ellen.
>>
>> Ellen, meet Pat.
>>
>> On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <[email protected]> wrote:
>> Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout
>> version.
>>
>> Things to note:
>> 1) The pure Mahout XRecommenderJob needs a cross-LLR or a
>> cross-similarity job. Currently there is only cooccurrence for
>> sparsification, which is far from optimal. This might take the form of a
>> cross RSJ with two DRMs as input. I can't commit to this but would commit
>> to adding it to the XRecommenderJob.
>> 2) Output to Solr needs a lot of options implemented and tested. The
>> hand-run test should be made into some junits. I'm slowly doing this.
>> 3) The Solr query API is unimplemented unless someone else is working on
>> that. I'm building one in a demo site but it looks to me like a static
>> recommender API is not going to be all that useful and maybe a document
>> describing how to do it with the Solr query interface would be best,
>> especially for a first step. The reasoning here is that it is so tempting
>> to mix in metadata to the recommendation query that a static API is not so
>> obvious. For the demo site the recommender API will be prototyped in a
>> bunch of ways using models and controllers in Rails. If I'm the one to do
>> a Java Solr-recommender query API it will be after experimenting a bit.
>>
>> Can someone introduce me to Ellen and Tim?
>>
>> On Sep 28, 2013, at 10:59 AM, Ted Dunning <[email protected]> wrote:
>>
>> The one large-ish feature that I think would find general use would be a
>> high performance classifier trainer.
>>
>> For cleanup sort of thing it would be good to fully integrate the
>> streaming k-means into the normal clustering commands while revamping the
>> command line API.
>>
>> Dmitriy's recent Scala work would help quite a bit before 1.0. Not sure
>> it can make 0.9.
>>
>> For recommendations, I think that the demo system that Pat started with
>> the elaborations by Ellen and Tim would be very good to have.
>>
>> I would be happy to collaborate with somebody on these but am not at all
>> likely to have time to actually do them end to end.
>>
>> Sent from my iPhone
>>
>> On Sep 28, 2013, at 12:40, Grant Ingersoll <[email protected]> wrote:
>>
>> Moving closer to 1.0, removing cruft, etc. Do we have any more major
>>> features planned for 1.0? I think we said during 0.8 that we would try to
>>> follow pretty quickly w/ another release.
>>>
>>> -Grant
>>>
>>> On Sep 28, 2013, at 12:33 PM, Ted Dunning <[email protected]> wrote:
>>>
>>> Sounds right in principle but perhaps a bit soon.
>>>>
>>>> What would define the release?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Sep 27, 2013, at 7:48, Grant Ingersoll <[email protected]> wrote:
>>>>
>>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>>>
>>>>> -Grant
>>>>>
>>>> --------------------------------------------
>>> Grant Ingersoll | @gsingers
>>> http://www.lucidworks.com
