Mike,

Thanks for the vote of confidence!


On Wed, Oct 9, 2013 at 6:13 AM, Michael Sokolov <
[email protected]> wrote:

> Just to add a note of encouragement for the idea of better integration
> between Mahout and Solr:
>
> On safariflow.com, we've recently converted our recommender, which
> computes similarity scores w/Mahout, from storing scores and running
> queries w/Postgres, to doing all that in Solr.  It's been a big
> improvement, both in terms of indexing speed, and more importantly, the
> flexibility of the queries we can write.  I believe that having scoring
> built in to the query engine is a key feature for recommendations.  More
> and more I am coming to believe that recommendation should just be
> considered as another facet of search: as one among many variables the
> system may take into account when presenting relevant information to the
> user.  In our system, we still clearly separate search from
> recommendations, and we probably will always do that to some extent, but I
> think we will start to blend the queries more so that there will be
> essentially a continuum of query options including more or less "user
> preference" data.
>
> I think what I'm talking about may be a bit different than what Pat is
> describing (in implementation terms), since we do LLR calculations off-line
> in Mahout and then bulk load them into Solr.  We took one of Ted's earlier
> suggestions to heart, and simply ignored the actual numeric scores: we
> index the top N similar items for each item.  Later we may incorporate
> numeric scores in Solr as term weights.  If people are looking for things
> to do :) I think that would be a great software contribution that could
> spur this effort onward since it's difficult to accomplish right now given
> the Solr/Lucene indexing interfaces, but is already supported by the
> underlying data model and query engine.
>
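For anyone following along, the offline step Mike describes (LLR scores computed in batch, then only the top N similar items kept per item) could be sketched roughly as below. This is an illustration in Python, not Mahout's or safariflow's actual code: the function names, the `item_users` input shape, and the `k11 > 0` filter are all assumptions made for the example. The score is the usual entropy form of the log-likelihood ratio on a 2x2 cooccurrence table.

```python
import math


def x_log_x(x):
    # x * ln(x), with the 0 * ln(0) = 0 convention
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # k11: users with both items, k12: only item A,
    # k21: only item B, k22: neither
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp small negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))


def top_n_similar(item_users, total_users, n):
    # item_users: hypothetical dict of item id -> set of user ids
    result = {}
    for a in sorted(item_users):
        scored = []
        for b in sorted(item_users):
            if a == b:
                continue
            k11 = len(item_users[a] & item_users[b])
            k12 = len(item_users[a]) - k11
            k21 = len(item_users[b]) - k11
            k22 = total_users - k11 - k12 - k21
            score = llr(k11, k12, k21, k22)
            # Assumption: drop pairs that never cooccur
            if k11 > 0 and score > 0:
                scored.append((score, b))
        scored.sort(key=lambda p: (-p[0], p[1]))
        # Keep only the item ids; the numeric scores are discarded
        result[a] = [b for _, b in scored[:n]]
    return result
```

Each resulting list of item IDs is what would then be bulk loaded into the index alongside the item, with the numeric scores left out as described above. The pairwise loop is quadratic in the number of items, so it only stands in for the batch job at toy scale.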
>
> -Mike
>
>
> On 10/2/13 12:19 PM, Pat Ferrel wrote:
>
>> Excellent. From Ellen's description the first Music use may be an
>> implicit preference based recommender using synthetic data? I'm quickly
>> discovering how flexible Solr use is in many of these cases.
>>
>> Here's another use you may have thought of:
>>
>> Shopping cart recommenders, as goes the intuition, are best modeled as
>> recommending from similar item-sets. If you store all shopping carts as
>> your training data (play lists, watch lists etc.) then as a user adds
>> things to their cart you query for the most similar past carts. Combine the
>> results intelligently and you'll have an item set recommender. Solr is
>> built to do this item-set similarity. We tried to do this for an ecom site
>> with pure Mahout but the similarity calc in real time stymied us. We knew
>> we'd need Solr but couldn't devote the resources to spin it up.
>>
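The item-set idea above can be shown with a toy example. This is a minimal sketch in plain Python, not how Solr scores documents: Jaccard overlap is swapped in as the cart similarity purely for illustration, and the function names and the `k`/`n` parameters are hypothetical. "Combine the results intelligently" is reduced here to a similarity-weighted vote over the nearest past carts.

```python
from collections import Counter


def jaccard(a, b):
    # Overlap between two item sets, in [0, 1]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_carts(current_cart, past_carts, k=10, n=5):
    current = set(current_cart)
    # Score every stored cart against the cart being built
    scored = [(jaccard(current, set(c)), set(c)) for c in past_carts]
    # Keep the k most similar past carts that overlap at all
    neighbours = sorted(
        (p for p in scored if p[0] > 0), key=lambda p: -p[0]
    )[:k]
    # Each neighbour votes for the items the user does not have yet
    votes = Counter()
    for sim, cart in neighbours:
        for item in cart - current:
            votes[item] += sim
    ranked = sorted(votes.items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ranked[:n]]
```

In a Solr-backed version the scoring loop would be replaced by a query whose terms are the item IDs currently in the cart, run against an index of past carts. Only the vote-combining step would stay in application code.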
>> On the con side, Solr has a lot of stuff you have to work around. It also
>> does not have the ideal similarity measure for many uses (cosine is ok but
>> llr would probably be better). You don't want stop word filtering,
>> stemming, white space based tokenizing or n-grams. You would like explicit
>> weighting. A good thing about Solr is how well it integrates with virtually
>> any doc store independent of the indexing and query. A bit of an oval peg
>> for a round hole.
>>
>> It looks like the similarity code is replaceable if not pluggable. Much
>> of the rest could be trimmed away by config or adherence to conventions I
>> suspect. In the demo site I'm working on I've had to adopt some slightly
>> hacky conventions that I'll describe some day.
>>
>> On Oct 1, 2013, at 10:38 PM, Ted Dunning <[email protected]> wrote:
>>
>>
>> Pat,
>>
>> Ellen and some folks in Britain have been working with some data I
>> produced from synthetic music fans.
>>
>>
>> On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <[email protected]> wrote:
>> Hi Ellen,
>>
>>
>> On Oct 1, 2013, at 12:38 PM, Ted Dunning <[email protected]> wrote:
>>
>>
>> As requested,
>>
>> Pat, meet Ellen.
>>
>> Ellen, meet Pat.
>>
>>
>>
>>
>> On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <[email protected]> wrote:
>> Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout
>> version.
>>
>> Things to note:
>> 1) The pure Mahout XRecommenderJob needs a cross-LLR or a
>> cross-similarity job. Currently there is only cooccurrence for
>> sparsification, which is far from optimal. This might take the form of a
>> cross RSJ with two DRMs as input. I can't commit to this but would commit
>> to adding it to the XRecommenderJob.
>> 2) Output to Solr needs a lot of options implemented and tested. The
>> hand-run test should be made into some JUnit tests. I'm slowly doing this.
>> 3) The Solr query API is unimplemented unless someone else is working on
>> that. I'm building one in a demo site but it looks to me like a static
>> recommender API is not going to be all that useful and maybe a document
>> describing how to do it with the Solr query interface would be best,
>> especially for a first step. The reasoning here is that it is so tempting
>> to mix in metadata to the recommendation query that a static API is not so
>> obvious. For the demo site the recommender API will be prototyped in a
>> bunch of ways using models and controllers in Rails. If I'm the one to do
>> a Java Solr-recommender query API it will be after experimenting a bit.
>>
>> Can someone introduce me to Ellen and Tim?
>>
>> On Sep 28, 2013, at 10:59 AM, Ted Dunning <[email protected]> wrote:
>>
>> The one large-ish feature that I think would find general use would be a
>> high performance classifier trainer.
>>
>> For a cleanup sort of thing, it would be good to fully integrate the
>> streaming k-means into the normal clustering commands while revamping the
>> command line API.
>>
>> Dmitriy's recent Scala work would help quite a bit before 1.0. Not sure
>> it can make 0.9.
>>
>> For recommendations, I think that the demo system that Pat started with
>> the elaborations by Ellen and Tim would be very good to have.
>>
>> I would be happy to collaborate with somebody on these but am not at all
>> likely to have time to actually do them end to end.
>>
>> Sent from my iPhone
>>
>> On Sep 28, 2013, at 12:40, Grant Ingersoll <[email protected]> wrote:
>>
>>  Moving closer to 1.0, removing cruft, etc.  Do we have any more major
>>> features planned for 1.0?  I think we said during 0.8 that we would try to
>>> follow pretty quickly w/ another release.
>>>
>>> -Grant
>>>
>>> On Sep 28, 2013, at 12:33 PM, Ted Dunning <[email protected]> wrote:
>>>
>>>  Sounds right in principle but perhaps a bit soon.
>>>>
>>>> What would define the release?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Sep 27, 2013, at 7:48, Grant Ingersoll <[email protected]> wrote:
>>>>
>>>>  Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>>>
>>>>> -Grant
>>>>>
>>>> --------------------------------------------
>>> Grant Ingersoll | @gsingers
>>> http://www.lucidworks.com
>>>
