That shouldn't technically matter. My thought is to create a Spring-based Elasticsearch recommender that leverages the Spark cooccurrence code underneath.
Sent from my iPad

> On Apr 26, 2014, at 10:07 AM, "Pat Ferrel" <[email protected]> wrote:
>
> Oh, and the example is old Hadoop MapReduce; we’re redoing this with the new
> Spark cooccurrence code, which will replace the ItemSimilarity job.
>
> On Apr 26, 2014, at 10:03 AM, Pat Ferrel <[email protected]> wrote:
>
> If you want, fork the GitHub repo, do the integration and create a pull
> request. If the pull is accepted it will automatically be included in the
> Mahout build’s examples.
>
> Some things to consider:
> 1) It is actually easier to use either Solr/Lucid/ElasticSearch’s web GUI for
> bare-bones illustration purposes. You’d have to enter the recs query by hand.
> For demo purposes some example queries could be created ahead of time to
> illustrate the recs-generating queries. I did this myself but didn’t include
> it in the example. I’d actually recommend this as a simple illustration.
> 2) I’d suspect the Solr+DB integration route would be the most common way
> people would actually use this, but I could be wrong. This is what I did on
> the demo site, but it is far beyond what you’d put in an example.
> 3) What data to use? Unless the data has human-readable item ids, the demo is
> not as compelling.
>
> I can’t give you the demo site’s data since I mined the web for it, which
> allows me to use it but I don’t think I can republish it. Data actually
> gathered on the site by users I could share, but there isn’t enough to work
> with. Maybe Ted has some from his demo.
>
> On Apr 26, 2014, at 9:18 AM, Saikat Kanjilal <[email protected]> wrote:
>
>
>
> Sent from my iPad
>
>> On Apr 26, 2014, at 9:18 AM, "Saikat Kanjilal" <[email protected]> wrote:
>>
>> Is it worth it to add the Elasticsearch piece into the demo and tie that
>> into a generic MVC framework like Spring? In fact, we could leverage Spring
>> Data's Elasticsearch plugin.
>>
>> Sent from my iPad
>>
>>> On Apr 26, 2014, at 9:08 AM, "Pat Ferrel" <[email protected]> wrote:
>>>
>>> Yes, it already does. It’s not named well; all it really does is create an
>>> indicator matrix (item-item similarity using LLR) in a form that is
>>> digestible by a text indexer. You could use Solr or ElasticSearch to do the
>>> indexing and queries.
>>>
>>> In the actual installation on the demo site https://guide.finderbots.com
>>> the indicator matrix is put into a DB and Solr is used to index the item
>>> collection’s similarity data field. The queries are handled by the web app
>>> framework. If I swapped out Solr for ElasticSearch for indexing the DB, it
>>> would work just fine, and I looked into how to integrate it with my web app
>>> framework (RoR). The integration methods were significantly different,
>>> though, so I chose not to do both.
>>>
>>> The reason I chose to put the indicator matrix in the DB is that it
>>> makes it very convenient to mix metadata into the recs queries. In the case
>>> of the demo site, where the items are videos, I have a bunch of
>>> recommendation types:
>>> 1) User-history based recs: the query is the user’s recent “likes” history,
>>> run against the videos collection and specifying the similar items field,
>>> which is a list of video id strings. This is usually what people think a
>>> recommender does, but it is only the start.
>>> 2-9) These use various methods of biasing the results by genre metadata.
>>> Search engines also allow filtering by fields, so you can specify videos
>>> filtered by source. So you can get comedies based on your “likes” filtered
>>> by source = Netflix. In fact, when you set the source filter to Netflix,
>>> every set of recs will contain only those on Netflix.
>>>
>>> There are so many ways to combine bias with filter and what you use as the
>>> query that putting the fields in a DB made the most sense. I am still
>>> thinking of new ways to use this. For instance, item-set similarity, which
>>> is used to give shopping cart recs in some systems. On the demo site you
>>> could do the same with the watchlist if there were enough watchlists. Use
>>> the user’s watchlist as the query against all other watchlists, get back an
>>> ordered set of watchlists most similar to yours, and take recs from there.
>>>
>>> Some day I’ll write some blog posts about it, but I’d encourage anyone with
>>> data to try the DB route rather than raw indexing of the text files, just
>>> for the amazing flexibility and convenience it brings.
>>>
>>> On Apr 26, 2014, at 8:25 AM, Saikat Kanjilal <[email protected]> wrote:
>>>
>>> Pat,
>>> I was wondering if you'd given any thought to genericizing the Solr
>>> recommender to work with both Solr and Elasticsearch. Namely, are there
>>> pieces of the recommender that could plug into or be lifted above a search
>>> engine (or, in the case of Elasticsearch, a set of REST APIs)? I would be
>>> very interested in helping out with this.
>>>
>>> Thoughts?
>>>
>>> Sent from my iPad
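
For readers following the thread, here is a minimal sketch of the "indicator matrix (item-item similarity using LLR) in a form that is digestible by a text indexer" idea. It is not the Mahout or Spark code discussed above, only a small in-memory illustration. The function names, the `threshold` parameter, and the space-delimited output format are assumptions made for this example.

```python
import math
from collections import defaultdict
from itertools import combinations


def x_log_x(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy term used by the log-likelihood ratio."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users with both items, k12: first item only,
    k21: second item only,      k22: neither item.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def indicators(user_items, threshold=2.0):
    """Map each item id to a space-delimited string of similar item ids.

    user_items: dict of user id -> iterable of item ids (hypothetical input shape).
    threshold:  hypothetical LLR cutoff; pairs scoring at or below it are dropped.
    """
    num_users = len(user_items)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in user_items.values():
        unique = sorted(set(items))
        for item in unique:
            item_count[item] += 1
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] += 1

    scored = defaultdict(list)
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11
        k21 = item_count[b] - k11
        k22 = num_users - k11 - k12 - k21
        score = llr(k11, k12, k21, k22)
        if score > threshold:
            scored[a].append((b, score))
            scored[b].append((a, score))

    # Strongest indicators first; the joined string can go into a text field.
    return {
        item: " ".join(other for other, _ in sorted(pairs, key=lambda p: -p[1]))
        for item, pairs in scored.items()
    }


if __name__ == "__main__":
    likes = {
        "u1": ["a", "b"], "u2": ["a", "b"], "u3": ["a", "b"],
        "u4": ["c"], "u5": ["c"], "u6": ["c", "d"],
    }
    print(indicators(likes))
```

Each resulting string would be stored as the item's similar-items field (in a DB row or directly in the index), so that a user's history can later be used as a plain text query against that field.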

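
As an illustration of the recs queries described in the thread (recent "likes" as the query against the similar-items field, biased by genre and filtered by source), the sketch below builds an Elasticsearch-style query body as a plain Python dict. The field names `similar_items`, `genres`, and `source` are hypothetical, and the `bool` query with a `filter` clause is the form used by later Elasticsearch releases, not necessarily what was available in 2014. It also assumes `source` is indexed as an exact-match (non-analyzed) field.

```python
import json


def build_recs_query(liked_ids, source=None, genre=None, genre_boost=2.0, size=10):
    """Build a query body: history as query, optional genre bias and source filter.

    All field names here are assumptions for illustration only.
    """
    bool_query = {
        # The user's recent "likes" are matched against each item's
        # similar-items field (a space-delimited list of item ids).
        "must": [{"match": {"similar_items": " ".join(liked_ids)}}],
    }
    if genre is not None:
        # Bias: matching items score higher, non-matching items still qualify.
        bool_query["should"] = [
            {"match": {"genres": {"query": genre, "boost": genre_boost}}}
        ]
    if source is not None:
        # Filter: only items from this source are returned at all.
        bool_query["filter"] = [{"term": {"source": source}}]
    return {"size": size, "query": {"bool": bool_query}}


if __name__ == "__main__":
    body = build_recs_query(["vid42", "vid7"], source="Netflix", genre="comedy")
    print(json.dumps(body, indent=2))
```

The difference between the two optional clauses mirrors the bias-versus-filter distinction in the thread: the genre clause only reorders results, while the source clause removes everything not on that source.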