The Lucene/Solr MoreLikeThis feature is (I believe) a cosine distance search across multiple fields of documents. Depending on the domain, its results may be useful or surreal.
Lance

On Fri, Oct 21, 2011 at 1:20 AM, Sean Owen wrote:

> Great point, yes, you could easily use a text search engine to come up
> with a similarity, if the things are text-like documents.
> These aren't recs by themselves, but the similarities can plug in to
> the item-based recommender easily.
>
> On Fri, Oct 21, 2011 at 4:12 AM, Octavian Covalschi wrote:
> > I'm not an expert but I do have a comment on B). Similarity between meta
> > data can be achieved by using some kind of search engine. For this kind of
> > functionality I'm using SOLR (http://wiki.apache.org/solr/MoreLikeThis), it
> > has a builtin feature that would give ya similar documents. All you have to
> > give it is a doc id... However I think this won't be a real recommendation,
> > since similar items may not be something that the user wants... for example
> > if I bought an expensive camera, I may not need any more similar items, right?
> > But at the same time, if I'm buying batteries every half a year.. I may be
> > interested in similar products.... so it depends.
> >
> > Just a thought.
> >
> >
> > On Thu, Oct 20, 2011 at 4:30 PM, Sean Owen wrote:
> >
> >> On Thu, Oct 20, 2011 at 10:13 PM, Camilo Rostoker wrote:
> >> > A) Use an item-based recommender, with the rating being the number of
> >> > times they bought the item (perhaps normalize the data between 1-10).
> >>
> >> Yes, good. My first reaction might be to use the logarithm of number
> >> of purchases, or ignore it altogether and just record the association
> >> (a 'boolean' pref) regardless of the purchase count. This only makes a
> >> complete system together with B) or C) though.
> >>
> >> >
> >> > B) Use the meta-data to generate similarities between the items, then
> >> > simply recommend to a user the top N items that are similar to one that
> >> > they've previously purchased. This could be implemented in Mahout by
> >> > overriding the ItemSimilarity (as described in this post:
> >> > http://lucene.472066.n3.nabble.com/Content-based-Recommender-Implementation-td913981.html).
> >> > Obviously the challenging part here is figuring out how to generate a
> >> > similarity score for the two items using the meta-data.
> >>
> >> Exactly. You can plug in whatever logic you want there, but
> >> equally you have to make up that logic. To start, you can experiment
> >> with simplistic rules like considering only items in the same category
> >> "similar". It might do reasonably well as a start.
> >>
> >> You can of course just use purchases, pure collaborative filtering, to
> >> generate similarity. For instance log-likelihood similarity works
> >> well.
> >>
> >> >
> >> > C) Use frequent item-sets to figure out other items that are usually
> >> > bought with that one, and recommend those.
> >>
> >> You could use frequent item sets to determine item-item similarity, as
> >> in B). That's kind of what log-likelihood is doing. This would then be
> >> a plug-in similarity to your item-based algorithm in A).
> >>
> >> If you mean you just want to start with an *item*, and find similar
> >> items, sure you can do that. This is simpler than the full recommender
> >> problem.
> >>
> >
>

--
Lance Norskog
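
To make the log-likelihood suggestion in the quoted thread a bit more concrete, here is a minimal sketch in plain Python (not Mahout code) of an item-item similarity built from boolean purchase data, followed by the simpler "start with an item, find similar items" lookup. All function names and the sample purchases are hypothetical, made up for illustration. The final squashing step, 1 - 1/(1 + LLR), is my understanding of how Mahout's LogLikelihoodSimilarity scales the score, so treat that detail as an assumption.

```python
import math
from collections import defaultdict


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    # k11: users who bought both items, k12: only A,
    # k21: only B, k22: neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(buyers_a, buyers_b, num_users):
    # Build the 2x2 contingency table from the two sets of buyers.
    k11 = len(buyers_a & buyers_b)
    k12 = len(buyers_a - buyers_b)
    k21 = len(buyers_b - buyers_a)
    k22 = num_users - k11 - k12 - k21
    # Assumed scaling into [0, 1); see the note above.
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


def most_similar_items(purchases, item, top_n=3):
    # purchases maps user -> set of items bought (boolean prefs,
    # purchase counts ignored).
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for i in items:
            buyers[i].add(user)
    target = buyers.get(item, set())
    scores = [
        (other, llr_similarity(target, buyers[other], len(purchases)))
        for other in list(buyers)
        if other != item
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical sample data, invented for this sketch.
purchases = {
    "u1": {"camera", "sd_card", "batteries"},
    "u2": {"camera", "sd_card"},
    "u3": {"camera", "sd_card"},
    "u4": {"batteries", "flashlight"},
    "u5": {"batteries", "flashlight"},
    "u6": {"tripod"},
}

for name, score in most_similar_items(purchases, "camera"):
    print(name, round(score, 3))
```

One caveat with this sketch: the log-likelihood ratio measures how surprising the co-occurrence pattern is, not its direction, so two items that are almost never bought together can also score high. On real data you would probably want to check that the items co-occur more often than chance before treating a high score as "similar".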
