Sort of. There is a separate job that computes all item-item similarities under a variety of metrics; this is what Sebastian wrote. It's not used in the co-occurrence recommender (but it could be; that's vaguely a to-do here).
But sure if you're willing to think of a doc as an "item vector" of "preferences" from "words" then this works fine to compute doc similarity under these metrics.

On Wed, Jun 9, 2010 at 12:52 AM, Jake Mannix <[email protected]> wrote:

> The code in mahout CF is doing that? I don't think that's right, we don't
> do anything that fancy right now, do we Sean?
>
> -jake
>
> On Tue, Jun 8, 2010 at 3:39 PM, Sebastian Schelter
> <[email protected]>wrote:
>
>> Hi Kris,
>>
>> actually the code to compute the item-to-item similarities in the
>> collaborative filtering part of mahout (which at the first look seems to be
>> a totally different problem than yours) is based on a paper that deals with
>> computing the pairwise similarity of text documents in a very simple way.
>> Maybe that could be helpful to you:
>>
>> Elsayed et al: Pairwise Document Similarity in Large Collections with
>> MapReduce
>>
>> http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf
>>
>> -sebastian
>>
>>
>> 2010/6/8 Kris Jack <[email protected]>
>>
>> > Hi everyone,
>> >
>> > I currently use lucene's moreLikeThis function through solr to find
>> > documents that are related to one another. A single call, however, takes
>> > around 4 seconds to complete and I would like to reduce this. I got to
>> > thinking that I might be able to use Mahout to generate a document
>> > similarity matrix offline that could then be looked-up in real time for
>> > serving. Is this a reasonable use of Mahout? If so, what functions will
>> > generate a document similarity matrix? Also, I would like to be able to
>> > keep the text processing advantages provided through lucene so it would
>> > help if I could still use my lucene index. If not, then could you
>> > recommend any alternative solutions please?
>> >
>> > Many thanks,
>> > Kris
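
To make the "doc as an item vector of preferences from words" idea concrete, here is a minimal single-machine sketch in Python. It is not Mahout code and does not use any Mahout API. It only illustrates the general inverted-index approach to pairwise similarity that the thread attributes to the Elsayed et al. paper: index words to the docs containing them, then sum per-word weight products for every doc pair. The function names (`doc_vectors`, `pairwise_cosine`), the whitespace tokenization, and the use of raw term counts as weights are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def doc_vectors(docs):
    """Turn each doc into a unit-length {word: weight} vector.

    Assumption: weights are raw term counts and tokens are split on
    whitespace; a real setup would reuse the Lucene analysis chain.
    """
    vectors = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        norm = sqrt(sum(c * c for c in counts.values()))
        # An empty doc yields an empty vector (no division happens).
        vectors[doc_id] = {w: c / norm for w, c in counts.items()}
    return vectors


def pairwise_cosine(docs):
    """Return {(doc_a, doc_b): cosine} for every pair sharing a word."""
    vectors = doc_vectors(docs)

    # Step 1: invert doc -> words into word -> [(doc, weight), ...].
    postings = defaultdict(list)
    for doc_id, vec in vectors.items():
        for word, weight in vec.items():
            postings[word].append((doc_id, weight))

    # Step 2: each shared word contributes one partial product per doc pair;
    # summing those products gives the dot product of the unit vectors.
    sims = defaultdict(float)
    for entries in postings.values():
        for (a, wa), (b, wb) in combinations(sorted(entries), 2):
            sims[(a, b)] += wa * wb
    return dict(sims)


if __name__ == "__main__":
    example = {
        "d1": "mahout computes item similarity",
        "d2": "lucene computes document similarity",
        "d3": "completely unrelated text",
    }
    for pair, score in sorted(pairwise_cosine(example).items()):
        print(pair, round(score, 3))
```

Pairs that share no word never appear in the result, which is what keeps the output sparse enough to precompute offline and look up at serving time. Very common words produce long postings lists and therefore many pairs, so some form of stop-word or frequency cutoff would likely be needed on a real collection.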
