The code in Mahout CF is doing that? I don't think that's right; we don't do anything that fancy right now, do we, Sean?
-jake

On Tue, Jun 8, 2010 at 3:39 PM, Sebastian Schelter <[email protected]> wrote:

> Hi Kris,
>
> actually the code to compute the item-to-item similarities in the
> collaborative filtering part of mahout (which at the first look seems to be
> a totally different problem than yours) is based on a paper that deals with
> computing the pairwise similarity of text documents in a very simple way.
> Maybe that could be helpful to you:
>
> Elsayed et al: Pairwise Document Similarity in Large Collections with
> MapReduce
>
> http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf
>
> -sebastian
>
>
> 2010/6/8 Kris Jack <[email protected]>
>
> > Hi everyone,
> >
> > I currently use lucene's moreLikeThis function through solr to find
> > documents that are related to one another. A single call, however, takes
> > around 4 seconds to complete and I would like to reduce this. I got to
> > thinking that I might be able to use Mahout to generate a document
> > similarity matrix offline that could then be looked-up in real time for
> > serving. Is this a reasonable use of Mahout? If so, what functions will
> > generate a document similarity matrix? Also, I would like to be able to
> > keep the text processing advantages provided through lucene so it would
> > help if I could still use my lucene index. If not, then could you
> > recommend any alternative solutions please?
> >
> > Many thanks,
> > Kris
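
For reference, the idea in the Elsayed et al. paper mentioned above can be sketched roughly like this: build an inverted index of term weights, then for every term add the product of the two weights to each pair of documents that share it, so each pair ends up with the dot product of its term vectors. This is only a minimal single-machine sketch in Python, not Mahout code and not the paper's MapReduce implementation; the function names and the toy weights are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_postings(doc_weights):
    """Indexing step: map each term to a list of (doc_id, weight) postings."""
    postings = defaultdict(list)
    for doc_id, weights in doc_weights.items():
        for term, weight in weights.items():
            postings[term].append((doc_id, weight))
    return postings


def pairwise_similarity(doc_weights):
    """Similarity step: sum per-term weight products for each document pair."""
    postings = build_postings(doc_weights)
    sims = defaultdict(float)
    for plist in postings.values():
        # Sorting by doc_id keeps each pair key in a consistent order.
        for (d1, w1), (d2, w2) in combinations(sorted(plist), 2):
            sims[(d1, d2)] += w1 * w2
    return dict(sims)


# Hypothetical term weights (e.g. tf-idf values exported from an index).
docs = {
    "a": {"lucene": 1.0, "solr": 2.0},
    "b": {"lucene": 3.0, "mahout": 1.0},
    "c": {"solr": 1.0, "mahout": 2.0},
}
similarities = pairwise_similarity(docs)
```

Only pairs that share at least one term get an entry, which is what keeps the approach cheaper than comparing every document against every other one. Very common terms produce long postings lists and dominate the cost, so they would probably need to be pruned before running this on a large collection.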
