I did not mean to say that you can use the item-item similarity code from CF
to compute the document similarities. I just wanted to point out that these
problems are closely related, and that the paper the CF code is based on
deals with computing pairwise document similarities and could therefore be
helpful.
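
To make the connection concrete, here is a minimal single-machine sketch of
the idea in that paper as I understand it: build an inverted index, then for
every term add the product of the term weights to each pair of documents
that share the term. This is not the Mahout CF code. The function names, the
dict-based input format and the toy weights are made up for illustration,
and a real job would split the two steps into MapReduce phases.

```python
from collections import defaultdict
from itertools import combinations


def build_postings(doc_vectors):
    """Invert {doc_id: {term: weight}} into {term: [(doc_id, weight), ...]}."""
    postings = defaultdict(list)
    for doc_id, weights in doc_vectors.items():
        for term, weight in weights.items():
            postings[term].append((doc_id, weight))
    return postings


def pairwise_similarity(doc_vectors):
    """Sum weight products over shared terms (an unnormalized dot product)."""
    sims = defaultdict(float)
    for plist in build_postings(doc_vectors).values():
        # Only documents that share a term ever produce a pair.
        for (doc_a, w_a), (doc_b, w_b) in combinations(sorted(plist), 2):
            sims[(doc_a, doc_b)] += w_a * w_b
    return dict(sims)


# Hypothetical toy term weights, e.g. tf-idf values exported from an index.
docs = {
    "d1": {"mahout": 1.0, "lucene": 2.0},
    "d2": {"lucene": 1.0, "solr": 3.0},
    "d3": {"mahout": 2.0, "solr": 1.0},
}
sims = pairwise_similarity(docs)
```

Pairs of documents with no term in common never show up, which is what keeps
the computation manageable for sparse text vectors.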

-sebastian

2010/6/9 Jake Mannix <[email protected]>

> The code in Mahout CF is doing that?  I don't think that's right; we don't
> do anything that fancy right now, do we, Sean?
>
>  -jake
>
> On Tue, Jun 8, 2010 at 3:39 PM, Sebastian Schelter
> <[email protected]> wrote:
>
> > Hi Kris,
> >
> > actually the code to compute the item-to-item similarities in the
> > collaborative filtering part of Mahout (which at first glance seems to
> > be a totally different problem from yours) is based on a paper that
> > deals with computing the pairwise similarity of text documents in a
> > very simple way. Maybe that could be helpful to you:
> >
> > Elsayed et al: Pairwise Document Similarity in Large Collections with
> > MapReduce
> >
> >
> > http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf
> >
> > -sebastian
> >
> >
> > 2010/6/8 Kris Jack <[email protected]>
> >
> > > Hi everyone,
> > >
> > > I currently use Lucene's moreLikeThis function through Solr to find
> > > documents that are related to one another.  A single call, however,
> > > takes around 4 seconds to complete and I would like to reduce this.  I
> > > got to thinking that I might be able to use Mahout to generate a
> > > document similarity matrix offline that could then be looked up in real
> > > time for serving.  Is this a reasonable use of Mahout?  If so, what
> > > functions will generate a document similarity matrix?  Also, I would
> > > like to keep the text processing advantages provided through Lucene, so
> > > it would help if I could still use my Lucene index.  If not, then could
> > > you recommend any alternative solutions please?
> > >
> > > Many thanks,
> > > Kris
> > >
> >
>
