Re: Generating a Document Similarity Matrix

Jake Mannix Tue, 08 Jun 2010 15:52:55 -0700

The code in Mahout CF is doing that?  I don't think that's right; we don't
do anything that fancy right now, do we, Sean?


  -jake

On Tue, Jun 8, 2010 at 3:39 PM, Sebastian Schelter
<[email protected]> wrote:

> Hi Kris,
>
> actually, the code that computes item-to-item similarities in the
> collaborative filtering part of Mahout (which at first glance seems to be
> a totally different problem from yours) is based on a paper that deals with
> computing the pairwise similarity of text documents in a very simple way.
> Maybe that could be helpful to you:
>
> Elsayed et al.: Pairwise Document Similarity in Large Collections with
> MapReduce (ACL 2008)
>
> http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf
>
> -sebastian
>
>
> 2010/6/8 Kris Jack <[email protected]>
>
> > Hi everyone,
> >
> > I currently use Lucene's moreLikeThis function through Solr to find
> > documents that are related to one another.  A single call, however, takes
> > around 4 seconds to complete, and I would like to reduce this.  I got to
> > thinking that I might be able to use Mahout to generate a document
> > similarity matrix offline that could then be looked up in real time for
> > serving.  Is this a reasonable use of Mahout?  If so, what functions will
> > generate a document similarity matrix?  Also, I would like to keep the
> > text-processing advantages provided by Lucene, so it would help if I
> > could still use my Lucene index.  If Mahout isn't suited to this, could
> > you recommend any alternative solutions, please?
> >
> > Many thanks,
> > Kris
> >
>
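The two-phase scheme from the Elsayed et al. paper can be sketched in a few lines. This is a minimal in-memory illustration, not Mahout, Lucene, or Hadoop code: plain Python dictionaries stand in for the two MapReduce jobs (an indexing job that builds per-term postings lists, then a pairwise job that emits a partial product for every pair of documents sharing a term and sums those products per pair). The whitespace tokenizer, the raw term-frequency weights, and the names index_phase, pairwise_phase, top_k_neighbours and max_df are hypothetical choices made for illustration only. The last step precomputes a top-k neighbour list per document, the kind of table that could be served by a simple lookup as proposed above.

```python
from collections import Counter, defaultdict
from itertools import combinations


def index_phase(docs):
    """Indexing job (sketch): map each document to (term, (doc_id, weight))
    records, then group by term into postings lists."""
    postings = defaultdict(list)
    for doc_id, text in docs.items():
        # Assumption: lowercase whitespace tokens, weight = raw term frequency.
        for term, tf in Counter(text.lower().split()).items():
            postings[term].append((doc_id, tf))
    return postings


def pairwise_phase(postings, max_df=None):
    """Pairwise job (sketch): for each term, emit w1 * w2 for every pair of
    documents in its postings list; the 'reducer' sums contributions per pair."""
    sims = defaultdict(int)
    for plist in postings.values():
        # Hypothetical cutoff: skip terms that occur in more than max_df documents.
        if max_df is not None and len(plist) > max_df:
            continue
        # Sorting by doc_id makes each pair key canonical: (smaller, larger).
        for (d1, w1), (d2, w2) in combinations(sorted(plist), 2):
            sims[(d1, d2)] += w1 * w2
    return dict(sims)


def top_k_neighbours(sims, k):
    """Turn the sparse pair scores into a per-document list of the k most
    similar documents (ties broken by doc_id), for a precomputed lookup table."""
    neighbours = defaultdict(list)
    for (d1, d2), score in sims.items():
        neighbours[d1].append((score, d2))
        neighbours[d2].append((score, d1))
    return {
        doc: [other for _, other in sorted(cands, key=lambda c: (-c[0], c[1]))[:k]]
        for doc, cands in neighbours.items()
    }


docs = {
    "a": "mahout computes similarity",
    "b": "lucene computes similarity",
    "c": "solr serves lucene queries",
}
sims = pairwise_phase(index_phase(docs))
print(sims)                         # {('a', 'b'): 2, ('b', 'c'): 1}
print(top_k_neighbours(sims, k=2))  # {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
```

Because these scores are unnormalized dot products, longer documents tend to score higher; scaling each document's weights to unit length before the pairwise phase would turn them into cosine similarities. The max_df cutoff loosely mirrors the document-frequency cut discussed in the paper: very common terms have long postings lists, and the number of pairs they emit grows quadratically with that length.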
