Re: Document Similarity

2012-07-30 Thread in.abdul
I understand your need now. You can use k-means clustering in Mahout, which can help with your use case. You would do better to post this question on the Mahout user list, where you will get different ideas. I also had a use case like this, which I did as a POC. But my suggestion is still that you post this question

RE: Document Similarity

2012-07-30 Thread Elshaimaa Ali
ntain any code. Any help will be greatly appreciated.
Regards,
Shaimaa

> Date: Mon, 30 Jul 2012 07:32:49 -0700
> From: in.ab...@gmail.com
> To: java-user@lucene.apache.org
> Subject: Re: Document Similarity
>
> Hi Elshaimaa,
> I couldn't understand what you need. Can you

Re: Document Similarity

2012-07-30 Thread in.abdul
Hi Elshaimaa, I couldn't understand what you need. Can you please explain your use case? If your case is "I need to use Lucene to find the most similar documents from the generated index", then go for the MoreLikeThis[1] component. Based on your use case, people can suggest some good wa

Re: Document similarity

2006-01-20 Thread Aleksey Serba
Yonik, Klaus, thanks for your quick responses. Let me rephrase: I can't compare the document currently being processed with every document in my collection using the angle between documents in term-vector space, because of performance issues. As far as I can see, I can avoid unnecessary operations. At first, I ca
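The "angle between documents in term-vector space" is usually measured as cosine similarity: cos(theta) = (a . b) / (|a| |b|). A minimal sketch in plain Java, assuming each document has already been reduced to a map from term to raw term frequency. No Lucene classes are used, and the class and method names (`TermVectorCosine`, `cosine`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: cosine of the angle between two sparse term-frequency vectors. */
public class TermVectorCosine {

    /** Returns (a . b) / (|a| * |b|), or 0.0 if either vector has zero length. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        // Only terms present in both maps contribute to the dot product,
        // so iterate over the smaller map and look terms up in the larger one.
        Map<String, Integer> small = a.size() <= b.size() ? a : b;
        Map<String, Integer> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : small.entrySet()) {
            Integer other = large.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = new HashMap<>();
        d1.put("lucene", 2);
        d1.put("similarity", 1);
        Map<String, Integer> d2 = new HashMap<>();
        d2.put("lucene", 1);
        d2.put("mahout", 1);
        // Dot product 2, lengths sqrt(5) and sqrt(2): prints about 0.632 (2 / sqrt(10)).
        System.out.println(cosine(d1, d2));
    }
}
```

Note that this is a brute-force pairwise comparison, so by itself it does nothing about the performance problem described above: scoring one document against a whole collection this way costs one call per stored document.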

Re: Document similarity

2006-01-20 Thread Yonik Seeley
If you don't want to store term vectors, you could also run the document fields through the analyzer yourself and collect the Tokens (you should still have the fields you just indexed... no need to retrieve them again).

-Yonik

On 1/20/06, Klaus <[EMAIL PROTECTED]> wrote:
> > >In my case, i need to
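Yonik's idea is to build the document's term counts from the field text you already hold in memory rather than reading stored term vectors back from the index. The sketch below is a plain-Java stand-in and an assumption throughout: its tokenizer (lowercase, split on anything that is not a letter or digit) is NOT Lucene's analyzer. In Lucene you would instead consume the `TokenStream` returned by your `Analyzer` (in recent versions, reading each term through a `CharTermAttribute`) so that the counts match what was indexed. The class name `FieldTermCounter` is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for "run the fields through the analyzer yourself":
 * turns raw field text into a term-frequency map without reading stored term vectors.
 * The tokenizer here is a simplification, not a Lucene Analyzer.
 */
public class FieldTermCounter {

    public static Map<String, Integer> termFrequencies(String... fieldValues) {
        // TreeMap keeps terms sorted, which makes the printed result deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : fieldValues) {
            // Lowercase, then split on any run of characters that are not letters or digits.
            for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                // split() can yield a leading empty string; skip it.
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {action=1, in=1, lucene=2, similarity=1}
        System.out.println(termFrequencies("Lucene in Action", "Lucene similarity"));
    }
}
```

The resulting map is a sparse term vector for the document just indexed, ready to be compared against other documents' vectors without a second trip to the index.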