I have understood your need. You can use k-means clustering in Mahout,
which can help in your case. It would be better to post this question on the
Mahout user list, where you will get different ideas. I also had a use case
like this, which I did as a POC. Still, my suggestion is that you post this
question
ntain any code
Any help will be greatly appreciated.
regards,
shaimaa
> Date: Mon, 30 Jul 2012 07:32:49 -0700
> From: in.ab...@gmail.com
> To: java-user@lucene.apache.org
> Subject: Re: Document Similarity
>
> Hi ELshaimaa,
> I could not understand what your need is. Can you
Hi ELshaimaa,
I could not understand what your need is. Can you please explain your
use case?
If the case is "I need to use Lucene to find the most similar documents
from the generated index",
then go for the MoreLikeThis[1] component.
Based on your use case people can suggest some good wa
Yonik, Klaus, thanks for your quick response.
Let me rephrase: I can't compare the currently processed document with all
documents in my collection using the angle between documents in
term-vector space, because of performance issues. As far as I can see,
I can avoid unnecessary operations. At first, I ca
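
For readers following the thread, the "angle between documents in
term-vector space" can be sketched in plain Java as below. This is only an
illustration, not the poster's code: the class name CosineSketch and the use
of raw term counts as weights are assumptions. It skips some work by walking
only the smaller of the two term maps, since terms missing from either
document add nothing to the dot product.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class CosineSketch {
    // Cosine of the angle between two sparse term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        // Walk the smaller map: only shared terms contribute to the dot product.
        Map<String, Integer> small = a.size() <= b.size() ? a : b;
        Map<String, Integer> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : small.entrySet()) {
            Integer other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * (double) other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        // An empty document has no direction; treat its similarity as 0.
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = new HashMap<>();
        d1.put("lucene", 2);
        d1.put("index", 1);
        Map<String, Integer> d2 = new HashMap<>();
        d2.put("lucene", 1);
        d2.put("search", 1);
        System.out.println(cosine(d1, d2));
    }
}
```

A value near 1 means the two documents use terms in similar proportions; 0
means they share no terms.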
If you don't want to store term vectors, you could also run the
document fields through the analyzer yourself and collect the Tokens
(you should still have the fields you just indexed... no need to
retrieve them again).
-Yonik
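
The idea of collecting tokens yourself, rather than storing term vectors, can
be sketched roughly as follows. Note the assumptions: this does not call
Lucene at all. The class TokenCountSketch and its lowercase-and-split
tokenizer are hypothetical stand-ins for whatever Analyzer was used at
indexing time, so that the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class TokenCountSketch {
    // Stand-in for an analyzer: lowercases the field text, splits on anything
    // that is not a letter or digit, and counts each resulting token.
    static Map<String, Integer> termFrequencies(String fieldText) {
        Map<String, Integer> freqs = new HashMap<>();
        String lower = fieldText.toLowerCase(Locale.ROOT);
        for (String token : lower.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        // The field text is still in memory right after indexing,
        // so nothing has to be read back from the index.
        System.out.println(termFrequencies("Lucene index, lucene search"));
    }
}
```

In a real setup the same Analyzer used for indexing would replace the
stand-in tokenizer, so that the collected terms match the indexed ones.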
On 1/20/06, Klaus <[EMAIL PROTECTED]> wrote:
>
> >In my case, i need to