Hi Karl!
I'm interested in near duplicate detection based on termFreqVectos. Now
I'm comparing all documents with each other (calculating the angle)...
Is there a way to avoid that?
Thanks!
Beto
karl wettin wrote:
17 okt 2006 kl. 17.54 skrev Find Me:
How to eliminate near duplicates
Hi Andrej!
I'm taking a look to fuzzy signatures for near duplicate detection and
and I have seen your TextProfileSignature. The question is: If I index
the documents with their text signature, is there a way to filter near
duplicates at search time without comparing each document with all