Re: Similar documents and advantages / disadvantages of MLT / Deduplication

2011-11-16 Thread Chris Hostetter
: I index 1000 docs, 5 of them are 95% the same (for example: copy pasted
: blog articles from different sources, with slight changes (author name,
: etc..)).
: But they have differences.
: *Now i like to see 1 doc in my result set and the other 4 should be marked
: as similar.*

Do you actually w

Similar documents and advantages / disadvantages of MLT / Deduplication

2011-11-07 Thread Vadim Kisselmann
Hello folks, I have questions about MLT and Deduplication, and which would be the best choice in my case. Case: I index 1000 docs, and 5 of them are 95% the same (for example, copy-pasted blog articles from different sources, with slight changes such as the author name). But they have differences. *Now i