Thanks. I did consider post-processing and may wind up doing that; I was hoping there was a way to have Solr do it for me! That I have to ask this question is probably not a good sign, but what is LSH clustering?
On Fri, Nov 25, 2011 at 4:34 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> You can do that pretty easily by just retrieving extra documents and
> post-processing the results list.
>
> You are likely to have a significant number of apparent duplicates this
> way.
>
> To really get rid of duplicates in results, it might be better to remove
> them from the corpus by deploying something like LSH clustering.
>
> On Thu, Nov 24, 2011 at 5:04 PM, Fred Zimmerman <zimzaz....@gmail.com> wrote:
>
> > I have a corpus that has a lot of identical or nearly identical documents.
> > I'd like to return only the unique ones (excluding the "nearly identical"
> > which are redirects). I notice that all the identical/nearly identicals
> > have identical Solr scores. How can I tell Solr to throw out all the
> > successive documents in an answer set that have identical scores?
> >
> > doc 1 score 5.0
> > doc 2 score 5.0
> > doc 3 score 5.0
> > doc 4 score 4.9
> >
> > skip docs 2 and 3
> >
> > bring back 10 docs with unique scores
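
For reference, the post-processing approach discussed in this thread could look roughly like the sketch below. It is only an illustration, not something Solr provides: the function name `unique_score_docs` and the `(doc_id, score)` tuple layout are assumptions, and it presumes the client has already fetched more rows than it needs, sorted by descending score.

```python
from typing import Iterable, List, Tuple


def unique_score_docs(
    results: Iterable[Tuple[str, float]], limit: int = 10
) -> List[Tuple[str, float]]:
    """Keep the first document of each run of identical scores.

    `results` is assumed to be (doc_id, score) pairs already sorted by
    descending score, as returned by the search client (hypothetical layout).
    """
    kept: List[Tuple[str, float]] = []
    last_score = None
    for doc_id, score in results:
        # Skip successive documents whose score equals the previous one.
        if last_score is not None and score == last_score:
            continue
        kept.append((doc_id, score))
        last_score = score
        if len(kept) == limit:
            break
    return kept


# The example from the original question.
sample = [("doc 1", 5.0), ("doc 2", 5.0), ("doc 3", 5.0), ("doc 4", 4.9)]
print(unique_score_docs(sample))
```

With the four documents from the question, this keeps doc 1 and doc 4 and skips docs 2 and 3. Because an unknown number of rows get dropped, the client has to request more than 10 rows up front (or page further) to be sure of filling a list of 10. Note also that two unrelated documents can tie on score, so this treats equal scores as duplicates only as a heuristic.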