Thanks. I did consider post-processing and may wind up doing that; I was
hoping there was a way to have Solr do it for me! That I have to ask this
question is probably not a good sign, but what is LSH clustering?
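For reference, one common form of LSH (locality-sensitive hashing) used for near-duplicate detection is MinHash with banding. Each document is reduced to a set of word shingles, and a short signature of minimum hash values is computed. The signature is then split into bands, and documents that agree on every value in any band land in the same bucket and become candidate duplicates. The sketch below is a stdlib-only Python illustration under assumed parameters (3-word shingles, 20 bands of 5 rows, a fixed seed). It is not necessarily the variant Ted had in mind, and the names `shingles`, `minhash` and `lsh_candidates` are hypothetical.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

PRIME = (1 << 61) - 1  # Mersenne prime used for universal hashing


def shingles(text: str, k: int = 3) -> Set[int]:
    """Hash each run of k consecutive words to a 32-bit integer."""
    words = text.lower().split()
    return {zlib.crc32(" ".join(words[i:i + k]).encode())
            for i in range(max(1, len(words) - k + 1))}


def minhash(sh: Set[int], coeffs: List[Tuple[int, int]]) -> List[int]:
    """One minimum per hash function h(x) = (a*x + b) mod PRIME."""
    return [min((a * x + b) % PRIME for x in sh) for a, b in coeffs]


def lsh_candidates(docs: Dict[str, str], bands: int = 20, rows: int = 5,
                   seed: int = 42) -> Set[Tuple[str, str]]:
    """Return pairs of doc ids that share at least one LSH bucket."""
    rng = random.Random(seed)  # fixed seed so signatures are reproducible
    coeffs = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(bands * rows)]
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        sig = minhash(shingles(text), coeffs)
        for band in range(bands):
            # Documents whose signatures match on every row of this band
            # fall into the same bucket.
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].append(doc_id)
    pairs: Set[Tuple[str, str]] = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    docs = {
        "a": "solr returns many near identical documents in this corpus",
        "b": "solr returns many near identical documents in this corpus",
        "c": "completely unrelated text about cooking pasta with garlic",
    }
    print(lsh_candidates(docs))
```

With 20 bands of 5 rows, pairs whose shingle sets have a Jaccard similarity above roughly (1/20)^(1/5) ≈ 0.55 are likely to share a bucket. Candidate pairs would typically be verified with an exact Jaccard comparison and then collapsed to one document per cluster before indexing, which removes duplicates from the corpus instead of from each result list.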

On Fri, Nov 25, 2011 at 4:34 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> You can do that pretty easily by just retrieving extra documents and
> post-processing the results list.
>
> You are likely to have a significant number of apparent duplicates this
> way.
>
> To really get rid of duplicates in results, it might be better to remove
> them from the corpus by deploying something like LSH clustering.
>
> On Thu, Nov 24, 2011 at 5:04 PM, Fred Zimmerman <zimzaz....@gmail.com> wrote:
>
> > I have a corpus that has a lot of identical or nearly identical documents.
> > I'd like to return only the unique ones (excluding the "nearly identical"
> > ones, which are redirects). I notice that all the identical/nearly identical
> > documents have identical Solr scores. How can I tell Solr to throw out all
> > the successive documents in an answer set that have identical scores?
> >
> > doc 1 score 5.0
> > doc 2 score 5.0
> > doc 3 score 5.0
> > doc 4 score 4.9
> >
> > skip docs 2 and 3
> >
> > bring back 10 docs with unique scores
> >
>
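A minimal sketch of the post-processing approach discussed in this thread, in Python. It assumes the hits have already been fetched from Solr as (id, score) pairs sorted by descending score (Solr's default sort), with more rows requested than needed via the `rows` parameter, since some hits will be dropped. The function name `unique_by_score` and the data shape are hypothetical. Like the original question, it treats equal scores as a proxy for duplicates, so two genuinely different documents that happen to tie on score will also be collapsed.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical shape: (doc_id, score), sorted by descending score.
Hit = Tuple[str, float]


def unique_by_score(hits: Iterable[Hit], limit: int = 10) -> List[Hit]:
    """Keep the first hit of each run of equal scores, up to `limit` hits."""
    kept: List[Hit] = []
    prev_score: Optional[float] = None
    for doc_id, score in hits:
        if score == prev_score:
            continue  # same score as the previous hit: treat as a duplicate
        kept.append((doc_id, score))
        prev_score = score
        if len(kept) == limit:
            break
    return kept


if __name__ == "__main__":
    hits = [("doc1", 5.0), ("doc2", 5.0), ("doc3", 5.0), ("doc4", 4.9)]
    print(unique_by_score(hits))  # [('doc1', 5.0), ('doc4', 4.9)]
```

If fewer than `limit` hits survive, the next page can be requested with Solr's `start` parameter and the same function applied to the combined list.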