FWIW... We compute a hash of the content and other bits of our docs, and then remove duplicates according to specific algorithms. (Exactly the same page content can clearly be hosted on many different URLs and domains.) Then the chosen ones are indexed. We toss the synonyms into the index too, though, so we know all of a doc's other "names."
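
To make that concrete, here is a rough sketch of the idea in Python. It is not our actual code: the field names ("url", "content", "aliases"), the SHA-1 choice, and the "first one seen wins" rule are all assumptions made for illustration, standing in for the specific selection algorithms mentioned above.

```python
import hashlib
from collections import OrderedDict


def content_key(doc, fields=("content",)):
    """Hash the chosen fields of a doc into one hex digest.

    Field names are hashed along with values, with NUL separators,
    so that ("ab", "c") and ("a", "bc") do not collide.
    """
    h = hashlib.sha1()
    for name in fields:
        h.update(name.encode("utf-8"))
        h.update(b"\x00")
        h.update(doc.get(name, "").encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


def dedupe(docs, fields=("content",)):
    """Keep one doc per content hash and record the other URLs as aliases.

    Assumption: the first doc seen for a given hash is the chosen one.
    A real system would apply its own selection rule here.
    """
    chosen = OrderedDict()
    for doc in docs:
        key = content_key(doc, fields)
        if key not in chosen:
            representative = dict(doc)  # copy, so the input is not mutated
            representative["aliases"] = []
            chosen[key] = representative
        else:
            # Duplicate content: remember its URL as another "name".
            chosen[key]["aliases"].append(doc["url"])
    return list(chosen.values())


if __name__ == "__main__":
    docs = [
        {"url": "http://a.example/page", "content": "same text"},
        {"url": "http://b.example/page", "content": "same text"},
        {"url": "http://c.example/other", "content": "different text"},
    ]
    for doc in dedupe(docs):
        print(doc["url"], doc["aliases"])
```

Only the chosen docs would then go to the indexer, with "aliases" stored as a multi-valued field so the other URLs can still be looked up.
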

cheers
gene

Gene Campbell
http:www.picante.co.nz
gene at picante point co point nz
http://www.travelbeen.com - "the social search engine for travel"

On Fri, Feb 27, 2009 at 5:53 AM, Cheng Zhang <zhangyongji...@yahoo.com> wrote:
> It's exactly what I'm looking for. Thank you Grant.
>
>
> ----- Original Message ----
> From: Grant Ingersoll <gsing...@apache.org>
> To: solr-user@lucene.apache.org
> Sent: Thursday, February 26, 2009 6:56:22 AM
> Subject: Re: unique result
>
> I presume these all have different unique ids?
>
> If you can address it at indexing time, then have a look at
> https://issues.apache.org/jira/browse/SOLR-799
>
> Otherwise, you might look at https://issues.apache.org/jira/browse/SOLR-236
>
>
> On Feb 25, 2009, at 6:54 PM, Cheng Zhang wrote:
>
>> Is it possible to have Solr to remove duplicated query results?
>>
>> For example, instead of return
>>
>> <result name="response" numFound="572" start="0">
>> <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
>> <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
>> <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
>> <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
>> <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
>> </result>
>>
>> return:
>> <result name="response" numFound="572" start="0">
>> <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
>> <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
>> </result>
>>
>> Thanks a lot,
>> Kevin
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>