Re: Deduplication in 1.4

Otis Gospodnetic Tue, 24 Nov 2009 21:55:08 -0800

Hi,

As far as I know, the point of deduplication in Solr
(http://wiki.apache.org/solr/Deduplication) is to detect a duplicate document
before indexing it, in order to avoid duplicates in the index in the first place.
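
To make that concrete, here is a rough Python sketch of the index-time idea. It is not Solr code: in 1.4 the real work is done by SignatureUpdateProcessorFactory, configured in solrconfig.xml as the wiki page describes. The sketch assumes MD5 as the hash and assumes a later duplicate replaces the earlier copy (roughly the overwriteDupes=true case); the helper names are hypothetical.

```python
import hashlib


def signature(doc, fields):
    """Hash the values of the chosen fields into a single signature string."""
    # A unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def index_with_dedup(docs, fields):
    """Simulate index-time deduplication (hypothetical helper).

    Each document is stored under its signature, so a later document with
    the same signature replaces the earlier one and only one copy per
    signature ever reaches the index.
    """
    index = {}
    for doc in docs:
        sig = signature(doc, fields)
        index[sig] = dict(doc, signature=sig)
    return list(index.values())


docs = [
    {"id": "1", "duplicate_group_id": "g1"},
    {"id": "2", "duplicate_group_id": "g1"},
    {"id": "3", "duplicate_group_id": "g2"},
]
indexed = index_with_dedup(docs, ["duplicate_group_id"])
print([d["id"] for d in indexed])  # ['2', '3']
```

Note that document "1" is simply gone from the index, whatever the query later turns out to be. That is the difference from rolling up results at search time.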


What you are describing is closer to the field collapsing patch in SOLR-236,
which rolls up results that share a field value at query time.
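
For comparison, a rough Python sketch of query-time collapsing: every document stays in the index, and only the result list of a given query is rolled up to the best-scoring hit per group. This only illustrates the idea and is not the SOLR-236 code or its request parameters; the `collapse` helper and the `score` key on each hit are assumptions.

```python
def collapse(hits, group_field):
    """Return the best-scoring hit for each value of group_field (hypothetical helper).

    Hits are ranked by their "score" (highest first); hits that lack
    group_field are never collapsed and are always kept.
    """
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    seen = set()
    collapsed = []
    for hit in ranked:
        key = hit.get(group_field)
        if key is not None:
            if key in seen:
                continue  # a higher-ranked member of this group was already kept
            seen.add(key)
        collapsed.append(hit)
    return collapsed


hits = [
    {"id": "a", "duplicate_group_id": "g1", "score": 2.5},
    {"id": "c", "duplicate_group_id": "g1", "score": 1.2},
    {"id": "b", "duplicate_group_id": "g2", "score": 1.9},
    {"id": "d", "score": 0.7},
]
print([h["id"] for h in collapse(hits, "duplicate_group_id")])  # ['a', 'b', 'd']
```

Because the grouping happens per query, a different query could surface "c" as the representative of group g1 instead of "a".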

 Otis



----- Original Message ----
> From: KaktuChakarabati <jimmoe...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, November 24, 2009 5:29:00 PM
> Subject: Deduplication in 1.4
> 
> 
> Hey,
> I've been trying to find some documentation on using this feature in 1.4, but
> the wiki page is a little sparse.
> Specifically, here's what I'm trying to do:
> 
> I have a field, say 'duplicate_group_id', that I'll populate based on an
> offline document deduplication process I have.
> 
> All I want is for Solr to compute a 'duplicate_signature' field based on
> this one at update time, so that when I search for documents later, all
> documents with the same original 'duplicate_group_id' value will be rolled up
> (i.e. I'll just get the first one that comes back according to relevance).
> 
> I enabled the deduplication processor and added it to the update handler, but
> I'm not seeing any difference in the returned results (i.e. results with the
> same duplicate_id are still returned separately).
> 
> Is there anything I need to supply at query time for this to take effect?
> What should the behaviour be? Is there any working example of this?
> 
> Anything will be helpful.
> 
> Thanks,
> Chak
