Hi,

As far as I know, the point of deduplication in Solr (http://wiki.apache.org/solr/Deduplication) is to detect a duplicate document before indexing it, in order to avoid duplicates in the index in the first place.
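
As a rough illustration of that index-time behaviour, here is a toy Python sketch. It is not Solr code: the functions `signature` and `index_with_dedup`, the in-memory list standing in for the index, and the use of MD5 are all assumptions made for the example. It only shows the idea that a signature is computed from chosen fields at update time and, if overwriting is enabled, a later document replaces an earlier one with the same signature.

```python
import hashlib


def signature(doc, fields):
    """Hash the values of the chosen fields into one signature string."""
    joined = "\x1f".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def index_with_dedup(docs, fields, overwrite_dupes=True):
    """Toy stand-in for an index (hypothetical, not a Solr API).

    Every document gets a 'signature' field at update time. When
    overwrite_dupes is True, a later document replaces any earlier
    document that has the same signature.
    """
    index = []
    for doc in docs:
        doc = dict(doc, signature=signature(doc, fields))
        if overwrite_dupes:
            # Drop earlier documents that share this signature.
            index = [d for d in index if d["signature"] != doc["signature"]]
        index.append(doc)
    return index


docs = [
    {"id": "1", "duplicate_group_id": "g1"},
    {"id": "2", "duplicate_group_id": "g1"},
    {"id": "3", "duplicate_group_id": "g2"},
]

# Duplicates are removed from the index itself.
deduped = index_with_dedup(docs, ["duplicate_group_id"])

# Signatures are stored, but all three documents stay in the index
# and would still come back as separate hits at search time.
kept_all = index_with_dedup(docs, ["duplicate_group_id"], overwrite_dupes=False)
```

In the second case nothing is rolled up at query time, which would match the behaviour you are seeing.
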
What you are describing is closer to the field collapsing patch in SOLR-236.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message ----
> From: KaktuChakarabati <jimmoe...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, November 24, 2009 5:29:00 PM
> Subject: Deduplication in 1.4
>
> Hey,
> I've been trying to find some documentation on using this feature in 1.4, but
> the Wiki page is a little sparse.
> Specifically, here's what I'm trying to do:
>
> I have a field, say 'duplicate_group_id', that I'll populate based on an
> offline document deduplication process I have.
>
> All I want is for Solr to compute a 'duplicate_signature' field based on
> this one at update time, so that when I search for documents later, all
> documents with the same original 'duplicate_group_id' value will be rolled up
> (e.g. I'll just get the first one that came back according to relevancy).
>
> I enabled the deduplication processor and put it into the updater, but I'm not
> seeing any difference in the returned results (i.e. results with the same
> duplicate_id are returned separately).
>
> Is there anything I need to supply at query time for this to take effect?
> What should the behaviour be? Is there any working example of this?
>
> Anything will be helpful.
>
> Thanks,
> Chak
> --
> View this message in context:
> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html
> Sent from the Solr - User mailing list archive at Nabble.com.