Hey Otis,

Yep, I realized this myself after playing around with the dedupe feature yesterday. So it does look like field collapsing is pretty much what I need. Any idea how close it is to being production-ready?
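
For anyone who finds this thread later, here is a rough sketch of the difference as I understand it. It is a toy model in plain Python, not Solr code: the function names (signature, index_with_dedupe, collapse), the MD5 hash, and the sample documents with their scores are all made up for illustration. Index-time dedupe keeps one document per signature in the index, while collapsing leaves the index alone and rolls up results at query time.

```python
import hashlib

def signature(doc, fields):
    """Hash the chosen fields into one signature string (illustrative only)."""
    joined = "\x1f".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def index_with_dedupe(docs, fields):
    """Index-time dedupe: a later doc with the same signature replaces the earlier one."""
    index = {}
    for doc in docs:
        index[signature(doc, fields)] = doc
    return list(index.values())

def collapse(ranked_docs, field):
    """Query-time collapsing: keep only the best-ranked doc for each field value."""
    seen = set()
    kept = []
    for doc in ranked_docs:
        key = doc[field]
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept

# Hypothetical documents; 'score' stands in for query relevancy.
docs = [
    {"id": "a", "duplicate_group_id": "g1", "score": 0.9},
    {"id": "b", "duplicate_group_id": "g1", "score": 0.7},
    {"id": "c", "duplicate_group_id": "g2", "score": 0.5},
]

ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
collapsed = collapse(ranked, "duplicate_group_id")        # all 3 docs stay indexed
deduped = index_with_dedupe(docs, ["duplicate_group_id"])  # only 2 docs get indexed

print([d["id"] for d in collapsed])  # ['a', 'c']
print([d["id"] for d in deduped])    # ['b', 'c']
```

In this toy version, dedupe throws away document "a" for good because "b" arrived later with the same signature, whereas collapsing still has all three documents available and just shows the top one per group for a given query.
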
Thanks,
-Chak

Otis Gospodnetic wrote:
>
> Hi,
>
> As far as I know, the point of deduplication in Solr
> ( http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate
> document before indexing it in order to avoid duplicates in the index in
> the first place.
>
> What you are describing is closer to the field collapsing patch in SOLR-236.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
> ----- Original Message ----
>> From: KaktuChakarabati <jimmoe...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Tue, November 24, 2009 5:29:00 PM
>> Subject: Deduplication in 1.4
>>
>> Hey,
>> I've been trying to find some documentation on using this feature in 1.4,
>> but the Wiki page is a little sparse.
>> Specifically, here's what I'm trying to do:
>>
>> I have a field, say 'duplicate_group_id', that I'll populate based on some
>> offline document deduplication process I have.
>>
>> All I want is for Solr to compute a 'duplicate_signature' field based on
>> this one at update time, so that when I search for documents later, all
>> documents with the same original 'duplicate_group_id' value will be rolled
>> up (e.g. I'll just get the first one that came back according to relevancy).
>>
>> I enabled the deduplication processor and put it into the updater, but I'm
>> not seeing any difference in returned results (i.e. results with the same
>> duplicate_id are returned separately).
>>
>> Is there anything I need to supply at query time for this to take effect?
>> What should the behaviour be? Is there any working example of this?
>>
>> Anything will be helpful.
>>
>> Thanks,
>> Chak
>> --
>> View this message in context:
>> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html
Sent from the Solr - User mailing list archive at Nabble.com.