Two sites that use field collapsing:
1) www.ilocal.nl
2) www.welke.nl

I'm not sure what you mean by double-tripping. The sites mentioned do not have performance problems caused by field collapsing.

The patch currently supports only quasi-distributed field collapsing (as I have described on the Solr wiki). I don't currently know of a distributed field-collapsing algorithm that works properly and does not slow searches down significantly.

Martijn

2009/11/26 Otis Gospodnetic <otis_gospodne...@yahoo.com>:
> Hi Martijn,
>
>
> ----- Original Message ----
>
>> From: Martijn v Groningen <martijn.is.h...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Thu, November 26, 2009 3:19:40 AM
>> Subject: Re: Deduplication in 1.4
>>
>> Field collapsing has been used by many in their production
>> environment.
>
> Got any pointers to public sites you know use it? I know of a high traffic
> site that used an early version, and it caused performance problems. Is
> double-tripping still required?
>
>> The last few months the stability of the patch grew as
>> quite a few bugs were fixed. The only big feature missing currently is
>> caching of the collapsing algorithm. I'm currently working on that and
>
> Is it also fully distributed-search-ready?
>
>> I will put it in a new patch in the coming days. So yes, the
>> patch is very near being production ready.
>
> Thanks,
> Otis
>
>> Martijn
>>
>> 2009/11/26 KaktuChakarabati :
>> >
>> > Hey Otis,
>> > Yep, I realized this myself after playing some with the dedupe feature
>> > yesterday.
>> > So it does look like field collapsing is pretty much what I need.
>> > Any idea on how close it is to being production-ready?
>> >
>> > Thanks,
>> > -Chak
>> >
>> > Otis Gospodnetic wrote:
>> >>
>> >> Hi,
>> >>
>> >> As far as I know, the point of deduplication in Solr (
>> >> http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate
>> >> document before indexing it in order to avoid duplicates in the index in
>> >> the first place.
>> >>
>> >> What you are describing is closer to the field collapsing patch in SOLR-236.
>> >>
>> >> Otis
>> >> --
>> >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>> >>
>> >>
>> >> ----- Original Message ----
>> >>> From: KaktuChakarabati
>> >>> To: solr-user@lucene.apache.org
>> >>> Sent: Tue, November 24, 2009 5:29:00 PM
>> >>> Subject: Deduplication in 1.4
>> >>>
>> >>>
>> >>> Hey,
>> >>> I've been trying to find some documentation on using this feature in 1.4,
>> >>> but the Wiki page is a little sparse.
>> >>> Specifically, here's what I'm trying to do:
>> >>>
>> >>> I have a field, say 'duplicate_group_id', that I'll populate based on some
>> >>> offline document deduplication process I have.
>> >>>
>> >>> All I want is for Solr to compute a 'duplicate_signature' field based on
>> >>> this one at update time, so that when I search for documents later, all
>> >>> documents with the same original 'duplicate_group_id' value will be rolled up
>> >>> (e.g. I'll just get the first one that came back according to relevancy).
>> >>>
>> >>> I enabled the deduplication processor and put it into the updater, but I'm
>> >>> not seeing any difference in returned results (i.e. results with the same
>> >>> duplicate_id are returned separately).
>> >>>
>> >>> Is there anything I need to supply at query time for this to take effect?
>> >>> What should the behaviour be? Is there any working example of this?
>> >>>
>> >>> Anything will be helpful.
>> >>>
>> >>> Thanks,
>> >>> Chak
>> >>> --
>> >>> View this message in context:
>> >>> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html
>> >>> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >
>> > --
>> > View this message in context:
>> > http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
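
The roll-up Chak asks for in the quoted message (one result per 'duplicate_group_id', keeping the first one by relevancy) can be sketched on the client side in a few lines. This is only an illustration in plain Python over an already-ranked result list; it is not the SOLR-236 patch and uses no Solr API. The function name collapse_by_field and the sample documents are hypothetical.

```python
def collapse_by_field(ranked_docs, field):
    """Keep only the first (highest-ranked) document for each value of `field`.

    `ranked_docs` is assumed to be a list of dicts already sorted by relevancy.
    Documents that lack `field` are never collapsed.
    """
    seen = set()
    collapsed = []
    for doc in ranked_docs:
        key = doc.get(field)
        if key is None:
            # No group value: treat the document as its own group.
            collapsed.append(doc)
            continue
        if key in seen:
            # A higher-ranked document from this group was already kept.
            continue
        seen.add(key)
        collapsed.append(doc)
    return collapsed


# Hypothetical, already-ranked results (best match first).
results = [
    {"id": "a", "duplicate_group_id": "g1"},
    {"id": "b", "duplicate_group_id": "g2"},
    {"id": "c", "duplicate_group_id": "g1"},
    {"id": "d"},
]
print([d["id"] for d in collapse_by_field(results, "duplicate_group_id")])
# prints ['a', 'b', 'd']
```

Note that collapsing on the client shrinks each page by the number of duplicates removed, so page sizes and hit counts no longer match what the server reported.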