I agree! I also share the thought experiment, and these changes do seem justifiable. It is just that I am interested in the evidence.
Again, if you can webster.homer, please share significant figures. They should be interesting!

Regards,
M.

-----Original message-----
> From:Walter Underwood <wun...@wunderwood.org>
> Sent: Tuesday 8th August 2017 23:41
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 6 and IDF
>
> There are good use cases for disabling idf and even tf for labels and
> categories.
>
> Searching resumes, maybe you care that “microsoft word” is less selective
> than “r programming”, but maybe you want all the ones that match three skills
> followed by the ones that match two skills, regardless of how common those
> skills are.
>
> And for tf, a document tagged with both “new york” and “new york city” is not
> twice as much about New York. Same for the movie “New York, New York”.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>
> > On Aug 8, 2017, at 2:18 PM, Markus Jelsma <markus.jel...@openindex.io>
> > wrote:
> >
> > Do you measure MRR or sales conversion right now? It would be interesting
> > to see the graph change after your modification, or not of course. Please
> > let us know!
> >
> > -----Original message-----
> >> From:Webster Homer <webster.ho...@sial.com>
> >> Sent: Tuesday 8th August 2017 23:04
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr 6 and IDF
> >>
> >> I think just disabling idf is what we want. For product searching we really
> >> don't want to raise a rarer match. What we see analyzing results is that
> >> some good hits are suppressed, have lower scores, due to idf.
> >>
> >> This is so we can test this. We think it will help, but we'll see.
> >>
> >> On Tue, Aug 8, 2017 at 3:53 PM, Markus Jelsma <markus.jel...@openindex.io>
> >> wrote:
> >>
> >>> Yes, extend the default Similarity, return 1.0f for idf and probably the
> >>> idfExplain methods, and configure it in your schema, global or per-field.
> >>>
> >>> If you think this is a good idea, why not also return 1.0f for tf?
> >>> And while you're at it, also omitNorms on all fields entirely?
> >>>
> >>> I am curious if this is going to help you, please let us know!
> >>>
> >>> -----Original message-----
> >>>> From:Webster Homer <webster.ho...@sial.com>
> >>>> Sent: Tuesday 8th August 2017 22:44
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: Re: Solr 6 and IDF
> >>>>
> >>>> It appears that all I need to do is create a class that
> >>>> extends BM25Similarity, and have the new class return 1 as the idf. Is that
> >>>> correct?
> >>>>
> >>>> On Tue, Aug 8, 2017 at 3:15 PM, Webster Homer <webster.ho...@sial.com>
> >>>> wrote:
> >>>>
> >>>>> I do want to use BM25, just disable IDF
> >>>>>
> >>>>> On Tue, Aug 8, 2017 at 2:58 PM, Peter Lancaster <
> >>>>> peter.lancas...@findmypast.com> wrote:
> >>>>>
> >>>>>> Hi Webster,
> >>>>>>
> >>>>>> If you're not worried about using BM25 searcher then you should just be
> >>>>>> able to continue as you were before by providing your own similarity class
> >>>>>> that extends ClassicSimilarity and then override the idf method to always
> >>>>>> return 1, then reference that in your schema
> >>>>>> e.g.
> >>>>>> <similarity class="brightsolid.solr.plugins.MyCustomSimilarity" />
> >>>>>>
> >>>>>> As far as I know you've been able to have different similarities per
> >>>>>> field in solr for a while now. https://wiki.apache.org/solr/SchemaXml#Similarity
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Peter Lancaster.
> >>>>>>
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Webster Homer [mailto:webster.ho...@sial.com]
> >>>>>> Sent: 08 August 2017 20:39
> >>>>>> To: solr-user@lucene.apache.org
> >>>>>> Subject: Solr 6 and IDF
> >>>>>>
> >>>>>> Our most common use for solr is searching for products, not text search.
> >>>>>> My company is in the process of migrating away from an Endeca search
> >>>>>> engine, the goal to keep the business happy is to make sure that search
> >>>>>> results from the different engines be fairly similar, one area that we have
> >>>>>> found that suppresses a result from being as good as it was in the old
> >>>>>> system is the idf.
> >>>>>>
> >>>>>> We are using Solr 6. After moving to it, a lot of our results got better,
> >>>>>> but idf still seems to deaden some results. Given that our focus is product
> >>>>>> searching I really don't see a need for idf at all. Previous to Solr 6 you
> >>>>>> could suppress idf by providing a custom similarity class. Looking over the
> >>>>>> newer documentation a lot of things have improved, but I'm not sure I see a
> >>>>>> simple way to turn off idf in Solr 6's BM25 searcher.
> >>>>>>
> >>>>>> How do I disable IDF in Solr 6?
> >>>>>>
> >>>>>> We also do have needs for text searching so it would be nice if we could
> >>>>>> suppress IDF on a field or schema level
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>>
> >>>>>> This message and any attachment are confidential and may be privileged or
> >>>>>> otherwise protected from disclosure. If you are not the intended recipient,
> >>>>>> you must not copy this message or attachment or disclose the contents to
> >>>>>> any other person. If you have received this transmission in error, please
> >>>>>> notify the sender immediately and delete the message and any attachment
> >>>>>> from your system. Merck KGaA, Darmstadt, Germany and any of its
> >>>>>> subsidiaries do not accept liability for any omissions or errors in this
> >>>>>> message which may arise as a result of E-Mail-transmission or for damages
> >>>>>> resulting from any unauthorized changes of the content of this message and
> >>>>>> any attachment thereto.
> >>>>>> Merck KGaA, Darmstadt, Germany and any of its
> >>>>>> subsidiaries do not guarantee that this message is free of viruses and does
> >>>>>> not accept liability for any damages caused by any virus transmitted
> >>>>>> therewith.
> >>>>>>
> >>>>>> Click http://www.emdgroup.com/disclaimer to access the German, French,
> >>>>>> Spanish and Portuguese versions of this disclaimer.
> >>>>>> ________________________________
> >>>>>>
> >>>>>> This message is confidential and may contain privileged information. You
> >>>>>> should not disclose its contents to any other person. If you are not the
> >>>>>> intended recipient, please notify the sender named above immediately. It is
> >>>>>> expressly declared that this e-mail does not constitute nor form part of a
> >>>>>> contract or unilateral obligation. Opinions, conclusions and other
> >>>>>> information in this message that do not relate to the official business of
> >>>>>> findmypast shall be understood as neither given nor endorsed by it.
> >>>>>> ________________________________
> >>>>>>
> >>>>>> __________________________________________________________________________
> >>>>>>
> >>>>>> This email has been checked for virus and other malicious content prior
> >>>>>> to leaving our network.
> >>>>>> __________________________________________________________________________
> >>>>>>
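To make the effect discussed in this thread concrete, here is a minimal sketch in plain Python (not Solr or Lucene code) of how a BM25-style score changes when idf is forced to 1, using a toy version of the resume example above. Everything in it is an assumption for illustration: the function names, the three-token documents and the document frequencies are made up, and the idf formula is the log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) form as I understand Lucene's BM25Similarity documents it. In Solr itself the change would be made in a custom Similarity subclass, as described in the thread.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # idf in the form commonly documented for Lucene's BM25Similarity (assumption).
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    # Standard BM25 term-frequency saturation with length normalization.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)

def score(query_terms, doc_terms, doc_freqs, doc_count, avg_doc_len, use_idf=True):
    # Sum the per-term scores; with use_idf=False every matching term gets idf = 1.0.
    total = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        idf = bm25_idf(doc_freqs[term], doc_count) if use_idf else 1.0
        total += bm25_term_score(tf, len(doc_terms), avg_doc_len, idf)
    return total

# Hypothetical toy data: two common skills and one rare skill in 1000 resumes.
DOC_COUNT = 1000
DOC_FREQS = {"word": 900, "excel": 800, "r": 5}
AVG_LEN = 3
QUERY = ["word", "excel", "r"]
RESUME_A = ["word", "excel", "sales"]    # matches two common skills
RESUME_B = ["r", "statistics", "sales"]  # matches one rare skill

if __name__ == "__main__":
    for use_idf in (True, False):
        a = score(QUERY, RESUME_A, DOC_FREQS, DOC_COUNT, AVG_LEN, use_idf)
        b = score(QUERY, RESUME_B, DOC_FREQS, DOC_COUNT, AVG_LEN, use_idf)
        print("use_idf=%s  resume A: %.3f  resume B: %.3f" % (use_idf, a, b))
```

With idf enabled, the resume matching the single rare skill scores higher than the one matching two common skills. With idf fixed at 1 the order flips and the number of matched terms dominates, which is the behaviour asked for in the original question. Whether that actually improves results is still something to measure, for example with MRR or conversion as suggested above.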