Perhaps you could use two indexed fields, one with synonym expansion and one without.
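
A minimal sketch of the idea, using a toy in-memory index in Python rather than Solr itself. Everything here is an assumption made for illustration: the synonym groups, the sample documents, and the names text_syn and text_exact are hypothetical, and a real setup would instead define two Solr fields with different analyzer chains (for example by cloning the source field with copyField).

```python
from collections import defaultdict

# Hypothetical synonym groups, standing in for a synonyms file.
SYNONYMS = [{"tv", "television"}, {"home", "house"}]


def expand(token):
    """Return the token together with every synonym in its group."""
    for group in SYNONYMS:
        if token in group:
            return set(group)
    return {token}


def build_index(docs, with_synonyms):
    """Build a toy inverted index mapping each term to a set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            terms = expand(token) if with_synonyms else {token}
            for term in terms:
                index[term].add(doc_id)
    return index


# Hypothetical sample documents.
DOCS = {
    1: "new home for sale",
    2: "house with garden",
    3: "garden shed",
}

# Two "fields" built from the same source text: one analyzed with
# synonym expansion, one without.
text_syn = build_index(DOCS, with_synonyms=True)
text_exact = build_index(DOCS, with_synonyms=False)


def search(field_index, term):
    """Return the sorted doc ids that contain the term in the given field."""
    return sorted(field_index.get(term.lower(), set()))


print(search(text_syn, "home"))    # [1, 2]
print(search(text_exact, "home"))  # [1]
```

The application then picks the field at query time: the expanded field for normal searches, the unexpanded one when the user asks for an exact match.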
wunder

On Dec 12, 2012, at 11:33 PM, Burgmans, Tom wrote:

> In our case it's the opposite. For our clients it is very important that
> every synonym gets an equal chance in the relevancy calculation. The fact that
> "nol" scores higher than "net operating loss", simply because its document
> frequency is lower, is unacceptable and a reason to look for ways to disable
> the IDF from the score calculation. But that is in fact something I don't
> like to do since IDF is such an elementary part of the algorithm (and very
> useful for non-synonym searches).
>
> Pre-processing synonyms to apply 'reverse weighting' is also a strategy to
> consider, but I agree with Walter that this is very error-prone; things could
> easily get out of sync. Moreover, none of our Dev-, QA-, STG-, PRD- environments
> contain exactly the same content, so it would require a differently tuned
> synonym dictionary for each of them...meh...
>
> In our previous search engine (FAST ESP) we basically switched off IDF, but I
> am still a bit hoping that there is a more sophisticated solution with Solr.
>
>
> -----Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: Thursday 13 December 2012 02:30
> To: solr-user@lucene.apache.org
> Subject: Re: Can a field with defined synonym be searched without the synonym?
>
> All of the applications I've seen with user control over synonym expansion
> were recall-oriented. The "give me all matches for X" kind of problem. So
> ranking is not as important.
>
> wunder
>
> On Dec 12, 2012, at 5:23 PM, Roman Chyla wrote:
>
>> Well, this IDF problem has more sides. So, let's say your synonym file
>> contains multi-token synonyms (it does, right? or perhaps you don't need
>> it?
>> well, some people do)
>>
>> "TV, TV set, TV foo, television"
>>
>> if you use the default synonym expansion, when you index 'television'
>> you have also increased the frequency of 'set' and 'foo'. So, the IDF of 'TV' is
>> the same as that of 'television' - but the IDF of 'foo' and 'set' has changed
>> (their frequency increased, their IDF decreased) -- TVs have in fact made
>> the 'foo' term very frequent and undesirable
>>
>> So, you might be sure that the IDF of 'TV' and 'television' are the same, but
>> you are not aware it has 'screwed' other (desirable) terms - so it really
>> depends. And I wouldn't argue these cases are esoteric.
>>
>> And finally: there are use cases out there where people NEED to switch off
>> synonym expansion at will (find only those documents that contain the word
>> 'TV' and not that bloody 'foo'). This cannot be done if the index contains
>> all synonym terms (unless you have a way to mark the original and the
>> synonym in the index).
>>
>> roman
>>
>>
>> On Wed, Dec 12, 2012 at 12:50 PM, Walter Underwood
>> <wun...@wunderwood.org> wrote:
>>
>>> Query parsers cannot fix the IDF problem or make query-time synonyms
>>> faster. Query synonym expansion makes more search terms. More search terms
>>> are more work at query time.
>>>
>>> The IDF problem is real; I've run up against it. The rarest variant of
>>> the synonym has the highest score. This is probably the opposite of what you
>>> want. For me, it was "TV" and "television". Documents with "TV" had higher
>>> scores than those with "television".
>>>
>>> wunder
>>>
>>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>>>
>>>> @wunder
>>>> It is a misconception (well, supported by that wiki description) that the
>>>> query time synonym filter has these problems. It is actually the default
>>>> parser that is causing these problems.
>>>> Look at this if you still think
>>>> that index time synonyms are a cure for all:
>>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>>>
>>>> @joe
>>>> If you can use the flexible query parser (as linked in by @Swati) then all
>>>> you need to do is to define a different field with a different tokenizer
>>>> chain and then swap the field names before the analyzer processes the
>>>> document (and then rewrite the field name back - for example, we have
>>>> fields called "author" and "author_nosyn")
>>>>
>>>> roman
>>>>
>>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood
>>>> <wun...@wunderwood.org> wrote:
>>>>
>>>>> Query time synonyms have known problems. They are slower, cause incorrect
>>>>> IDF, and don't work for phrase synonyms.
>>>>>
>>>>> Apply synonyms at index time and you will have none of those problems.
>>>>>
>>>>> See:
>>>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>>>
>>>>> wunder
>>>>>
>>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>>>
>>>>>> Query-time analyzers are still applied, even if you include a string in
>>>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>>>> enclosed in quotes?
>>>>>>
>>>>>> Also look at this, someone who had similar requirements:
>>>>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: joe.cohe...@gmail.com [mailto:joe.cohe...@gmail.com]
>>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Re: Can a field with defined synonym be searched without the synonym?
>>>>>>
>>>>>>
>>>>>> I'm applying only query-time synonyms, so I have the original values
>>>>>> stored and indexed.
>>>>>> I would've expected that if I search a string in quotation marks, I'll get
>>>>>> the exact match, without applying a synonym.
>>>>>>
>>>>>> Any way to achieve that?
>>>>>>
>>>>>>
>>>>>> Upayavira wrote
>>>>>>> You can only search against terms that are stored in your index. If
>>>>>>> you have applied index time synonyms, you can't remove them at query
>>>>>>> time.
>>>>>>>
>>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>>> field that doesn't use synonyms, and search against that field instead.
>>>>>>>
>>>>>>> Upayavira
>>>>>>>
>>>>>>> On Wed, Dec 12, 2012, at 04:26 PM, joe.cohen.m@ wrote:
>>>>>>>> Hi
>>>>>>>> I have a field type with a defined synonym.txt which retrieves both
>>>>>>>> records with "home" and "house" when I search either one of them.
>>>>>>>>
>>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>>> enter, without the synonym filter.
>>>>>>>>
>>>>>>>> Is it possible?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381.html
>>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.

--
Walter Underwood
wun...@wunderwood.org