Perhaps you could use two indexed fields, one with synonym expansion and one 
without.
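
To make the two-field idea concrete, here is a minimal in-memory sketch in Python. It is not Solr code. The field names "text" and "text_nosyn" and the synonym groups are made up for illustration (they mirror the "author" / "author_nosyn" pair mentioned further down the thread), and only single-token synonyms are handled.

```python
from collections import defaultdict

# Hypothetical synonym groups; in Solr these would come from a synonyms file.
SYNONYMS = [{"tv", "television"}, {"home", "house"}]


def analyze_plain(text):
    """Lowercase and split on whitespace, with no synonym expansion."""
    return text.lower().split()


def analyze_with_synonyms(text):
    """Same as analyze_plain, but emit every member of a token's synonym group."""
    tokens = []
    for tok in analyze_plain(text):
        group = next((g for g in SYNONYMS if tok in g), {tok})
        tokens.extend(sorted(group))
    return tokens


class TwoFieldIndex:
    """Index each document twice: once with synonyms, once without."""

    def __init__(self):
        # Field names are assumptions made for this sketch.
        self.fields = {"text": defaultdict(set), "text_nosyn": defaultdict(set)}

    def add(self, doc_id, text):
        for tok in analyze_with_synonyms(text):
            self.fields["text"][tok].add(doc_id)
        for tok in analyze_plain(text):
            self.fields["text_nosyn"][tok].add(doc_id)

    def search(self, term, expand_synonyms=True):
        # Picking the field is what switches synonym expansion on or off.
        field = "text" if expand_synonyms else "text_nosyn"
        return sorted(self.fields[field].get(term.lower(), set()))


index = TwoFieldIndex()
index.add(1, "cheap TV")
index.add(2, "television repair")
print(index.search("tv"))
print(index.search("tv", expand_synonyms=False))
```

The cost of this approach is a larger index, since every document is analyzed and stored twice.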

wunder

On Dec 12, 2012, at 11:33 PM, Burgmans, Tom wrote:

> In our case it's the opposite. For our clients it is very important that 
> every synonym gets an equal chance in the relevancy calculation. The fact that 
> "nol" scores higher than "net operating loss", simply because its document 
> frequency is lower, is unacceptable and a reason to look for ways to remove 
> IDF from the score calculation. But that is in fact something I would rather 
> not do, since IDF is such an elementary part of the algorithm (and very 
> useful for non-synonym searches).
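
A rough numeric illustration of that effect, as a sketch only. It uses the IDF formula from Lucene's classic TF-IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)). The index size and the document frequencies are invented, and the terms are the TV/television pair from earlier in the thread.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as defined by Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 100_000                              # assumed index size
DOC_FREQ = {"tv": 500, "television": 20_000}    # assumed document frequencies

# Without index-time expansion, the rarer variant gets the higher IDF.
idf = {term: classic_idf(df, NUM_DOCS) for term, df in DOC_FREQ.items()}
for term, value in idf.items():
    print(f"{term}: idf={value:.2f}")

# With index-time expansion both variants are written into every matching
# document, so they share one document frequency. Assuming no document
# contains both words, that shared frequency is simply the sum.
shared_df = DOC_FREQ["tv"] + DOC_FREQ["television"]
shared_idf = classic_idf(shared_df, NUM_DOCS)
print(f"shared: idf={shared_idf:.2f}")
```

With a shared document frequency the two variants contribute the same IDF, which is the "equal chance" behaviour, at the price of the side effects on other terms described further down.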
> 
> Pre-processing synonyms to apply 'reverse weighting' is also a strategy to 
> consider, but I agree with Walter that this is very error-prone; things could 
> easily get out of sync. Moreover, none of our Dev, QA, STG, and PRD 
> environments contain exactly the same content, so it would require a 
> differently tuned synonym dictionary for each of them...meh...
> 
> In our previous search engine (FAST ESP) we basically switched off IDF, but I 
> am still hoping that there is a more sophisticated solution with Solr.
> 
> 
> -----Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: Thursday 13 December 2012 02:30
> To: solr-user@lucene.apache.org
> Subject: Re: Can a field with defined synonym be searched without the synonym?
> 
> All of the applications I've seen with user control over synonym expansion 
> were recall-oriented: the "give me all matches for X" kind of problem. So 
> ranking is not as important.
> 
> wunder
> 
> On Dec 12, 2012, at 5:23 PM, Roman Chyla wrote:
> 
>> Well, this IDF problem has more sides. So, let's say your synonym file
>> contains multi-token synonyms (it does, right? or perhaps you don't need
>> it? well, some people do)
>> 
>> "TV, TV set, TV foo, television"
>> 
>> If you use the default synonym expansion, then when you index 'television'
>> you also increase the frequency of 'set' and 'foo'. So the IDF of 'TV' is
>> the same as that of 'television', but the IDF of 'foo' and 'set' has changed
>> (their frequency increased, their IDF decreased). TVs have in fact made
>> the term 'foo' very frequent and undesirable.
>> 
>> So, you might be sure that the IDF of 'TV' and 'television' are the same, but
>> you may not be aware that it has 'screwed' other (desirable) terms, so it
>> really depends. And I wouldn't argue these cases are esoteric.
>> 
>> And finally: there are use cases out there where people NEED to switch off
>> synonym expansion at will (find only those documents that contain the word
>> 'TV' and not that bloody 'foo'). This cannot be done if the index contains
>> all synonym terms (unless you have a way to mark the original and the
>> synonym in the index).
>> 
>> roman
>> 
>> 
>> On Wed, Dec 12, 2012 at 12:50 PM, Walter Underwood 
>> <wun...@wunderwood.org> wrote:
>> 
>>> Query parsers cannot fix the IDF problem or make query-time synonyms
>>> faster. Query synonym expansion makes more search terms. More search terms
>>> are more work at query time.
>>> 
>>> The IDF problem is real; I've run up against it. The rarest variant of
>>> the synonym has the highest score. This is probably the opposite of what you
>>> want. For me, it was "TV" and "television". Documents with "TV" had higher
>>> scores than those with "television".
>>> 
>>> wunder
>>> 
>>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>>> 
>>>> @wunder
>>>> It is a misconception (well, supported by that wiki description) that the
>>>> query time synonym filter has these problems. It is actually the default
>>>> parser that is causing these problems. Look at this if you still think
>>>> that index time synonyms are a cure for all:
>>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>>> 
>>>> @joe
>>>> If you can use the flexible query parser (as linked in by @Swati) then all
>>>> you need to do is to define a different field with a different tokenizer
>>>> chain and then swap the field names before the analyzer processes the
>>>> document (and then rewrite the field name back). For example, we have
>>>> fields called "author" and "author_nosyn".
>>>> 
>>>> roman
>>>> 
>>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>> 
>>>>> Query time synonyms have known problems. They are slower, cause incorrect
>>>>> IDF, and don't work for phrase synonyms.
>>>>> 
>>>>> Apply synonyms at index time and you will have none of those problems.
>>>>> 
>>>>> See:
>>>>> 
>>>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>>> 
>>>>> wunder
>>>>> 
>>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>>> 
>>>>>> Query-time analyzers are still applied, even if you include a string in
>>>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>>>> enclosed in quotes?
>>>>>> 
>>>>>> Also look at this thread from someone who had similar requirements:
>>>>>> 
>>>>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: joe.cohe...@gmail.com [mailto:joe.cohe...@gmail.com]
>>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Re: Can a field with defined synonym be searched without the synonym?
>>>>>> 
>>>>>> 
>>>>>> I'm applying only query-time synonyms, so I have the original values
>>>>>> stored and indexed.
>>>>>> I would have expected that if I search for a string in quotation marks,
>>>>>> I'd get the exact match, without a synonym being applied.
>>>>>> 
>>>>>> Is there any way to achieve that?
>>>>>> 
>>>>>> 
>>>>>> Upayavira wrote
>>>>>>> You can only search against terms that are stored in your index. If
>>>>>>> you have applied index time synonyms, you can't remove them at query
>>>>>>> time.
>>>>>>> 
>>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>>> field that doesn't use synonyms, and search against that field instead.
>>>>>>> 
>>>>>>> Upayavira
>>>>>>> 
>>>>>>> On Wed, Dec 12, 2012, at 04:26 PM, joe.cohen.m@ wrote:
>>>>>>>> Hi
>>>>>>>> I have a field type with a defined synonym.txt, which retrieves records
>>>>>>>> with both "home" and "house" when I search for either one of them.
>>>>>>>> 
>>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>>> enter, without the synonym filter.
>>>>>>>> 
>>>>>>>> is it possible?
>>>>>>>> 
>>>>>>>> thanks.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> 
>>>>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381.html
>>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>> 
>>>>> --
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> --
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> 
>>> 
>>> 
>>> 
> 
> --
> Walter Underwood
> wun...@wunderwood.org
> 

--
Walter Underwood
wun...@wunderwood.org


