I prefer fuzzy search for misspellings. Solr does a very nice job with those, 
weighting them by the similarity to the matched term.

wunder
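
A minimal sketch of how an application might build such a fuzzy query, assuming Lucene's `term~N` fuzzy syntax, where N is the maximum edit distance (at most 2 in Lucene 4.x). The helper names `fuzzy_term` and `fuzzy_query` and the field name `title` are hypothetical, and the sketch does not escape query-syntax special characters.

```python
def fuzzy_term(term: str, max_edits: int = 1) -> str:
    """Append Lucene's fuzzy operator to a single term, e.g. 'televison~1'."""
    if max_edits not in (0, 1, 2):
        raise ValueError("fuzzy edit distance must be 0, 1, or 2")
    if not term or any(ch.isspace() for ch in term):
        raise ValueError("fuzzy syntax applies to a single, non-empty term")
    return f"{term}~{max_edits}"


def fuzzy_query(field: str, user_input: str, max_edits: int = 1) -> str:
    """Turn whitespace-separated user input into per-term fuzzy clauses."""
    terms = user_input.split()
    return " ".join(f"{field}:{fuzzy_term(t, max_edits)}" for t in terms)


if __name__ == "__main__":
    # 'title' is an assumed field name, not one taken from this thread.
    print(fuzzy_query("title", "baby siter"))
```

The resulting string would be passed as the `q` parameter; how the clauses are combined (AND vs. OR) depends on the default operator configured for the request.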

On Dec 12, 2012, at 4:45 PM, Jack Krupansky wrote:

> Another great use case for synonyms is misspellings. I saw one synonym list 
> in which the top synonym was the phrase "dead mouse" (which doesn't look 
> misspelled at all); I won't tell you what its "proper" synonym was, other 
> than to say that it was VERY app/culture-dependent. It was also interesting 
> because the user's original query phrase needed to be given a much lower 
> weighting in order to find what the user was "likely" looking for.
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: Walter Underwood
> Sent: Wednesday, December 12, 2012 7:16 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can a field with defined synonym be searched without the synonym?
> 
> If you have tons of content, you can do selective reindexing. You only need 
> to reindex the docs containing the new terms. If I add a synonym for 
> "babysitter" and "baby sitter", then I can do a search for documents 
> containing either of those, and only reindex those.
> 
> Reverse weighting to even out the IDF would work, but it could be pretty 
> tweaky. If one synonym is very rare, you put in a small weight, but then you 
> index several documents with that term and then it is overweighted.
> 
> wunder
> 
> On Dec 12, 2012, at 4:09 PM, Jack Krupansky wrote:
> 
>> Sure, synonyms have lots of issues and choosing index vs. query is simply 
>> picking your poison, but it all depends on your app and your data and your 
>> user expectations, and you, the developer, have tools to moderate a lot of 
>> these issues.
>> 
>> Index-time synonyms have the problem (among others) that they cannot be 
>> changed without reindexing.
>> 
>> One technique is to simulate the query-time synonym filter expansion by 
>> having your app preprocess user queries to expand to the OR of the synonyms 
>> and then boost or de-boost the synonyms as makes sense for your app.
>> 
>> For example,
>> 
>>  (tv^0.5 OR television^2.5 OR "boob tube"^0.0001)
>> 
>> -- Jack Krupansky
>> 
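The preprocessing idea above could be sketched roughly as follows. The synonym table and the function names are illustrative assumptions (the boosts are copied from the example in the message); a real app would load its own list and tune the weights.

```python
# Hypothetical synonym table: each key maps to (variant, boost) pairs.
SYNONYMS = {
    "tv": [("tv", 0.5), ("television", 2.5), ("boob tube", 0.0001)],
}


def quote_if_phrase(text: str) -> str:
    """Wrap multi-word variants in double quotes so they parse as phrases."""
    return f'"{text}"' if " " in text else text


def expand_term(term: str) -> str:
    """Expand a user term to a boosted OR of its synonyms, if it has any."""
    variants = SYNONYMS.get(term.lower())
    if not variants:
        return quote_if_phrase(term)
    clauses = [f"{quote_if_phrase(v)}^{boost:g}" for v, boost in variants]
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    print(expand_term("TV"))
```

Because the expansion happens in the app, the synonym list can change without reindexing, at the cost of more clauses per query.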
>> -----Original Message----- From: Steve Rowe
>> Sent: Wednesday, December 12, 2012 5:28 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Can a field with defined synonym be searched without the 
>> synonym?
>> 
>> Hmm, I've gotten this very wrong :) - DisjunctionMaxQuery will operate 
>> per-doc, so using it in the way I suggested will not allow for synonym IDF 
>> leveling across documents.  Also, scoring obviously includes more factors 
>> than IDF.
>> 
>> On Dec 12, 2012, at 5:18 PM, Steve Rowe <sar...@gmail.com> wrote:
>> 
>>> But couldn't the IDF problem be fixed by applying the same IDF to all 
>>> synonyms, e.g. via DisjunctionMaxQuery?  (Maybe the ideal would be an 
>>> average, not a max.)
>>> 
>>> (E)dismax applies this query per-field, but AFAICT there is nothing 
>>> stopping anybody (modulo query parser construction :) ) from using it on 
>>> synonyms in the same field.
>>> 
>>> Steve
>>> 
>>> On Dec 12, 2012, at 12:50 PM, Walter Underwood <wun...@wunderwood.org> 
>>> wrote:
>>> 
>>>> Query parsers cannot fix the IDF problem or make query-time synonyms 
>>>> faster. Query synonym expansion makes more search terms. More search terms 
>>>> are more work at query time.
>>>> 
>>>> The IDF problem is real; I've run up against it. The rarest variant of 
>>>> the synonym has the highest score. This is probably the opposite of what you 
>>>> want. For me, it was "TV" and "television". Documents with "TV" had higher 
>>>> scores than those with "television".
>>>> 
>>>> wunder
>>>> 
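To make the IDF effect concrete, here is a small sketch using the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)). The document counts are made up for illustration, and real scores also depend on other factors (term frequency, length norms, query normalization).

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene TF-IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


if __name__ == "__main__":
    num_docs = 1_000_000          # hypothetical index size
    df_television = 50_000        # hypothetical: the common variant
    df_tv = 500                   # hypothetical: the rare variant

    idf_television = classic_idf(df_television, num_docs)
    idf_tv = classic_idf(df_tv, num_docs)

    # The rarer variant gets the larger IDF, so documents matching it
    # tend to score higher even though both terms mean the same thing.
    print(f"television idf = {idf_television:.3f}")
    print(f"tv idf         = {idf_tv:.3f}")
    print("rare variant favored:", idf_tv > idf_television)
```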
>>>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>>>> 
>>>>> @wunder
>>>>> It is a misconception (well, supported by that wiki description) that the
>>>>> query-time synonym filter has these problems. It is actually the default
>>>>> parser that is causing these problems. Look at this if you still think
>>>>> that index-time synonyms are a cure-all:
>>>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>>>> 
>>>>> @joe
>>>>> If you can use the flexible query parser (as linked by @Swati), then all
>>>>> you need to do is define a different field with a different tokenizer
>>>>> chain and then swap the field names before the analyzer processes the
>>>>> document (and then rewrite the field name back). For example, we have
>>>>> fields called "author" and "author_nosyn".
>>>>> 
>>>>> roman
>>>>> 
>>>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood 
>>>>> <wun...@wunderwood.org> wrote:
>>>>> 
>>>>>> Query time synonyms have known problems. They are slower, cause incorrect
>>>>>> IDF, and don't work for phrase synonyms.
>>>>>> 
>>>>>> Apply synonyms at index time and you will have none of those problems.
>>>>>> 
>>>>>> See:
>>>>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>>>> 
>>>>>> wunder
>>>>>> 
>>>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>>>> 
>>>>>>> Query-time analyzers are still applied, even if you include a string in
>>>>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>>>>> enclosed in quotes?
>>>>>>> 
>>>>>>> Also look at this, someone who had similar requirements:
>>>>>>> 
>>>>>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>>>> 
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: joe.cohe...@gmail.com [mailto:joe.cohe...@gmail.com]
>>>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Subject: Re: Can a field with defined synonym be searched without the
>>>>>>> synonym?
>>>>>>> 
>>>>>>> 
>>>>>>> I'm applying only query-time synonyms, so I have the original values
>>>>>>> stored and indexed.
>>>>>>> I would have expected that if I search for a string in quotation marks,
>>>>>>> I'll get the exact match, without applying a synonym.
>>>>>>> 
>>>>>>> Is there any way to achieve that?
>>>>>>> 
>>>>>>> 
>>>>>>> Upayavira wrote
>>>>>>>> You can only search against terms that are stored in your index. If
>>>>>>>> you have applied index time synonyms, you can't remove them at query
>>>>>>>> time.
>>>>>>>> 
>>>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>>>> field that doesn't use synonyms, and search against that field instead.
>>>>>>>> 
>>>>>>>> Upayavira
>>>>>>>> 
>>>>>>>> On Wed, Dec 12, 2012, at 04:26 PM, joe.cohen.m@ wrote:
>>>>>>>>> Hi
>>>>>>>>> I have a field type with a defined synonym.txt, which retrieves both
>>>>>>>>> records with "home" and "house" when I search for either one of them.
>>>>>>>>> 
>>>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>>>> enter, without the synonym filter.
>>>>>>>>> 
>>>>>>>>> Is it possible?
>>>>>>>>> 
>>>>>>>>> thanks.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> View this message in context:
>>>>>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381.html
>>>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>> 
>>>>>> --
>>>>>> Walter Underwood
>>>>>> wun...@wunderwood.org
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 

--
Walter Underwood
wun...@wunderwood.org


