On May 21, 2010, at 10:35 AM, Robert Muir wrote:

> I honestly do not know the rationale behind this in Solr, except to
> say similar problems exist even if you reduce the scope to just
> casing:

Then why are you talking about stemming in the following example?  We know 
stemming is problematic with wildcard searching.  But casing... I argue not.

> For example, if you are using a German stemmer, it will case-fold ß to
> 'ss' (such that it will match SS).
> 
> So doing some lowercasing at query-time will not correct the situation
> for that character, and furthermore it will be inconsistent with the
> '?' operator... (which only matches one character)
> 
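
To make the ß case above concrete, here is a minimal sketch in Python. It is not Solr or Lucene code: `index_term` and `query_term` are hypothetical helpers, the ß-to-"ss" folding is a stand-in for what the German stemmer is described as doing, and `fnmatchcase` is used only because its '?' also matches exactly one character.

```python
from fnmatch import fnmatchcase

# Hypothetical index-time analysis: lowercase, then fold ß to "ss",
# standing in for the German stemmer behaviour described in the thread.
def index_term(word: str) -> str:
    return word.lower().replace("ß", "ss")

# Query-time handling under discussion: lowercase the wildcard term only.
def query_term(pattern: str) -> str:
    return pattern.lower()

indexed = index_term("Straße")          # folded form stored in the index
lowercased_only = query_term("STRAßE")  # lower() leaves ß untouched

# '?' matches exactly one character, so it cannot cover the two-character "ss".
single_char_match = fnmatchcase(indexed, query_term("Stra?e"))
two_char_match = fnmatchcase(indexed, query_term("Stra??e"))
```

Under these assumptions, lowercasing the query alone does not line it up with the indexed form: the one-'?' pattern misses the term and only the two-'?' pattern finds it.
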
> On Fri, May 21, 2010 at 10:28 AM, Sascha Szott <sz...@zib.de> wrote:
>> Hi Robert,
>> 
>> thanks, you're absolutely right. I should refine my initial question
>> to: What's the idea behind the fact that no *lowercasing* is performed on
>> wildcarded search terms if the field in question contains a LowercaseFilter
>> in its associated field type definition?
>> 
>> -Sascha
>> 
>> Robert Muir wrote:
>>> 
>>> we can use stemming as an example:
>>> 
>>> let's say your query is c?ns?st?nt?y
>>> 
>>> how will this match "consistently", which the porter stemmer
>>> transforms to 'consistent'?
>>> furthermore, note that i replaced the vowels with ?'s here. The porter
>>> stemmer doesn't just rip stuff off the end, but attempts to guess
>>> syllables as part of the process, so it cannot possibly work.
>>> 
>>> the only way it would work in this situation would be if you formed
>>> permutations of all the possible words this wildcard would match, and
>>> then did analysis on each form, and searched on all stems.
>>> 
>>> but, this is impossible, since the * operator allows an infinite language.
>>> 
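
A small Python sketch of the stemming mismatch described above. The stem is hardcoded from the thread rather than computed by a real Porter stemmer, and `fnmatchcase` is only a stand-in for wildcard matching against indexed terms; the variable names are illustrative.

```python
from fnmatch import fnmatchcase

surface_form = "consistently"   # the word as it appears in the document
indexed_stem = "consistent"     # stem quoted in the thread, hardcoded here
pattern = "c?ns?st?nt?y"        # wildcard query from the thread

# The pattern fits the surface form character for character...
matches_surface = fnmatchcase(surface_form, pattern)

# ...but the index only holds the stem, which is two characters shorter
# and does not end in "y", so the wildcard query finds nothing.
matches_index = fnmatchcase(indexed_stem, pattern)
```

The wildcard is written against the word the user has in mind, while the index contains only the analyzed form, and the pattern itself cannot be run through the stemmer.
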
>>> On Fri, May 21, 2010 at 10:11 AM, Sascha Szott <sz...@zib.de> wrote:
>>>> 
>>>> Hi folks,
>>>> 
>>>> what's the idea behind the fact that no text analysis (e.g. lowercasing)
>>>> is performed on wildcarded search terms?
>>>> 
>>>> In my context this behaviour seems to be counter-intuitive (I guess
>>>> that's the case in the majority of applications) and my application
>>>> needs to lowercase any input term before sending the HTTP request to
>>>> my Solr server.
>>>> 
>>>> Would it be easy to disable this behaviour in Solr (1.5)? I would like to
>>>> see a config parameter (per field type) that allows disabling this "odd"
>>>> behaviour if needed. To ensure backward compatibility the "odd" behaviour
>>>> should remain the default.
>>>> 
>>>> Am I missing any drawbacks?
>>>> 
>>>> Best,
>>>> Sascha
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Robert Muir
> rcm...@gmail.com
