On May 21, 2010, at 10:35 AM, Robert Muir wrote:

> I honestly do not know the rationale behind this in Solr, except to
> say similar problems exist even if you reduce the scope to just
> casing:

Then why are you talking about stemming in the following example? We know stemming is problematic with wildcard searching. But casing... I argue not.

> For example, if you are using a German stemmer, it will case-fold ß to
> 'ss' (such that it will match SS).
>
> So doing some lowercasing at query-time will not correct the situation
> for that character, and furthermore it will be inconsistent with the
> '?' operator... (which only matches one character)
>
> On Fri, May 21, 2010 at 10:28 AM, Sascha Szott <sz...@zib.de> wrote:
>> Hi Robert,
>>
>> thanks, you're absolutely right. I should better refine my initial question
>> to: What's the idea behind the fact that no *lowercasing* is performed on
>> wildcarded search terms if the field in question contains a LowercaseFilter
>> in its associated field type definition?
>>
>> -Sascha
>>
>> Robert Muir wrote:
>>>
>>> we can use stemming as an example:
>>>
>>> let's say your query is c?ns?st?nt?y
>>>
>>> how will this match "consistently", which the Porter stemmer
>>> transforms to 'consistent'?
>>> furthermore, note that I replaced the vowels with ?'s here. The Porter
>>> stemmer doesn't just rip stuff off the end, but attempts to guess
>>> syllables as part of the process, so it cannot possibly work.
>>>
>>> the only way it would work in this situation would be if you formed
>>> permutations of all the possible words this wildcard would match, and
>>> then did analysis on each form, and searched on all stems.
>>>
>>> but, this is impossible, since the * operator allows an infinite language.
>>>
>>> On Fri, May 21, 2010 at 10:11 AM, Sascha Szott<sz...@zib.de> wrote:
>>>>
>>>> Hi folks,
>>>>
>>>> what's the idea behind the fact that no text analysis (e.g. lowercasing)
>>>> is performed on wildcarded search terms?
>>>>
>>>> In my context this behaviour seems to be counter-intuitive (I guess
>>>> that's the case in the majority of applications) and my application
>>>> needs to lowercase any input term before sending the HTTP request to
>>>> my Solr server.
>>>>
>>>> Would it be easy to disable this behaviour in Solr (1.5)? I would like to
>>>> see a config parameter (per field type) that allows disabling this "odd"
>>>> behaviour if needed. To ensure backward compatibility the "odd" behaviour
>>>> should remain the default.
>>>>
>>>> Am I missing any drawbacks?
>>>>
>>>> Best,
>>>> Sascha
>
> --
> Robert Muir
> rcm...@gmail.com
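
To make the two positions concrete, here is a minimal Python sketch. It is not Solr or Lucene code: `fnmatchcase` merely stands in for wildcard matching against an indexed term, `str.lower()` stands in for a LowercaseFilter, `str.casefold()` stands in for the ß-to-'ss' folding Robert attributes to the German analysis chain, and `lowercase_wildcard_term` is a hypothetical helper name. Under those assumptions, plain lowercasing of the wildcard term lines up with a lowercased index, while the ß case and the stemming case from the thread do not.

```python
from fnmatch import fnmatchcase


def lowercase_wildcard_term(term: str) -> str:
    """Lowercase a wildcard term (hypothetical helper).

    '?' and '*' have no case, so lowercasing leaves them untouched.
    """
    return term.lower()


# Casing only: the index holds the lowercased token.
indexed = "Consistently".lower()  # 'consistently'
raw_query = "Cons*"
print(fnmatchcase(indexed, raw_query))                           # False
print(fnmatchcase(indexed, lowercase_wildcard_term(raw_query)))  # True

# The ß case: assume the index-side analysis folds ß to 'ss'.
folded = "Straße".casefold()  # 'strasse'
query = lowercase_wildcard_term("Stra?e")  # 'stra?e'
# '?' matches exactly one character, but the index holds two ('ss').
print(fnmatchcase(folded, query))            # False
# Against a merely lowercased token the same query does match.
print(fnmatchcase("Straße".lower(), query))  # True

# The stemming case, using the stem quoted in the thread ('consistent').
stem_from_thread = "consistent"
print(fnmatchcase("consistently", "c?ns?st?nt?y"))    # True: surface form
print(fnmatchcase(stem_from_thread, "c?ns?st?nt?y"))  # False: indexed stem
```

So lowercasing the query term is harmless when the index-side filter is a pure one-to-one lowercasing, and stops being sufficient as soon as the analysis changes the number of characters.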