Have you made any progress? Since AnalyzingQueryParser doesn't inherit from QParserPlugin, Solr won't use it directly, but I guess we could implement a similar parser that does inherit from QParserPlugin?
Switching parsers seems to be what is needed? Has no one really solved this before?

- Kári

----- Original Message -----
From: "Matti Oinas" <matti.oi...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Tuesday, 11 January, 2011 12:47:52 PM
Subject: Re: solr wildcard queries and analyzers

This might be the solution.

http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html

2011/1/11 Matti Oinas <matti.oi...@gmail.com>:
> Sorry, the message was not meant to be sent here. We are struggling
> with the same problem here.
>
> 2011/1/11 Matti Oinas <matti.oi...@gmail.com>:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
>>
>> On wildcard and fuzzy searches, no text analysis is performed on the
>> search word.
>>
>> 2011/1/11 Kári Hreinsson <k...@gagnavarslan.is>:
>>> Hi,
>>>
>>> I am having a problem with the fact that no text analysis is performed on
>>> wildcard queries. I have the following field type (a bit simplified):
>>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer>
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>     <filter class="solr.TrimFilterFactory" />
>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>     <filter class="solr.ASCIIFoldingFilterFactory" />
>>>   </analyzer>
>>> </fieldType>
>>>
>>> My problem has to do with Icelandic characters: when I index a document
>>> with a text field including the word "sjálfsögðu" it gets indexed as
>>> "sjalfsogdu" (because of the ASCIIFoldingFilterFactory which replaces the
>>> Icelandic characters with their English equivalents). Then, when I search
>>> (without a wildcard) for "sjálfsögðu" or "sjalfsogdu" I get that document
>>> as a result. This is convenient since it enables people to search without
>>> using accented characters and yet get the results they want (e.g. if they
>>> are working on computers with English keyboards).
>>>
>>> However, this all falls apart when using wildcard searches: then the search
>>> string isn't passed through the filters, and even if I search for "sjálf*"
>>> I don't get any results because the index doesn't contain the original
>>> words (I get results if I search for "sjalf*"). I know people have been
>>> having a similar problem with the case sensitivity of wildcard queries and
>>> most often the solution seems to be to lowercase the string before passing
>>> it on to Solr, which is not exactly an optimal solution (yet a simple one
>>> in that case). The Icelandic characters complicate things a bit and
>>> applying the same solution (doing the lowercasing and character mapping) in
>>> my application seems like unnecessary duplication of code already part of
>>> Solr, not to mention complication of my application and possible
>>> maintenance down the road.
>>>
>>> Is there any way around this? How are people solving this? Is there a way
>>> to apply the filters to wildcard queries? I guess removing the
>>> ASCIIFoldingFilterFactory is the simplest "solution" but this
>>> "normalization" (of the text done by the filter) is often very useful.
>>>
>>> I hope I'm not overlooking some obvious explanation. :/
>>>
>>> Thanks in advance,
>>> Kári Hreinsson
>>>
>>
>
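
For reference, the client-side workaround mentioned in the original message (doing the lowercasing and character mapping in the application before the wildcard term reaches Solr) could look roughly like the sketch below. It is only an approximation of what LowerCaseFilterFactory and ASCIIFoldingFilterFactory do, not a copy of their mapping tables. The class name WildcardTermNormalizer and the method normalize are made up for illustration, and the explicit mappings for eth, thorn and the ae ligature are my assumption about how the folding filter treats those letters.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Solr or Lucene: normalizes a wildcard
// term in the application so that it matches the folded, lowercased index.
class WildcardTermNormalizer {

    static String normalize(String term) {
        // Step 1: lowercase, roughly what LowerCaseFilterFactory does at index time.
        String lower = term.toLowerCase(Locale.ROOT);

        // Step 2: map letters that have no Unicode decomposition by hand.
        // Assumed mappings: eth -> d, thorn -> th, ae ligature -> ae.
        StringBuilder mapped = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00f0': mapped.append('d'); break;   // eth
                case '\u00fe': mapped.append("th"); break;  // thorn
                case '\u00e6': mapped.append("ae"); break;  // ae ligature
                default: mapped.append(c);
            }
        }

        // Step 3: decompose accented letters (NFD) and drop the combining marks,
        // so that a-acute becomes a and o-diaeresis becomes o.
        String decomposed = Normalizer.normalize(mapped, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "sj\u00e1lf*" is the wildcard query from the original message.
        System.out.println(normalize("sj\u00e1lf*"));                // prints sjalf*
        System.out.println(normalize("Sj\u00e1lfs\u00f6g\u00f0u"));  // prints sjalfsogdu
    }
}
```

The wildcard character passes through untouched, so the result can be sent to Solr as is. This still duplicates logic that already lives in the field type, which is the drawback raised in the original message, so a parser plugin that runs the query analyzer on wildcard terms would remain the cleaner fix.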