I'm a little busy right now, but I'm going to try to find a suitable parser; if none is found, I think the only solution is to write a new one.
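Whichever parser ends up being used, it has to run the literal parts of a wildcard term through the field's filters (trim, lowercase, ASCII folding) while leaving `*` and `?` alone. That is roughly what Lucene's AnalyzingQueryParser does: it analyzes the chunks between wildcards and reassembles them. The stand-alone Java sketch below approximates that step without any Solr or Lucene classes. `WildcardTermNormalizer`, `foldTerm` and `normalizeWildcardTerm` are hypothetical names. The folding uses `java.text.Normalizer` plus a small hand-written table for eth, thorn, ae and o-slash, so it handles the Icelandic examples in this thread but is not a faithful copy of ASCIIFoldingFilter.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;

public class WildcardTermNormalizer {

    // Letters that NFD decomposition does not split into base letter + accent
    // (eth, thorn, ae, o-slash), mapped to the ASCII forms a folding filter would
    // typically produce. Assumption: this table only covers the Icelandic/Nordic
    // letters from the thread, not everything ASCIIFoldingFilter handles.
    private static final Map<Character, String> SPECIAL = Map.of(
            '\u00F0', "d", '\u00FE', "th", '\u00E6', "ae", '\u00F8', "o");

    /** Approximates LowerCaseFilter + ASCIIFoldingFilter for one literal chunk. */
    static String foldTerm(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            String repl = SPECIAL.get(c);
            out.append(repl != null ? repl : String.valueOf(c));
        }
        // NFD turns an accented letter into base letter + combining mark;
        // removing the marks (\p{M}) leaves the plain base letter.
        return Normalizer.normalize(out, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    /** Trims the term, then folds the text between wildcards, keeping '*' and '?'. */
    static String normalizeWildcardTerm(String term) {
        StringBuilder out = new StringBuilder();
        StringBuilder chunk = new StringBuilder();
        for (char c : term.trim().toCharArray()) {
            if (c == '*' || c == '?') {
                out.append(foldTerm(chunk.toString())).append(c);
                chunk.setLength(0);
            } else {
                chunk.append(c);
            }
        }
        return out.append(foldTerm(chunk.toString())).toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizeWildcardTerm("Sj\u00E1lf*"));                // sjalf*
        System.out.println(normalizeWildcardTerm("sj\u00E1lfs\u00F6?\u00F0u"));  // sjalfso?du
    }
}
```

With this character-level fold the chunking is not strictly necessary, since `*` and `?` pass through unchanged anyway. It is kept to mirror the AnalyzingQueryParser approach, where a real tokenizer-based analyzer would otherwise treat the wildcard characters as delimiters.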
2011/1/13 Jayendra Patil <jayendra.patil....@gmail.com>:
> Had the same issues with international characters and wildcard searches.
>
> One workaround we implemented was to index the field with and without the
> ASCIIFoldingFilterFactory.
> You would have an original field and one with the English equivalent to be used
> during searching.
>
> Wildcard searches with the English equivalent or international terms would match
> either of those.
> Also, lowercase the search terms if you are using LowerCaseFilter during
> indexing.
>
> Regards,
> Jayendra
>
> On Wed, Jan 12, 2011 at 7:46 AM, Kári Hreinsson <k...@gagnavarslan.is> wrote:
>
>> Have you made any progress? Since the AnalyzingQueryParser doesn't inherit
>> from QParserPlugin, Solr doesn't want to use it, but I guess we could
>> implement a similar parser that does inherit from QParserPlugin?
>>
>> Switching parser seems to be what is needed? Has no one really solved this
>> before?
>>
>> - Kári
>>
>> ----- Original Message -----
>> From: "Matti Oinas" <matti.oi...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, 11 January, 2011 12:47:52 PM
>> Subject: Re: solr wildcard queries and analyzers
>>
>> This might be the solution.
>>
>> http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
>>
>> 2011/1/11 Matti Oinas <matti.oi...@gmail.com>:
>> > Sorry, the message was not meant to be sent here. We are struggling
>> > with the same problem here.
>> >
>> > 2011/1/11 Matti Oinas <matti.oi...@gmail.com>:
>> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
>> >>
>> >> On wildcard and fuzzy searches, no text analysis is performed on the
>> >> search word.
>> >>
>> >> 2011/1/11 Kári Hreinsson <k...@gagnavarslan.is>:
>> >>> Hi,
>> >>>
>> >>> I am having a problem with the fact that no text analysis is performed
>> >>> on wildcard queries.
>> >>> I have the following field type (a bit simplified):
>> >>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>> >>>   <analyzer>
>> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>> >>>     <filter class="solr.TrimFilterFactory" />
>> >>>     <filter class="solr.LowerCaseFilterFactory" />
>> >>>     <filter class="solr.ASCIIFoldingFilterFactory" />
>> >>>   </analyzer>
>> >>> </fieldType>
>> >>>
>> >>> My problem has to do with Icelandic characters. When I index a document
>> >>> with a text field including the word "sjálfsögðu", it gets indexed as
>> >>> "sjalfsogdu" (because of the ASCIIFoldingFilterFactory, which replaces the
>> >>> Icelandic characters with their English equivalents). Then, when I search
>> >>> (without a wildcard) for "sjálfsögðu" or "sjalfsogdu", I get that document as
>> >>> a result. This is convenient since it enables people to search without
>> >>> using accented characters and yet get the results they want (e.g. if they
>> >>> are working on computers with English keyboards).
>> >>>
>> >>> However, this all falls apart when using wildcard searches: then the
>> >>> search string isn't passed through the filters, and even if I search for
>> >>> "sjálf*" I don't get any results because the index doesn't contain the
>> >>> original words (I get results if I search for "sjalf*"). I know people have
>> >>> been having a similar problem with the case sensitivity of wildcard queries,
>> >>> and most often the solution seems to be to lowercase the string before
>> >>> passing it on to Solr, which is not exactly an optimal solution (yet a
>> >>> simple one in that case). The Icelandic characters complicate things a bit,
>> >>> and applying the same solution (doing the lowercasing and character mapping)
>> >>> in my application seems like unnecessary duplication of code already part of
>> >>> Solr, not to mention complication of my application and possible maintenance
>> >>> down the road.
>> >>>
>> >>> Is there any way around this? How are people solving this?
>> >>> Is there a way to apply the filters to wildcard queries? I guess removing the
>> >>> ASCIIFoldingFilterFactory is the simplest "solution", but this
>> >>> "normalization" (of the text done by the filter) is often very useful.
>> >>>
>> >>> I hope I'm not overlooking some obvious explanation. :/
>> >>>
>> >>> Thanks in advance,
>> >>> Kári Hreinsson
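Jayendra's dual-field workaround from this thread can be sketched on the client side as follows. It assumes a second, hypothetical field `text_orig`, filled from `text` with a copyField and using the same analyzer chain minus ASCIIFoldingFilterFactory; `text` keeps the folding chain from Kári's schema. The class and method names are hypothetical, not a SolrJ API. The wildcard term only has to be lowercased and searched in both fields, because "sjálf*" matches the unfolded `text_orig` and "sjalf*" matches the folded `text`. The escaping step mirrors the characters Lucene's query syntax treats as special, except that `*` and `?` are left unescaped so they still act as wildcards.

```java
import java.util.Locale;

public class DualFieldWildcardQuery {

    // Hypothetical field names: "text" uses the folding analyzer from the thread,
    // "text_orig" is assumed to use the same chain without ASCIIFoldingFilterFactory.
    static final String FOLDED_FIELD = "text";
    static final String ORIGINAL_FIELD = "text_orig";

    // Query-syntax characters escaped with a backslash; '*' and '?' are left
    // out on purpose so they keep working as wildcards.
    private static final String SPECIAL = "+-!(){}[]^\"~:\\/&|";

    static String escapeExceptWildcards(String term) {
        StringBuilder out = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Lowercases the term (wildcard queries skip LowerCaseFilter) and ORs it
     *  across both fields, so accented and plain-ASCII input can both match. */
    static String buildWildcardQuery(String userTerm) {
        String term = escapeExceptWildcards(userTerm.trim().toLowerCase(Locale.ROOT));
        return FOLDED_FIELD + ":" + term + " OR " + ORIGINAL_FIELD + ":" + term;
    }

    public static void main(String[] args) {
        System.out.println(buildWildcardQuery("SJALF*"));       // text:sjalf* OR text_orig:sjalf*
        System.out.println(buildWildcardQuery("Sj\u00E1lf*"));  // same query shape, accent kept
    }
}
```

Lowercasing still happens in the application, but no character mapping has to be duplicated there, which was the part Kári wanted to avoid. The cost is a second indexed field.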