Re: solr wildcard queries and analyzers

Matti Oinas Wed, 12 Jan 2011 23:12:13 -0800

I'm a little busy right now, but I'm going to try to find a suitable
parser. If none is found, I think the only solution is to write a new
one.


2011/1/13 Jayendra Patil <jayendra.patil....@gmail.com>:
> Had the same issues with international characters and wildcard searches.
>
> One workaround we implemented was to index the field both with and without
> the ASCIIFoldingFilterFactory.
> You would have an original field and one with the English equivalents to be
> used during searching.
>
> Wildcard searches with either the English equivalents or the international
> terms would match one of those fields.
> Also, lowercase the search terms if you are using a lowercase filter during
> indexing.
>
> Regards,
> Jayendra
>
> On Wed, Jan 12, 2011 at 7:46 AM, Kári Hreinsson <k...@gagnavarslan.is> wrote:
>
>> Have you made any progress?  Since the AnalyzingQueryParser doesn't inherit
>> from QParserPlugin, Solr doesn't want to use it, but I guess we could
>> implement a similar parser that does inherit from QParserPlugin?
>>
>> Switching parsers seems to be what is needed.  Has no one really solved
>> this before?
>>
>> - Kári
>>
>> ----- Original Message -----
>> From: "Matti Oinas" <matti.oi...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, 11 January, 2011 12:47:52 PM
>> Subject: Re: solr wildcard queries and analyzers
>>
>> This might be the solution.
>>
>>
>> http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
>>
>> 2011/1/11 Matti Oinas <matti.oi...@gmail.com>:
>> > Sorry, the message was not meant to be sent here. We are struggling
>> > with the same problem here.
>> >
>> > 2011/1/11 Matti Oinas <matti.oi...@gmail.com>:
>> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
>> >>
>> >> On wildcard and fuzzy searches, no text analysis is performed on the
>> >> search word.
>> >>
>> >> 2011/1/11 Kári Hreinsson <k...@gagnavarslan.is>:
>> >>> Hi,
>> >>>
>> >>> I am having a problem with the fact that no text analysis is performed
>> >>> on wildcard queries.  I have the following field type (a bit simplified):
>> >>>    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>> >>>      <analyzer>
>> >>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>> >>>        <filter class="solr.TrimFilterFactory" />
>> >>>        <filter class="solr.LowerCaseFilterFactory" />
>> >>>        <filter class="solr.ASCIIFoldingFilterFactory" />
>> >>>      </analyzer>
>> >>>    </fieldType>
>> >>>
>> >>> My problem has to do with Icelandic characters.  When I index a document
>> >>> with a text field including the word "sjálfsögðu", it gets indexed as
>> >>> "sjalfsogdu" (because the ASCIIFoldingFilterFactory replaces the
>> >>> Icelandic characters with their English equivalents).  Then, when I
>> >>> search (without a wildcard) for "sjálfsögðu" or "sjalfsogdu", I get that
>> >>> document as a result.  This is convenient since it enables people to
>> >>> search without using accented characters and yet get the results they
>> >>> want (e.g. if they are working on computers with English keyboards).
>> >>>
>> >>> However, this all falls apart with wildcard searches: the search string
>> >>> isn't passed through the filters, so if I search for "sjálf*" I don't get
>> >>> any results because the index doesn't contain the original words (I do
>> >>> get results if I search for "sjalf*").  I know people have been having a
>> >>> similar problem with the case sensitivity of wildcard queries, and most
>> >>> often the solution seems to be to lowercase the string before passing it
>> >>> on to Solr, which is not exactly an optimal solution (yet a simple one in
>> >>> that case).  The Icelandic characters complicate things a bit, and
>> >>> applying the same solution (doing the lowercasing and character mapping)
>> >>> in my application seems like unnecessary duplication of code that is
>> >>> already part of Solr, not to mention the added complexity in my
>> >>> application and possible maintenance down the road.
>> >>>
>> >>> Is there any way around this?  How are people solving this?  Is there a
>> >>> way to apply the filters to wildcard queries?  I guess removing the
>> >>> ASCIIFoldingFilterFactory is the simplest "solution", but this
>> >>> "normalization" (of the text done by the filter) is often very useful.
>> >>>
>> >>> I hope I'm not overlooking some obvious explanation. :/
>> >>>
>> >>> Thanks in advance,
>> >>> Kári Hreinsson
>> >>>
>> >>
>> >
>>
>
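
For reference, the two-field workaround Jayendra describes in the quoted
text could look roughly like the schema fragment below. This is only a
sketch: the type names "text_exact" and "text_folded" and the field names
"body" and "body_folded" are made up for illustration, and the exact
placement inside schema.xml depends on the Solr version in use.

```xml
<!-- Sketch only: all type and field names here are hypothetical. -->

<!-- Keeps the original (accented) characters; lowercases only. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

<!-- Lowercases and folds accented characters to ASCII equivalents. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
  </analyzer>
</fieldType>

<field name="body" type="text_exact" indexed="true" stored="true" />
<field name="body_folded" type="text_folded" indexed="true" stored="false" />

<!-- Index the same source text into both fields. -->
<copyField source="body" dest="body_folded" />
```

With something like this in place, a wildcard query would be sent against
both fields, e.g. body:sjálf* OR body_folded:sjálf*, so that "sjálf*" can
match the unfolded field and "sjalf*" the folded one. The search term still
has to be lowercased by the client, since the wildcard term is not analyzed.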
