Re: Why use a different analyzer for "index" and "query"?

Erick Erickson Thu, 10 Sep 2020 09:49:19 -0700

When you want to do something different at index and query time. There, an 
answer that’s almost, but not quite, completely useless while being accurate ;)


A concrete example is synonyms, as has been mentioned. Say you have an 
index-time synonym definition of
A,B,C

These three tokens will be “stacked” in the index wherever any of them are 
found. 
A query “q=field:B” would find a document with any of the three tokens in the 
original. It would be wasteful for the query to be transformed into “q=field:(A 
B C)”…
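
A rough sketch of what that could look like in the schema. The field type name
and synonyms.txt are placeholders I made up, and the filter chain is only one
plausible arrangement, so treat it as an illustration rather than a
recommendation:

```xml
<fieldType name="text_syn_index" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- synonyms.txt is assumed to contain the line: A,B,C -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <!-- graph filters need to be flattened before indexing -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- no synonym filter: the stacked tokens are already in the index -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```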

And take a very close look at WordDelimiterGraphFilterFactory. I’m pretty sure 
you’ll find its parameters differ between the index and query analyzers. Say 
the index-time parameters for the input 123-456-7890 cause WDGFF to add
123, 456, 7890, 1234567890 to the index. Again, at query time you don’t need to 
repeat that and have all of those tokens in the search itself.
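
Something along these lines, as a sketch only. The field type name is
hypothetical and the attribute values are just one combination that should
give the tokens above; check them in the Analysis screen before relying on
them:

```xml
<fieldType name="text_wdgf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- catenateNumbers="1" adds 1234567890 next to 123, 456, 7890 -->
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- catenation off: the query only produces the parts -->
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```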

Best,
Erick

> On Sep 10, 2020, at 12:41 PM, Alexandre Rafalovitch <arafa...@gmail.com> 
> wrote:
> 
> There are a lot of different use cases, and separate analyzers for
> indexing and querying are part of Solr's power. For example, you could
> apply ngrams at indexing time to generate multiple substrings. But
> you don't want to do that during the query, because otherwise you are
> matching on a 'shared prefix' instead of on what the user entered. Think
> of a phone number directory where people may enter any suffix and you
> want to match it.
> See for example
> https://www.slideshare.net/arafalov/rapid-solr-schema-development-phone-directory
> , starting slide 16 onwards.
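> 
> A minimal sketch of that idea, with a made-up field type name and gram
> sizes picked only for illustration (this is not the schema from the
> slides):
> 
> ```xml
> <fieldType name="phone_ngram" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <!-- index every 3 to 10 character substring of the number -->
>     <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="10"/>
>   </analyzer>
>   <analyzer type="query">
>     <!-- keep the user's input as a single token, no ngrams -->
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>   </analyzer>
> </fieldType>
> ```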
> 
> Or, for non-production but fun use case:
> https://github.com/arafalov/solr-thai-test/blob/master/collection1/conf/schema.xml#L34-L55
> (search phonetically mapped Thai text in English).
> 
> Similarly, you may want to apply synonyms at query time only if you
> want to avoid diluting some relevancy. Or at index time to normalize
> spelling and help relevancy.
> 
> Or you may want to be doing some accent folding for sorting or
> faceting (which uses indexed tokens).
> 
> Regards,
>   Alex.
> 
> On Thu, 10 Sep 2020 at 11:19, Steven White <swhite4...@gmail.com> wrote:
>> 
>> Hi everyone,
>> 
>> In Solr's schema, I have come across field types that use a different logic
>> for "index" than for "query".  To be clear, I'm talking about this block:
>> 
>>    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>>      <analyzer type="index">
>>        <!-- what you see in this block doesn't have to be the same as what
>>             you see inside "query" block -->
>>      </analyzer>
>>      <analyzer type="query">
>>        <!-- what you see in this block doesn't have to be the same as what
>>             you see inside "index" block -->
>>      </analyzer>
>>    </fieldType>
>> 
>> Why would one want to not use the same logic for both and simply use:
>> 
>>    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>>      <analyzer>
>>        <!-- same logic to be used for both "index" and "query" -->
>>      </analyzer>
>>    </fieldType>
>> 
>> What are real-world use cases for using a different analyzer for index
>> and query?
>> 
>> Thanks,
>> 
>> Steve
