People usually want to do some analysis at index time. This analysis
should be considered 'expensive' compared to any single query run. You
can think of it as indexing spread over an 86,400-second day, vs a 200 ms
query time.
Normally, you want to index as honestly as possible. That is,
It is very common for us to do more processing in the index analysis chain. In
general, we do that when we want additional terms in the index to be
searchable. Some examples:
* synonyms: If the book title is “EMT” add “Emergency Medical Technician”.
* ngrams: For prefix matching, generate all
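To make the index-time idea concrete, here is a minimal plain-Python sketch (not Solr or Lucene code) of an index analysis chain that adds synonym terms and edge n-grams, while the query chain only lowercases and splits. The synonym table, the function names, and the gram sizes are hypothetical; in Solr you would configure filters such as a synonym filter and EdgeNGramFilterFactory instead. The sketch also ignores token positions, so multi-word synonyms and phrase queries are not modeled.

```python
# Hypothetical synonym table; "EMT" is the example from the list above.
SYNONYMS: dict[str, list[str]] = {
    "emt": ["emergency", "medical", "technician"],
}


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Leading substrings of token, from min_gram up to max_gram characters."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze_for_index(title: str) -> set[str]:
    """Index-time chain: lowercase, split on whitespace, add synonyms and edge n-grams."""
    terms: set[str] = set()
    for token in title.lower().split():
        for term in [token] + SYNONYMS.get(token, []):
            terms.add(term)
            terms.update(edge_ngrams(term))
    return terms


def analyze_for_query(text: str) -> set[str]:
    """Query-time chain: lowercase and split only; no extra terms are generated."""
    return set(text.lower().split())


def matches(query: str, indexed_terms: set[str]) -> bool:
    """Match if every query term is among the indexed terms (AND semantics)."""
    return analyze_for_query(query) <= indexed_terms


if __name__ == "__main__":
    terms = analyze_for_index("EMT Handbook")
    print(matches("medical", terms))  # True: added by the synonym rule
    print(matches("hand", terms))     # True: an edge n-gram of "handbook"
    print(matches("book", terms))     # False: only prefixes are indexed
```

Because the extra terms exist only in the index, the query side stays cheap: it never has to generate prefixes or synonyms itself.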
When you want to do something different at index and query time. There, an
answer that’s almost, but not quite, completely useless while being accurate ;)
A concrete example is synonyms, as has been mentioned. Say you have an
index-time synonym definition of
A,B,C
These three tokens will be
I gave an example of why you might want to analyze the corpus differently
from the query just yesterday -- see
https://lucene.472066.n3.nabble.com/Lowercase-ing-everything-but-acronyms-td4462899.html
-s
On Thu, Sep 10, 2020 at 11:19 AM Steven White wrote:
> Hi everyone,
>
> In
There are a lot of different use cases, and having separate analyzers for
indexing and querying is part of Solr's power. For example, you could
apply an n-gram filter at index time to generate multiple substrings. But
you don't want to do that at query time, because otherwise you are
matching on 'shared
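A small sketch of why n-grams usually belong on the index side only. It is plain Python with hypothetical names, and it assumes plain character n-grams of 3 to 5 characters (not edge n-grams) and OR matching across query terms. Leaving the query whole matches only real substrings; n-gramming the query as well lets an unrelated word match through shared fragments.

```python
def char_ngrams(text: str, min_gram: int = 3, max_gram: int = 5) -> set[str]:
    """All contiguous substrings of length min_gram..max_gram."""
    text = text.lower()
    grams: set[str] = set()
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def any_shared(query_terms: set[str], doc_terms: set[str]) -> bool:
    """OR semantics: match if at least one query term was indexed for the doc."""
    return bool(query_terms & doc_terms)


if __name__ == "__main__":
    doc = char_ngrams("station")  # index time: n-grams of the field value
    # Query left whole: a real substring matches, an unrelated word does not.
    print(any_shared({"tion"}, doc))    # True
    print(any_shared({"nation"}, doc))  # False
    # Query n-grammed as well: "nation" now matches via "ati", "tion", "ation", ...
    print(any_shared(char_ngrams("nation"), doc))  # True
```

The last match is a false positive caused purely by the two words sharing substrings.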
Hi Steve
I have a real-world use case. We don't apply a synonym filter at index
time, but we do apply a managed synonym filter at query time. This allows
content managers to add new synonyms (or remove existing ones) "on the fly"
without having to reindex any documents.
Thomas
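A minimal sketch of this setup in plain Python: the index stores only the original terms, and a synonym map is consulted at query time, so editing the map changes results without reindexing. TinyIndex and the dict-based synonym map are hypothetical stand-ins; in Solr this role is played by the managed synonym filter mentioned above, which the sketch does not attempt to reproduce.

```python
class TinyIndex:
    """Toy inverted index, built once; its terms never change after indexing."""

    def __init__(self, docs: dict[str, str]) -> None:
        self.postings: dict[str, set[str]] = {}
        for doc_id, text in docs.items():
            for token in text.lower().split():
                self.postings.setdefault(token, set()).add(doc_id)

    def search(self, query: str, synonyms: dict[str, set[str]]) -> set[str]:
        """Expand each query token with its synonyms (OR), then AND across tokens."""
        result = None
        for token in query.lower().split():
            alternatives = {token} | synonyms.get(token, set())
            hits: set[str] = set()
            for alt in alternatives:
                hits |= self.postings.get(alt, set())
            result = hits if result is None else result & hits
        return result or set()


if __name__ == "__main__":
    index = TinyIndex({"d1": "couch for sale", "d2": "sofa cushions"})
    synonyms: dict[str, set[str]] = {}
    print(index.search("sofa", synonyms))  # {'d2'}
    synonyms["sofa"] = {"couch"}           # edit synonyms "on the fly"
    print(sorted(index.search("sofa", synonyms)))  # ['d1', 'd2'], no reindex
```

Synonyms here are one-directional (sofa to couch); a real synonym list would usually make them symmetric.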
On Thu 10 Sep.
Hi Steven,
I can think of one case. If we have an index of database table or column
names, e.g., words like 'THIS_IS_A_TABLE_NAME', we may want to split the name
at the underscores when indexing (as well as keep the original), since the
individual parts might be significant and meaningful.
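A sketch of that index-time splitting in plain Python. The function name and the lowercasing are assumptions for illustration; in Solr, a word delimiter filter with its preserveOriginal option is one common way to get a similar effect, though its splitting rules are richer than this simple underscore split.

```python
def split_identifier(name: str) -> list[str]:
    """Index-time terms for an identifier: the whole name plus its underscore parts.

    Lowercasing is an assumption here, so queries can be case-insensitive.
    """
    lowered = name.lower()
    terms = [lowered]  # keep the original identifier as a term
    for part in lowered.split("_"):
        if part and part not in terms:  # skip empty pieces from "__" and duplicates
            terms.append(part)
    return terms


if __name__ == "__main__":
    print(split_identifier("THIS_IS_A_TABLE_NAME"))
    # ['this_is_a_table_name', 'this', 'is', 'a', 'table', 'name']
```

At query time you would typically not split, so a search for `table` matches through an indexed part, while the full name still matches as a single term.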