Basically you need to use a copyField, in one of two variants:

If you use the field _exclusively_ for highlighting, then store the raw
content there and have the field use whatever analyzer you want. You
do _not_ need indexed="true" on that field if you're highlighting on
the fly. So you're searching against field1 (which has indexed="true"
stored="false" set) but highlighting against field2 (which has
indexed="false" stored="true" set). Of course, any time you want to
return the contents in a doc, your fl needs to specify field2.
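For illustration, here's a minimal managed-schema sketch of that first
variant. The field names trans and trans_hl are hypothetical (trans is
borrowed from your URLs), and I'm assuming text_de for both fields;
substitute whatever analyzer you want for the highlighting field.

```xml
<!-- Hypothetical field names. "trans" is searched but not stored. -->
<field name="trans" type="text_de" indexed="true" stored="false"/>

<!-- "trans_hl" is used only for highlighting: stored, not indexed.
     With the default highlighter the stored text is re-analyzed at
     highlight time using this field's type (text_de is an assumption;
     any analyzer will do). -->
<field name="trans_hl" type="text_de" indexed="false" stored="true"/>

<!-- copyField copies the raw input value, before analysis, into trans_hl. -->
<copyField source="trans" dest="trans_hl"/>
```

A request along the lines of
/select?q=trans:Zeit&hl=true&hl.fl=trans_hl&fl=id,trans_hl would then
search against trans and highlight against trans_hl (assuming your
schema has an id field).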

The above does not bloat your index at all, since the cost of
stored="true" indexed="true" on a single field is the same as that of
two fields, each with only one of those options turned on.

The second approach, if you want to use the FastVectorHighlighter or
the like, is simply to index both fields.
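A rough sketch of the second variant, again with hypothetical field
names and text_de assumed for both. As far as I know the
FastVectorHighlighter also needs term vectors with positions and
offsets on the highlighted field, so those attributes are included
here; double-check the highlighting docs for your version.

```xml
<!-- Hypothetical field names. Searched field, not stored. -->
<field name="trans" type="text_de" indexed="true" stored="false"/>

<!-- Highlighting field is indexed as well, and carries the term vectors
     (with positions and offsets) the FastVectorHighlighter relies on. -->
<field name="trans_hl" type="text_de" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- Same raw input is copied into the highlighting field. -->
<copyField source="trans" dest="trans_hl"/>
```

You'd then ask for it with something like hl.method=fastVector and
hl.fl=trans_hl.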

Best,
Erick

On Thu, Mar 22, 2018 at 2:18 AM, Arturas Mazeika <maze...@gmail.com> wrote:
> Hi Solr-Users,
>
> I've been playing with a German collection of documents, where I tried to
> search for one word (q=Tag) and highlight another (hl.q=Kundigung). Is
> this a "legal" use case? My key question is: how can I tell Solr which query
> analyzer to use for highlighting? Strictly speaking, I should use
> hl.q=Kündigung to conceptually look for relevant information, but in this
> case no highlighting is returned (as all umlauts are left out in the
> index).
>
> Additional info:
>
> Solr version: 7.2
> URLs queried:
>
> http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=Kundigung&hl.snippets=3&wt=xml&rows=1
>
> http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=K%C3%BCndigung&hl.snippets=3&wt=xml&rows=1
>
> Managed-schema:
>
>   <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>     <analyzer>
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.StopFilterFactory" format="snowball"
>               words="lang/stopwords_de.txt" ignoreCase="true"/>
>       <filter class="solr.GermanNormalizationFilterFactory"/>
>       <filter class="solr.GermanLightStemFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
>
> Other additional info:
> https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>
> Cheers,
> Arturas
