Solr version: 6.5.x

Why do hl.fl and df need to be the same for highlighting to work correctly?

Let us suppose I am highlighting on a field, fieldA, which has a stemming
filter in its analysis chain.

Sample doc: {"id":"1", "fieldA":"Vacation"}

If I then send this highlighting request:
> "params":{
>       "q":"Vacation",
>       "hl":"on",
>       "indent":"on",
>       "hl.fl":"fieldA",
>       "wt":"json"}


Highlighting doesn't work: the query term "Vacation" is analysed via
_text_ (text_general), where "Vacation" remains "Vacation", while in the
fieldA index it is stored as "vacat".

I debugged through the code, starting at HighlightComponent::169:

highlightQuery = rb.getQparser().getHighlightQuery();


The highlightQuery that gets passed along is the analysed form of the query
string, in this case: _text_:Vacation.

Fast-forwarding to WeightedSpanTermExtractor::extractWeightedTerms::366:

for (final Term queryTerm : nonWeightedTerms) {
  if (fieldNameComparator(queryTerm.field())) {
    WeightedSpanTerm weightedSpanTerm = new WeightedSpanTerm(boost, queryTerm.text());
    terms.put(queryTerm.text(), weightedSpanTerm);
  }
}

The extracted term is "Vacation".

Jumping to core highlighting code:

Highlighter::getBestTextFragments::213

TokenGroup tokenGroup=new TokenGroup(tokenStream);


Each tokenStream holds the analysed tokens of fieldA, here "vacat", which
obviously doesn't match the extracted term "Vacation".
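
To make the mismatch concrete, here is a minimal, self-contained sketch in
plain Java with no Lucene or Solr dependency. It only mimics the comparison
described above. The class, the method names and the toy "stemmer" are all
made up for illustration; they are not Solr or Lucene APIs, and the real
analysis chains do more than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class HighlightMismatchSketch {

    // Hypothetical stand-in for fieldA's stemming chain (NOT a real stemmer):
    // lowercases and strips a trailing "ion", so "Vacation" becomes "vacat".
    static String stemmingAnalyze(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return lower.endsWith("ion") ? lower.substring(0, lower.length() - 3) : lower;
    }

    // Hypothetical stand-in for the default field's chain as observed in the
    // debugger above: the query term comes out unchanged.
    static String defaultFieldAnalyze(String token) {
        return token;
    }

    // Mimics the highlighter step: the stored text is re-analysed with the
    // highlighted field's chain, and each resulting token is compared with
    // the terms extracted from the query.
    static List<String> highlight(String storedText, Set<String> queryTerms) {
        List<String> fragments = new ArrayList<>();
        for (String token : storedText.split("\\s+")) {
            if (queryTerms.contains(stemmingAnalyze(token))) {
                fragments.add("<em>" + token + "</em>");
            }
        }
        return fragments;
    }

    public static void main(String[] args) {
        String stored = "Vacation";

        // Query analysed via the default field: the extracted term is "Vacation".
        Set<String> viaDefaultField = Collections.singleton(defaultFieldAnalyze("Vacation"));
        System.out.println(highlight(stored, viaDefaultField)); // prints []

        // Query analysed via fieldA itself: the extracted term is "vacat".
        Set<String> viaFieldA = Collections.singleton(stemmingAnalyze("Vacation"));
        System.out.println(highlight(stored, viaFieldA)); // prints [<em>Vacation</em>]
    }
}
```

In this sketch the only thing that changes between the two calls is which
chain produced the query term, which is the same situation as leaving df at
its default versus pointing it at fieldA.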

Why do the df and qf values matter for the fields we pass in "hl.fl"?
Shouldn't the query to be highlighted be analysed with the field passed in
"hl.fl"? But then multiple fields can be passed in "hl.fl". I am just
wondering how this is supposed to be done. Any explanation would be
appreciated.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
