We've moved from an asterisk-based autosuggest functionality ("searchterm*") to a version that uses a special field type called autosuggest, filled via copyField directives. The field type definition:

<fieldType name="autosuggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="true" format="snowball"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3" maxSubwordSize="30" onlyLongestMatch="false"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="true" format="snowball"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3" maxSubwordSize="30" onlyLongestMatch="false"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

It works like a charm. We were already using Solr's highlighting before the change, with these parameters:

hl=true&hl.simple.pre=<span+class%3D"highlight">&hl.snippets=1&hl.simple.post=</span>&spellcheck=true&hl.fl=description
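For anyone who wants to reproduce the request, here is a minimal sketch of how the parameters above could be assembled and URL-encoded from a client. It uses only the Python standard library; the parameter names and values are taken from the line above, nothing else about our client is implied.

```python
from urllib.parse import urlencode

# Highlighting and spellcheck parameters exactly as listed above.
params = {
    "hl": "true",
    "hl.simple.pre": '<span class="highlight">',
    "hl.snippets": "1",
    "hl.simple.post": "</span>",
    "spellcheck": "true",
    "hl.fl": "description",
}

# urlencode() percent-encodes the markup and turns spaces into "+".
query_string = urlencode(params)
print(query_string)
```

The resulting string still needs the actual query (q=...) and the handler path prepended before it can be sent to Solr.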

Now we've noticed something strange. This is just one example; the problem affects more than this record. In this example, the autosuggest field contains:

2CV4 Spot, Dekorsatz, für 2CV.

However, the highlighting branch for this autosuggest field in the record looks like this:

<lst name="highlighting">
  <lst name="34725">
    <arr name="short_description">
      <str>2CV4 Spot, Dekorsatz, für <em>2CV</em>.</str>
    </arr>
  </lst>
  ...

Although the EdgeNGramFilterFactory generated the n-grams so that "2CV4" -> "2c", "2cv", "2cv4" (lowercased, and starting at two characters because minGramSize is 2), the term "2CV4" is not highlighted. Shouldn't it be? It's not a question of the number of highlights: records containing multiple occurrences of "2CV" get highlighted multiple times with no problems.
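To make the expectation concrete, here is a rough sketch of the front-side edge n-gram step in Python. It is not Solr's implementation: the helper names are made up for illustration, the regex split is only a crude stand-in for the StandardTokenizer, and stop words, compound splitting, normalization and stemming are left out. With minGramSize=2 the shortest gram has two characters.

```python
import re

def edge_ngrams(token, min_gram=2, max_gram=15):
    """Front-side edge n-grams of one token (hypothetical helper)."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

def index_terms_by_word(text):
    """Map each lowercased word to the grams indexed for it.

    Simplification: only tokenizing, lowercasing and edge n-grams
    are modelled; the other filters in the chain are skipped.
    """
    words = re.findall(r"\w+", text.lower())
    return {word: edge_ngrams(word) for word in words}

terms = index_terms_by_word("2CV4 Spot, Dekorsatz, für 2CV.")
query = "2cv"

# Words whose indexed grams contain the query term.
matching_words = [word for word, grams in terms.items() if query in grams]
print(matching_words)
```

Under these assumptions both "2cv4" and "2cv" carry the gram "2cv", so both words match the query, which is why we expected both to be highlighted.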

It seems that words which match the search term only through one of their EdgeNGrams, rather than as a whole word, are not highlighted. As we rely exclusively on Solr for highlighting, this leads to records being found but having no highlight at all.
