Autosuggest using EdgeNGrams with strange highlighting

Thomas Michael Engelke Fri, 07 Nov 2014 07:24:27 -0800

We've moved from an asterisk-based autosuggest (wildcard queries like "searchterm*") to a version that uses a special field called autosuggest, filled via copyField directives. The field definition:

<fieldType name="autosuggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="true" format="snowball"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3" maxSubwordSize="30" onlyLongestMatch="false"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
  </analyzer>
</fieldType>

It works like a charm. We also had highlighting from Solr before, using these parameters:


hl=true&hl.simple.pre=<span+class%3D"highlight">&hl.snippets=1&hl.simple.post=</span>&spellcheck=true&hl.fl=description
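For reference, the same parameter set can be assembled programmatically. This is a minimal Python sketch, not part of our actual setup: the base URL, the core name `collection1` and the query `autosuggest:2cv` are hypothetical placeholders; only the `hl.*` and `spellcheck` values come from the list above. `urlencode` takes care of escaping the `<`, `"` and `=` characters in `hl.simple.pre`.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; host, port and core name are placeholders.
BASE_URL = "http://localhost:8983/solr/collection1/select"

params = {
    "q": "autosuggest:2cv",  # hypothetical query, not from the original post
    "hl": "true",
    "hl.simple.pre": '<span class="highlight">',
    "hl.simple.post": "</span>",
    "hl.snippets": "1",
    "hl.fl": "description",
    "spellcheck": "true",
}


def build_url(base, query_params):
    """Return the select URL with every parameter percent-encoded."""
    return base + "?" + urlencode(query_params)


url = build_url(BASE_URL, params)
print(url)
```

Encoding the whole query string in one place avoids the mix of raw and escaped characters that hand-written URLs tend to accumulate.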

Now we've noticed something strange. This is just one example; the problem affects more than this record. In this example, the autosuggest field contains:


2CV4 Spot, Dekorsatz, für 2CV.

However, the highlighting branch for this autosuggest field in the record looks like this:


<lst name="highlighting">
  <lst name="34725">
    <arr name="short_description">
      <str>2CV4 Spot, Dekorsatz, für <em>2CV</em>.</str>
    </arr>
  </lst>
  ...

Although the EdgeNGramFilterFactory turns "2CV4" into the n-grams "2c", "2cv" and "2cv4" (lowercased by the preceding LowerCaseFilterFactory; with minGramSize="2" there is no single-character "2"), the term is not highlighted. Shouldn't it be? It's not a question of the number of highlights: records containing multiple occurrences of "2CV" get highlighted multiple times without any problems.
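To make the n-gram expansion concrete, here is a simplified Python sketch of what a front-edge n-gram filter with minGramSize="2" and maxGramSize="15" emits for a single, already lowercased token. It is an illustration, not Lucene's implementation, and it assumes the earlier filters in the chain (stopwords, decompounding, German normalization and stemming) leave "2cv4" unchanged.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Front edge n-grams of one token (simplified sketch, not Lucene's
    EdgeNGramTokenFilter). Tokens shorter than min_gram yield nothing."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


# LowerCaseFilterFactory runs before the n-gram filter, so the grams are
# lowercase. Assumes the stemmer and decompounder leave "2cv4" as-is.
token = "2CV4".lower()
print(edge_ngrams(token))  # ['2c', '2cv', '2cv4']
```

The query "2CV" therefore matches the indexed gram "2cv", which is why the record is found even though "2CV4" itself gets no highlight.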

It seems that words which match the search term only through one of their EdgeNGrams (such as "2CV4" for the query "2CV") are not highlighted. Since we use Solr's highlighting exclusively, this leads to records being found but having no highlight at all.
