As mentioned in the previous post, the analysis output clearly shows that
the end offsets of the edge n-grams are correct in Solr 4.10.1 and
incorrect in Solr 5.5.2.


solr 5.5.2

text   raw_bytes         start  end  positionLength  type  position
p      [70]              0      5    1               word  1
pa     [70 61]           0      5    1               word  1
par    [70 61 72]        0      5    1               word  1
pari   [70 61 72 69]     0      5    1               word  1
paris  [70 61 72 69 73]  0      5    1               word



end is always set to 5 (the length of the full token "paris"), which is
wrong for the shorter grams
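
To make the mismatch explicit, here is a minimal sketch (plain Python, not
Solr code) that flags every token whose offset span differs from its text
length. The rows are copied from the Solr 5.5.2 analysis output above; the
helper name mismatched_offsets is made up for illustration, and the check
assumes no char filter changes the length of the input.

```python
# Rows copied from the Solr 5.5.2 analysis output: (text, start, end).
solr_552 = [
    ("p", 0, 5),
    ("pa", 0, 5),
    ("par", 0, 5),
    ("pari", 0, 5),
    ("paris", 0, 5),
]


def mismatched_offsets(tokens):
    """Return the tokens whose offset span (end - start) differs from len(text).

    Hypothetical helper: it only compares numbers taken from the analysis
    screen and does not call Solr or Lucene.
    """
    return [
        (text, start, end)
        for text, start, end in tokens
        if end - start != len(text)
    ]


bad = mismatched_offsets(solr_552)
for text, start, end in bad:
    print(f"{text!r}: offsets {start}-{end} span {end - start} chars, "
          f"token has {len(text)}")
```

Only the full token "paris" passes the check; every shorter gram carries the
offsets of the whole word.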


solr 4.10.1


text   raw_bytes         start  end  positionLength  type  position
p      [70]              0      1    1               word  1
pa     [70 61]           0      2    1               word  1
par    [70 61 72]        0      3    1               word  1
pari   [70 61 72 69]     0      4    1               word  1
paris  [70 61 72 69 73]  0      5    1               word

end is set to 1, 2, 3, 4 or 5, matching the length of each edge n-gram
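
The end offset matters because the highlighter wraps whatever span the
matched token reports. Below is a simplified sketch of that idea, using the
"sol" / "solr" example from the quoted message. It is an illustration under
the assumption that highlighting simply wraps stored[start:end]; it is not
Solr's actual highlighter, and the function name highlight is hypothetical.

```python
def highlight(stored, start, end, pre="<em>", post="</em>"):
    """Wrap stored[start:end] in pre/post tags.

    Simplified stand-in for offset-based highlighting, for illustration only.
    """
    return stored[:start] + pre + stored[start:end] + post + stored[end:]


stored = "solr"

# Query "sol" matches the indexed gram "sol".
# With per-gram offsets (as in the 4.10.1 output) the gram reports 0-3.
per_gram = highlight(stored, 0, 3)

# With whole-token offsets (as in the 5.5.2 output) the gram reports 0-4.
whole_token = highlight(stored, 0, 4)

print(per_gram)
print(whole_token)
```

The first call reproduces the 4.10.1 behaviour described below, the second
the 5.5.2 behaviour where the whole word is highlighted.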


2016-09-22 14:57 GMT+02:00 elisabeth benoit <elisaelisael...@gmail.com>:

>
> Hello
>
> After migrating from Solr 4.10.1 to Solr 5.5.2, we no longer get the same
> highlighting behaviour on edge n-gram fields.
>
> We're using it for an autocomplete component. With Solr 4.10.1, if the
> request is sol, the highlighting on solr is <em>sol</em>r
>
> With Solr 5.5.2, we get <em>solr</em>.
>
> Same problem as described in
> http://grokbase.com/t/lucene/solr-user/154m4jzv2f/solr-5-hit-highlight-with-ngram-edgengram-fields
>
> but nobody answered the post.
>
> Does anyone know how we can fix this?
>
> Best regards,
> Elisabeth
>
> Field definition
>
> <fieldType name="autocomplete_ngram" class="solr.TextField">
>   <analyzer type="index">
>     <charFilter class="solr.MappingCharFilterFactory"
>         mapping="mapping-ISOLatin1Accent.txt"/>
>     <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
>     <tokenizer class="solr.PatternTokenizerFactory"
>         pattern="[\s,;:\-\&#39;]"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>         splitOnNumerics="0"
>         generateWordParts="1"
>         generateNumberParts="1"
>         catenateWords="0"
>         catenateNumbers="0"
>         catenateAll="0"
>         splitOnCaseChange="1"
>         preserveOriginal="1"
>         types="wdfftypes.txt"
>         />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20"
>         minGramSize="1"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.MappingCharFilterFactory"
>         mapping="mapping-ISOLatin1Accent.txt"/>
>     <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
>     <tokenizer class="solr.PatternTokenizerFactory"
>         pattern="[\s,;:\-\&#39;]"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>         splitOnNumerics="0"
>         generateWordParts="1"
>         generateNumberParts="0"
>         catenateWords="0"
>         catenateNumbers="0"
>         catenateAll="0"
>         splitOnCaseChange="0"
>         preserveOriginal="1"
>         types="wdfftypes.txt"
>         />
>     <filter class="solr.LowerCaseFilterFactory"/>
>
>   </analyzer>
> </fieldType>
>
