Hi

First post to the list, so here goes.

I'm using WordDelimiterGraphFilter on a field and, while playing with the 
analysis tool, came across a curious extra positional "hole" generated by the 
filter.  
For the input "wibble , wobble" (a space either side of the comma, so the comma 
is a separate token), the output has an extra positional hole after the comma, 
i.e. 

Term    Position
wibble  1
,       2
wobble  4 *

The positionLength for each token is 1, so there is no obvious graph span going on.
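
To make the hole concrete, here is a minimal sketch in plain Python (not Solr or Lucene code). It assumes the positions shown by the analysis tool are the running sum of each token's position increment, starting from 1; the helper names are hypothetical and only used for illustration. It recovers the increments implied by the positions listed above and flags any increment greater than 1 as a hole.

```python
def increments_from_positions(positions):
    """Recover per-token position increments from absolute positions.

    Assumes the first token's position is counted from 0, so a first
    token at position 1 has an increment of 1.
    """
    increments = []
    previous = 0
    for pos in positions:
        increments.append(pos - previous)
        previous = pos
    return increments


def positions_from_increments(increments):
    """Inverse of the above: accumulate increments into absolute positions."""
    positions = []
    current = 0
    for inc in increments:
        current += inc
        positions.append(current)
    return positions


# Terms and positions as reported for the input "wibble , wobble".
observed = [("wibble", 1), (",", 2), ("wobble", 4)]

increments = increments_from_positions([pos for _, pos in observed])
for (term, pos), inc in zip(observed, increments):
    # An increment above 1 means at least one position was skipped.
    note = "  <- hole before this token" if inc > 1 else ""
    print(f"{term}\tposition={pos}\tincrement={inc}{note}")
```

In these terms, the question is why the last token arrives with an increment of 2 rather than 1.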

It's not just the comma; any punctuation will do, e.g. "wibble ! wobble".

I know it's a bit contrived, and it doesn't break anything in production, but 
it puzzled me.  

The question is: is this by design?  It's not the behaviour of the old 
WordDelimiterFilter.  

Setup:

Solr 6.6.3

Field:
<fieldType name="text_en_allies" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" splitOnNumerics="0" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            splitOnCaseChange="1" preserveOriginal="1" stemEnglishPossessive="1"/>
    ...
  </analyzer>

Thanks for any insight.

Kelvyn Scrupps
Developer for Allies Computing 

 
 
 

