On 2/5/2018 3:55 AM, Александр Шестак wrote:
> Hi, I have misunderstanding about usage of SynonymGraphFilterFactory
> and WordDelimiterGraphFilterFactory. Can they be used together?
>
There should be no problem with using them together. The behavior may
still surprise you, though, even when it is working 100% as designed.

> I have solr type configured in next way
>
> <fieldtype name="fulltext_en" class="solr.TextField"
>            autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory"
>             generateWordParts="1" generateNumberParts="1"
>             splitOnNumerics="1"
>             catenateWords="1" catenateNumbers="1" catenateAll="0"
>             preserveOriginal="1" protected="protwords_en.txt"/>
>     <filter class="solr.FlattenGraphFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory"
>             generateWordParts="1" generateNumberParts="1"
>             splitOnNumerics="1"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             preserveOriginal="1" protected="protwords_en.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory"
>             synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
>   </analyzer>
> </fieldtype>
>
> So on query time it uses SynonymGraphFilterFactory after
> WordDelimiterGraphFilterFactory.
> Synonyms are configured in next way:
> b=>b,boron
> 2=>ii,2
>
> Query in solr analysis tool looks so. It is shown that terms after SGF
> have positions 3 and 4. Is it correct? I thought that they should had
> 1 and 2 positions.

What matters is the *relative* positions. The exact position number
doesn't matter much. Something new that the Graph implementations use is
the position length. That feature is necessary for multi-term synonyms
to function correctly in phrase queries.

In your analysis screenshot, WDGF creates three tokens. The two tokens
created by splitting the input are at positions 1 and 2, which I think
is 100% as expected.
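To make the position bookkeeping concrete, here is a rough Python sketch. It is a toy model, not Lucene code. The `Token` class and the helpers `absolute_positions` and `gap` are hypothetical names made up for illustration. The increments for the three WDGF tokens are an assumption chosen to match what the analysis screen is described as showing, and the position lengths of the post-SGF terms are assumed to be 1.

```python
from dataclasses import dataclass


@dataclass
class Token:
    term: str
    pos_inc: int      # position increment relative to the previous token
    pos_len: int = 1  # number of positions the token spans


def absolute_positions(tokens):
    """Return (term, start, end) triples; start is the running sum of increments."""
    spans = []
    pos = 0
    for tok in tokens:
        pos += tok.pos_inc
        spans.append((tok.term, pos, pos + tok.pos_len))
    return spans


def gap(spans, first, second):
    """Distance between the start positions of two terms."""
    starts = {term: start for term, start, _ in spans}
    return starts[second] - starts[first]


# ASSUMPTION: WDGF output for the input "b2" with preserveOriginal="1".
# The preserved original spans both split parts, hence pos_len=2.
wdgf = [Token("b2", 1, 2), Token("b", 0), Token("2", 1)]
wdgf_spans = absolute_positions(wdgf)

# Start positions 3 and 4 as reported in the question after SGF.
# ASSUMPTION: each of these terms has a position length of 1.
sgf_spans = [("b", 3, 4), ("boron", 3, 4), ("2", 4, 5), ("ii", 4, 5)]

for term, start, end in wdgf_spans:
    print(f"{term}: {start} -> {end}")

print("gap before SGF:", gap(wdgf_spans, "b", "2"))
print("gap after SGF: ", gap(sgf_spans, "b", "2"))
```

In this model the second split term is one position after the first both before and after the synonym filter, even though the absolute numbers differ. That is the sense in which only the relative positions matter.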
WDGF also sets the positionLength of the first term ("b2") to 2,
probably because it has split that term into 2 additional terms.

Then the SGF takes those last two terms and expands them. Each of the
synonyms is at the same position as the original term, and the relative
positions of the two synonym pairs have not changed: the second one is
still one higher than the first.

I think the reason that SGF moves the positions two higher is that the
positionLength on the "b2" term is 2, previously set by WDGF. Someone
with more knowledge about the Graph implementations may have to speak up
as to whether this behavior is correct. Because the relative positions
of the split terms don't change when SGF runs, I think this is probably
working as designed.

Thanks,
Shawn