Multiple word synonym is not found because of an extra token between words

Jean-Marc Desprez Fri, 16 Aug 2013 12:44:19 -0700

Hi,

Let's say I have this synonyms entry:
b c => ok
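For illustration, here is a minimal Python sketch of how an explicit mapping like this can be read. Whitespace separates the words of a multi-word entry, so the left-hand side is the two-token sequence ("b", "c"). The function name parse_explicit_mapping is hypothetical, and this simplified parser ignores escaping, comments and the other features of the real synonyms file format.

```python
def parse_explicit_mapping(line):
    # Hypothetical, simplified reading of one explicit mapping ("lhs => rhs").
    # Commas separate alternatives; whitespace separates the words of a
    # multi-word entry. Escaping and comments are not handled.
    lhs, rhs = (side.strip() for side in line.split("=>", 1))
    sources = [tuple(alt.split()) for alt in lhs.split(",") if alt.strip()]
    targets = [tuple(alt.split()) for alt in rhs.split(",") if alt.strip()]
    # Each multi-word source maps to the list of replacement word sequences.
    return {src: targets for src in sources}


print(parse_explicit_mapping("b c => ok"))  # {('b', 'c'): [('ok',)]}
```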


My configuration (index time):
1. WhitespaceTokenizerFactory
2. WordDelimiterFilterFactory with catenateWords="0"
3. SynonymFilterFactory
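The first two steps of this chain can be approximated with a small Python model that records (position, term) pairs. This is a hypothetical simplification, not Lucene's implementation: the word delimiter step only splits on non-alphanumeric characters, and when catenate_words is on it stacks the joined token at the position shown in this post (the real filter's position handling may differ).

```python
import re


def whitespace_tokenize(text):
    # Stand-in for WhitespaceTokenizerFactory: split on whitespace and
    # assign consecutive positions starting at 0.
    return [(pos, tok) for pos, tok in enumerate(text.split())]


def word_delimiter(tokens, catenate_words=False):
    # Much-simplified stand-in for WordDelimiterFilterFactory (assumption):
    # split each token on non-alphanumeric characters, giving every part its
    # own position. With catenate_words, a token that split into several
    # parts also emits the joined parts, stacked on the last part's position.
    out = []
    pos = -1
    for _, tok in tokens:
        parts = [p for p in re.split(r"[^0-9A-Za-z]+", tok) if p]
        for part in parts:
            pos += 1
            out.append((pos, part))
        if catenate_words and len(parts) > 1:
            out.append((pos, "".join(parts)))
    return out


toks = whitespace_tokenize("a/b c")
print(toks)                                       # [(0, 'a/b'), (1, 'c')]
print(word_delimiter(toks))                       # [(0, 'a'), (1, 'b'), (2, 'c')]
print(word_delimiter(toks, catenate_words=True))  # [(0, 'a'), (1, 'b'), (1, 'ab'), (2, 'c')]
```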

The input "a/b c" produces the following (one line per tokenizer/filter):
0:"a/b", 1:"c"
0:"a", 1:"b", 2:"c"
0:"a", 1:"ok"

So everything is OK. Now if I set catenateWords to "1", the same input
produces:
0:"a/b", 1:"c"
0:"a", 1:"b", 1:"ab", 2:"c"
0:"a", 1:"b", 1:"ab", 2:"c"

The synonym filter doesn't match the entry because of the extra token "ab"
between "b" and "c".
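A simplified model of why the match fails, assuming the synonym filter compares a multi-word rule against consecutive tokens in stream order rather than against positions (the function name apply_synonyms_sequential is hypothetical). The stacked "ab" is a token in its own right, so the window that starts at "b" is ("b", "ab"), never ("b", "c").

```python
def apply_synonyms_sequential(tokens, rules):
    # Hypothetical model: walk the stream in order and try each multi-word
    # rule against the next len(words) tokens, ignoring their positions.
    # rules maps a tuple of words to a single replacement term.
    out = []
    i = 0
    while i < len(tokens):
        for words, replacement in rules.items():
            window = tokens[i:i + len(words)]
            if tuple(text for _, text in window) == words:
                # Replace the matched tokens with the synonym at the first position.
                out.append((tokens[i][0], replacement))
                i += len(words)
                break
        else:
            # No rule matched here: keep the token and move on.
            out.append(tokens[i])
            i += 1
    return out


rules = {("b", "c"): "ok"}
without_catenate = [(0, "a"), (1, "b"), (2, "c")]
with_catenate = [(0, "a"), (1, "b"), (1, "ab"), (2, "c")]
print(apply_synonyms_sequential(without_catenate, rules))  # [(0, 'a'), (1, 'ok')]
print(apply_synonyms_sequential(with_catenate, rules))     # unchanged input
```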
To my mind, the synonym should be triggered when a token "b" and a token "c"
are separated by one position (which is still the case in the second
example).
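The behaviour described here can be sketched as a position-aware matcher: a rule fires when its first word occurs at some position p, its second word at p + 1, and so on, regardless of any other tokens stacked at those positions. This is a hypothetical illustration of the requested semantics, not an existing Solr option; it only reports where a rule would fire and leaves aside how the stream should then be rewritten.

```python
from collections import defaultdict


def apply_synonyms_by_position(tokens, rules):
    # Hypothetical position-aware matcher: group terms by position, then a
    # rule (w0, w1, ...) matches at start p if w0 is at p, w1 at p + 1, etc.
    # Other tokens stacked at the same positions (like "ab") are ignored.
    by_pos = defaultdict(set)
    for pos, text in tokens:
        by_pos[pos].add(text)
    matches = []
    for start in sorted(by_pos):
        for words, replacement in rules.items():
            # .get() does not insert keys into the defaultdict.
            if all(w in by_pos.get(start + k, ()) for k, w in enumerate(words)):
                matches.append((start, replacement))
    return matches


rules = {("b", "c"): "ok"}
stacked = [(0, "a"), (1, "b"), (1, "ab"), (2, "c")]
print(apply_synonyms_by_position(stacked, rules))  # [(1, 'ok')]
```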

Is there any way to make the second example work?

Jean-Marc
