> I've been writing some custom synonym filters and have run
> into an issue with returning a list of tokens. I have a
> synonym filter that uses the WordNet database to extract
> synonyms. My problem is how to define the offsets and
> position increments in the new Tokens I'm returning.
>
> For an input token, I get a list of synonyms from the
> WordNet database. I then create a List<Token> of those
> results. Each Token is created with the same startOffset,
> endOffset and positionIncrement of the input Token. Is this
> correct? My understanding from looking at the Lucene
> codebase is that the startOffset/endOffset should be the
> same, as we are referring to the same term in the original
> text. However, I don't quite get the positionIncrement. I
> understand that it is relative to the previous term ... does
> this mean all my synonyms should have a positionIncrement of
> 0? But whether I use 0 or the positionIncrement of the
> original input Token, Solr seems to ignore the returned
> tokens ...
You can look at the source code of SynonymTokenFilter[1] and SynonymMap[2] in Lucene.

[1] http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/memory/SynonymTokenFilter.html
[2] http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/memory/SynonymMap.html
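
On the positionIncrement question itself: the usual convention is that the original token keeps its own position increment, and each injected synonym reuses the original's startOffset/endOffset with a position increment of 0, so all of them are stacked at the same position. Below is a minimal sketch of that convention only. It does not use the Lucene API: the Token class, expand() and positions() are hypothetical stand-ins written for illustration, and positions() assumes the position counter starts at -1 before the first token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: Token is a hypothetical stand-in, NOT org.apache.lucene.analysis.Token.
final class SynonymPositionSketch {

    static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;
        final int positionIncrement;

        Token(String term, int startOffset, int endOffset, int positionIncrement) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + " [" + startOffset + "," + endOffset + ") posInc=" + positionIncrement;
        }
    }

    // Emit the input token unchanged, followed by one token per synonym.
    // Each synonym copies the input's offsets and uses a position increment of 0.
    static List<Token> expand(Token input, List<String> synonyms) {
        List<Token> out = new ArrayList<>();
        out.add(input);
        for (String synonym : synonyms) {
            out.add(new Token(synonym, input.startOffset, input.endOffset, 0));
        }
        return out;
    }

    // Absolute positions as a running sum of increments (assumed to start from -1).
    static List<Integer> positions(List<Token> tokens) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (Token token : tokens) {
            position += token.positionIncrement;
            result.add(position);
        }
        return result;
    }

    public static void main(String[] args) {
        // "quick" occupies characters 4..9 of the original text and is the first token seen here.
        Token quick = new Token("quick", 4, 9, 1);
        List<Token> expanded = expand(quick, Arrays.asList("fast", "speedy"));
        for (Token token : expanded) {
            System.out.println(token);
        }
        System.out.println("positions: " + positions(expanded));
    }
}
```

If every synonym copied the input's positionIncrement instead of using 0, each one would advance the position and the synonyms would be indexed as if they were separate consecutive words, which affects phrase matching.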