> I've been writing some custom synonym filters and have run
> into an issue with returning a list of tokens. I have a
> synonym filter that uses the WordNet database to extract
> synonyms. My problem is how to define the offsets and
> position increments in the new Tokens I'm returning.
> 
> For an input token, I get a list of synonyms from the
> WordNet database. I then create a List<Token> of those
> results. Each Token is created with the same startOffset,
> endOffset and positionIncrement of the input Token. Is this
> correct? My understanding from looking at the Lucene
> codebase is that the startOffset/endOffset should be the
> same, as we are referring to the same term in the original
> text. However, I don't quite get the positionIncrement. I
> understand that it is relative to the previous term ... does
> this mean all my synonyms should have a positionIncrement of
> 0? But whether I use 0 or the positionIncrement of the
> original input Token, Solr seems to ignore the returned
> tokens ...

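You can look at the source code of SynonymTokenFilter[1] and SynonymMap[2] in
Lucene's contrib/memory package. SynonymMap loads its synonyms from the WordNet
prolog file (wn_s.pl), so together they cover the same use case as your filter.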
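The approach can be sketched without Lucene on the classpath. As I recall,
SynonymTokenFilter emits the original token first and then each synonym on a
later call, copying the original token's offsets and setting the position
increment to 0 so the synonyms stack on the same position. Tokens collected
into a List<Token> still have to be handed out one per call to next(). If they
are not, downstream consumers such as Solr never see them, so that is worth
checking in your filter. The classes Tok, TokSource and SynonymExpander below
are hypothetical stand-ins, not Lucene's Token/TokenStream API, and a plain Map
replaces the WordNet lookup:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Lucene's Token: only the fields discussed above.
final class Tok {
    final String term;
    final int startOffset, endOffset, positionIncrement;

    Tok(String term, int startOffset, int endOffset, int positionIncrement) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.positionIncrement = positionIncrement;
    }
}

// Hypothetical stand-in for a TokenStream: one token per call, null when exhausted.
interface TokSource {
    Tok next();
}

// Emits each input token, then its synonyms one at a time from a pending queue.
// Synonyms copy the original offsets and use positionIncrement 0.
final class SynonymExpander implements TokSource {
    private final TokSource input;
    private final Map<String, List<String>> synonyms; // hypothetical replacement for WordNet
    private final Deque<Tok> pending = new ArrayDeque<>();

    SynonymExpander(TokSource input, Map<String, List<String>> synonyms) {
        this.input = input;
        this.synonyms = synonyms;
    }

    @Override
    public Tok next() {
        // Drain buffered synonyms before reading the next input token.
        if (!pending.isEmpty()) {
            return pending.poll();
        }
        Tok current = input.next();
        if (current == null) {
            return null;
        }
        for (String syn : synonyms.getOrDefault(current.term, List.of())) {
            pending.add(new Tok(syn, current.startOffset, current.endOffset, 0));
        }
        return current; // the original keeps its own position increment
    }
}

public class Main {
    // Wraps a fixed list of tokens as a TokSource.
    static TokSource fromList(List<Tok> toks) {
        Iterator<Tok> it = toks.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        List<Tok> input = List.of(new Tok("quick", 0, 5, 1), new Tok("fox", 6, 9, 1));
        Map<String, List<String>> syns = Map.of("quick", List.of("fast", "speedy"));
        TokSource stream = new SynonymExpander(fromList(input), syns);

        // Absolute position = running sum of position increments.
        int position = -1;
        for (Tok t = stream.next(); t != null; t = stream.next()) {
            position += t.positionIncrement;
            System.out.println(t.term + " pos=" + position
                    + " offsets=" + t.startOffset + "-" + t.endOffset);
        }
    }
}
```

Running it prints quick, fast and speedy all at position 0 with offsets 0-5,
followed by fox at position 1. Because each position increment is relative to
the previous token, 0 is what places a synonym at the same position as the
word it replaces.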

[1] http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/memory/SynonymTokenFilter.html
[2] http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/memory/SynonymMap.html