I'm converting many of the TokenFilters to the new Lucene attribute API.
I'm currently on DoubleMetaphoneFilter, and this part of it looks wrong:
// If we did not add something, then go to the next one...
if( !isPhonetic ) {
  t = next(in);
  if( t != null ) {
    t.setPositionIncrement( t.getPositionIncrement()+1 );
  }
  return t;
}
It looks like if DoubleMetaphone didn't produce any phonetic tokens,
then the *next* token is returned without any variants, with its
position increment bumped by one. That doesn't make sense, and it also
seems like it could mess up the token ordering, since the original
token (in the case of inject==true) hasn't even been returned yet.
I also couldn't find any documentation on exactly how "inject" is
supposed to work: DoubleMetaphoneFilterFactory doesn't appear at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters, and the
javadoc doesn't give any clues.
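For comparison, here is a minimal, self-contained sketch of the behavior one might expect. It assumes (undocumented, so this is a guess) that inject==true means "emit the original token, then its phonetic forms stacked at position increment 0" and inject==false means "emit only the phonetic forms". The `Tok` record and the `CODES` lookup table are hypothetical stand-ins, not Lucene's Token/attribute classes or the real commons-codec encoder. For a token with no phonetic form, the sketch passes it through unchanged (inject==true), or drops it and carries its position increment to the next emitted token (inject==false, the same idea StopFilter uses when position increments are enabled). That second case may be what the `+1` above was aiming for, but that is only a reading of the intent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PhoneticInjectSketch {

    /** Hypothetical token model: just the term text and its position increment. */
    record Tok(String term, int posInc) {
        @Override
        public String toString() { return term + "/" + posInc; }
    }

    // Hypothetical stand-in for the DoubleMetaphone encoder: a fixed lookup table
    // (illustrative values). A missing entry means "no phonetic form", e.g. "123".
    static final Map<String, List<String>> CODES = Map.of(
        "smith", List.of("SM0", "XMT"));

    static List<Tok> filter(List<Tok> input, boolean inject) {
        List<Tok> out = new ArrayList<>();
        int carried = 0; // increments of dropped tokens, so later positions don't shift
        for (Tok t : input) {
            List<String> codes = CODES.getOrDefault(t.term(), List.of());
            int inc = t.posInc() + carried;
            if (codes.isEmpty()) {
                if (inject) {
                    // No variants: pass the original through, in order, unchanged.
                    out.add(new Tok(t.term(), inc));
                    carried = 0;
                } else {
                    // Nothing to emit: fold this token's increment into the next one.
                    carried = inc;
                }
                continue;
            }
            carried = 0;
            if (inject) {
                out.add(new Tok(t.term(), inc)); // original token first
            }
            for (int i = 0; i < codes.size(); i++) {
                // Variants stack at the same position (increment 0); only when the
                // original is not injected does the first variant take its increment.
                int variantInc = (!inject && i == 0) ? inc : 0;
                out.add(new Tok(codes.get(i), variantInc));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> in = List.of(new Tok("123", 1), new Tok("smith", 1));
        System.out.println(filter(in, true));  // [123/1, smith/1, SM0/0, XMT/0]
        System.out.println(filter(in, false)); // [SM0/2, XMT/0]
    }
}
```

In both modes "smith" keeps its variants and "123" is never silently swallowed or reordered, which is the property the quoted branch seems to break.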
We also have PhoneticFilter, which can already be configured with a
DoubleMetaphone encoder. Should DoubleMetaphoneFilterFactory be
deprecated?
-Yonik
http://www.lucidimagination.com