DoubleMetaphone bugs?

Yonik Seeley Wed, 12 Aug 2009 13:30:18 -0700

I'm converting many of the TokenFilters to the new Lucene attribute API...
I'm currently on DoubleMetaphone, but something looks wrong.


      // If we did not add something, then go to the next one...
      if( !isPhonetic ) {
        t = next(in);
        if( t != null ) {
          t.setPositionIncrement( t.getPositionIncrement()+1 );
        }
        return t;
      }

It looks like when DoubleMetaphone adds no phonetic tokens for the
current token, the *next* token is returned without any variants?
That doesn't make sense, and it also seems like it could mess up the
token ordering, since the original token (in the case of inject==true)
hasn't even been returned yet.
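
To make the ordering concern concrete, here is a minimal sketch of the
output one might expect. It is not the filter's actual code. It assumes
that inject==true means "emit the original token, then its phonetic
variant at position increment 0", that inject==false means "replace the
token with its variant", and that a token with no encoding passes
through unchanged. None of that is confirmed by the javadoc. Tok,
toyEncode and expand are hypothetical names, and toyEncode is a toy
stand-in rather than the Double Metaphone algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InjectOrderSketch {

    /** Hypothetical stand-in for a Lucene Token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    /** Toy encoder, NOT Double Metaphone: keeps ASCII letters, drops vowels, upper-cases. */
    static String toyEncode(String term) {
        return term.replaceAll("[^A-Za-z]", "")
                   .replaceAll("[AEIOUaeiou]", "")
                   .toUpperCase();
    }

    /** Expands a token list under the assumed reading of "inject" described above. */
    static List<Tok> expand(List<Tok> input, boolean inject) {
        List<Tok> out = new ArrayList<Tok>();
        for (Tok t : input) {
            String code = toyEncode(t.term);
            if (code.isEmpty()) {
                // Assumption: a token with no encoding is passed through as-is.
                out.add(t);
                continue;
            }
            if (inject) {
                // Original first, then the variant stacked at the same position.
                out.add(t);
                out.add(new Tok(code, 0));
            } else {
                // Variant replaces the original and keeps its position increment.
                out.add(new Tok(code, t.posInc));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> input = Arrays.asList(new Tok("smith", 1), new Tok("123", 1));
        System.out.println(expand(input, true));   // [smith(+1), SMTH(+0), 123(+1)]
        System.out.println(expand(input, false));  // [SMTH(+1), 123(+1)]
    }
}
```

Under these assumptions every original token comes out before anything
derived from it, and before the token that follows it in the input,
which is the property the quoted branch appears to put at risk.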

I also couldn't find any documentation on exactly how "inject" is
supposed to work.  DoubleMetaphoneFilterFactory doesn't appear at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and the
javadoc doesn't give any clues.
We also have PhoneticFilter... should DoubleMetaphoneFilterFactory be
deprecated?


-Yonik
http://www.lucidimagination.com
