Re: DoubleMetaphone bugs?

Yonik Seeley Wed, 12 Aug 2009 13:57:25 -0700

I assume that if inject==false, and there is no phonetic alternative,
the original token should be indexed?


PhoneticFilter might also have an issue - I'm not sure how each of the
encoders reports back that there is no alternative, but PhoneticFilter
does fewer checks and may generate zero-length tokens, or tokens
identical to the original.
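
To make the question concrete, here is a rough sketch of the behavior I
would expect at a single token position. It is plain Java, not the actual
filter code: the class PhoneticFallbackSketch and the method tokensFor are
hypothetical names, and the encoder output is simply passed in as two
strings, on the assumption that "no alternative" shows up as null or "".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the real DoubleMetaphoneFilter/PhoneticFilter.
class PhoneticFallbackSketch {

    /**
     * Returns the terms to emit for one input token.
     * primary/alternate stand in for whatever the encoder produced;
     * null or "" is assumed to mean "no code available".
     */
    static List<String> tokensFor(String original, String primary,
                                  String alternate, boolean inject) {
        List<String> out = new ArrayList<>();
        if (inject) {
            out.add(original);              // inject==true: keep the original term
        }
        for (String code : new String[] { primary, alternate }) {
            if (code == null || code.isEmpty()) {
                continue;                   // never emit a zero-length token
            }
            if (out.contains(code)) {
                continue;                   // skip duplicates, including a code equal to an injected original
            }
            out.add(code);
        }
        if (out.isEmpty()) {
            out.add(original);              // no phonetic form at all: fall back to the original
        }
        return out;
    }
}
```

With this, tokensFor("12345", "", null, false) yields just the original
term, and tokensFor("Smith", "SM0", "XMT", true) yields the original
followed by both codes. Position increments are not modeled here; in a real
filter the variants would presumably be stacked at the same position as the
original.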

-Yonik
http://www.lucidimagination.com



On Wed, Aug 12, 2009 at 4:29 PM, Yonik Seeley wrote:
> I'm converting many of the TokenFilters to the new Lucene attribute API...
> I'm currently on DoubleMetaphone, but something looks wrong.
>
>      // If we did not add something, then go to the next one...
>      if( !isPhonetic ) {
>        t = next(in);
>        if( t != null ) {
>          t.setPositionIncrement( t.getPositionIncrement()+1 );
>        }
>        return t;
>      }
>
> It looks like if DoubleMetaphone didn't add any tokens, then the
> *next* token is indexed w/o any variants?  That doesn't make sense,
> but it also seems like it could mess up the token ordering since the
> original token (in the case of inject==true) hasn't even been returned
> yet.
>
> I also couldn't find any documentation on exactly how "inject" is
> supposed to work.  DoubleMetaphoneFilterFactory doesn't appear at
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and the
> javadoc doesn't give any clues.
> We also have PhoneticFilter... should DoubleMetaphoneFilterFactory be
> deprecated?
>
>
> -Yonik
> http://www.lucidimagination.com
>
