On Aug 21, 2009, at 10:49 AM, Yonik Seeley wrote:

On Fri, Aug 21, 2009 at 10:33 AM, Ryan McKinley<ryan...@gmail.com> wrote:
Actually... I think there may be something wrong here.

BaseTokenizerFactory does not make a Tokenizer; it creates a
TokenStream, so its result should never be cast to Tokenizer.
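To make the failure concrete, here is a minimal sketch of a caller that assumes every <tokenizer> factory yields a Tokenizer and casts the result. TokenStream, Tokenizer, TokenizerFactory and PrecomputedTokenizerFactory below are hypothetical, heavily simplified stand-ins written for illustration, not the real Lucene/Solr classes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch only: all types here are hypothetical stand-ins, not Lucene/Solr APIs.
class TokenizerCastDemo {

    abstract static class TokenStream {
        // Returns the next token, or null once the stream is exhausted.
        abstract String next() throws IOException;
    }

    // A Tokenizer is a TokenStream that reads its tokens from a Reader.
    abstract static class Tokenizer extends TokenStream {
        protected Reader input;
        Tokenizer(Reader input) { this.input = input; }
    }

    // Factory contract as it stands: the declared return type is TokenStream.
    interface TokenizerFactory {
        TokenStream create(Reader input);
    }

    // Hypothetical factory that reads all input up front and returns a plain
    // TokenStream over the precomputed tokens. It is NOT a Tokenizer.
    static class PrecomputedTokenizerFactory implements TokenizerFactory {
        public TokenStream create(Reader input) {
            StringBuilder sb = new StringBuilder();
            try {
                int c;
                while ((c = input.read()) != -1) sb.append((char) c);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            List<String> tokens = new ArrayList<>();
            for (String s : sb.toString().split("\\s+")) {
                if (!s.isEmpty()) tokens.add(s); // skip the empty leading piece
            }
            final Iterator<String> it = tokens.iterator();
            return new TokenStream() {
                @Override
                String next() { return it.hasNext() ? it.next() : null; }
            };
        }
    }

    // Caller that assumes the factory produced a Tokenizer.
    // Returns true if the cast fails with a ClassCastException.
    static boolean castFails(TokenizerFactory factory) {
        TokenStream ts = factory.create(new StringReader("a b c"));
        try {
            Tokenizer tokenizer = (Tokenizer) ts; // throws if ts is not a Tokenizer
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails(new PrecomputedTokenizerFactory())); // prints "true"
    }
}
```

The declared return type compiles fine, so the problem only shows up at runtime, which is why an end-to-end test is needed to catch it.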

My custom TokenizerFactory now looks the same as:
o.a.s.analysis.PatternTokenizerFactory

Urg... looks like there's no end-to-end (index then search) test for
PatternTokenizerFactory, so we never caught this.

I guess we need to add one :)


It seems like when something is specified as a <tokenizer> in
schema.xml, it should in fact be a Tokenizer - that's the only way
TokenStream reuse works.


I don't see anything in Solr that creates a Tokenizer. The TokenizerFactory just creates a TokenStream.

It seems that TokenizerFactory really needs to be:
  public Tokenizer create(Reader input);
rather than:
  public TokenStream create(Reader input);
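A minimal sketch of why the Tokenizer return type matters for reuse, assuming (for illustration, not from Solr's code) that reuse means keeping one Tokenizer and pointing it at each new Reader, loosely modeled on Lucene's Tokenizer.reset(Reader). The TokenizerFactory here uses the proposed `Tokenizer create(Reader input)` signature; all types, including WhitespaceTokenizer and tokenizeAll, are hypothetical stand-ins.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical stand-ins, not Lucene's real classes.
class TokenizerReuseDemo {

    abstract static class TokenStream {
        // Returns the next token, or null once the stream is exhausted.
        abstract String next() throws IOException;
    }

    abstract static class Tokenizer extends TokenStream {
        protected Reader input;
        Tokenizer(Reader input) { this.input = input; }
        // Only a Tokenizer can be pointed at a new Reader; this enables reuse.
        void reset(Reader input) { this.input = input; }
    }

    // Proposed contract: a <tokenizer> factory must return a Tokenizer.
    interface TokenizerFactory {
        Tokenizer create(Reader input);
    }

    // Minimal whitespace tokenizer that reads lazily from its current Reader.
    static class WhitespaceTokenizer extends Tokenizer {
        WhitespaceTokenizer(Reader input) { super(input); }

        @Override
        String next() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) return sb.toString();
                } else {
                    sb.append((char) c);
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    // Tokenizes several documents with ONE Tokenizer instance, resetting it per document.
    static List<List<String>> tokenizeAll(TokenizerFactory factory, String... docs) {
        List<List<String>> out = new ArrayList<>();
        Tokenizer tokenizer = null;
        try {
            for (String doc : docs) {
                Reader reader = new StringReader(doc);
                if (tokenizer == null) {
                    tokenizer = factory.create(reader); // first document: build one instance
                } else {
                    tokenizer.reset(reader);            // later documents: reuse it
                }
                List<String> tokens = new ArrayList<>();
                for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
                    tokens.add(t);
                }
                out.add(tokens);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // No cast is needed: the factory's declared type already guarantees a Tokenizer.
        System.out.println(tokenizeAll(WhitespaceTokenizer::new, "a b", "  c d e "));
        // prints [[a, b], [c, d, e]]
    }
}
```

With the factory typed to return Tokenizer, the caller needs no cast, and in this sketch a factory that only produces a plain TokenStream would be rejected at compile time rather than failing with a ClassCastException at runtime.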

I don't see any backwards-compatible way to make this change!

ideas?
ryan
