On Aug 21, 2009, at 10:49 AM, Yonik Seeley wrote:
On Fri, Aug 21, 2009 at 10:33 AM, Ryan McKinley <ryan...@gmail.com> wrote:
Actually.... I think there may be something wrong here.
BaseTokenizerFactory does not make a Tokenizer, it creates a
TokenStream, so the result should never be cast to Tokenizer.
My custom TokenizerFactory now looks the same as:
o.a.s.analysis.PatternTokenizerFactory
Urg... looks like there's no end-to-end (index then search) test for
PatternTokenizerFactory, so we never caught this.
I guess we need to add one :)
It seems like when something is specified as a <tokenizer> in
schema.xml it should in fact be a tokenizer - it's the only way
tokenstream reuse works.
I don't see anything in Solr that creates a Tokenizer. The
TokenizerFactory just creates a TokenStream.
It seems that TokenizerFactory really needs to be:
public Tokenizer create( Reader input );
rather than:
public TokenStream create( Reader input );
I don't see any backwards-compatible way to make this change!
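To make the proposal concrete, here is a minimal sketch of what the narrower return type would buy. The classes below are simplified stand-ins written for this example, not the real Lucene/Solr classes; the only assumption carried over is that a Tokenizer, unlike a plain TokenStream, can be pointed at a new Reader, which is what reuse depends on.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TokenizerFactorySketch {

    /** Stand-in for a generic token stream: it can only be consumed. */
    static abstract class TokenStream {
        /** Returns the next token, or null when the stream is exhausted. */
        abstract String next();
    }

    /** Stand-in for a tokenizer: a token stream that owns a Reader and can be re-pointed. */
    static abstract class Tokenizer extends TokenStream {
        protected Reader input;

        Tokenizer(Reader input) { this.input = input; }

        /** Swap in a new Reader so the same instance can be reused. */
        void reset(Reader input) { this.input = input; }
    }

    /** Hypothetical tokenizer that splits on whitespace. */
    static class WhitespaceTokenizer extends Tokenizer {
        WhitespaceTokenizer(Reader input) { super(input); }

        @Override
        String next() {
            StringBuilder sb = new StringBuilder();
            try {
                int c;
                while ((c = input.read()) != -1) {
                    if (Character.isWhitespace(c)) {
                        if (sb.length() > 0) break;   // end of the current token
                    } else {
                        sb.append((char) c);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    /** The proposed shape: the factory promises a Tokenizer, not just a TokenStream. */
    interface TokenizerFactory {
        Tokenizer create(Reader input);
    }

    /** Collects every remaining token from the stream. */
    static List<String> drain(TokenStream ts) {
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        TokenizerFactory factory = WhitespaceTokenizer::new;
        Tokenizer tok = factory.create(new StringReader("index then search"));
        System.out.println(drain(tok));   // [index, then, search]

        // Reuse needs no cast because the factory already returned a Tokenizer.
        tok.reset(new StringReader("second document"));
        System.out.println(drain(tok));   // [second, document]
    }
}
```

The compatibility problem also shows up here: a Java override may narrow a return type but not widen it, so any existing factory whose create() is declared to return TokenStream would stop compiling against the narrower signature.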
Ideas?
ryan