Re: KeywordTokenizerFactory and stopwords

Erik Hatcher Wed, 08 Jun 2011 08:05:01 -0700

This seems like it deserves some kind of "collecting" TokenFilter(Factory) that 
will slurp up all incoming tokens and glue them back together with a space (with 
the separator configurable).   Hmmm.... surprised one of those doesn't already 
exist.  With something like that you could run a standard tokenization chain, 
stopword filter included, and put it all back together into a single token at 
the end.
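
To make the idea concrete, here is a rough sketch in plain Python rather than 
against the Lucene/Solr API. Everything in it is an assumption for illustration: 
the function names, the tiny stopword list, and the regex tokenizer that stands 
in for a real tokenization chain. It only shows the shape of the pipeline: 
tokenize, drop stopwords, then collect the survivors into one token.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "of", "the"}


def tokenize(text):
    """Stand-in for a standard tokenization chain: lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Stand-in for a stopword filter: drop any token found in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def collect(tokens, separator=" "):
    """The 'collecting' step: glue all incoming tokens into a single token.

    Returns a one-element list, or an empty list if nothing survived filtering.
    """
    tokens = list(tokens)
    return [separator.join(tokens)] if tokens else []


def analyze(text, separator=" "):
    """Full chain: tokenize, remove stopwords, then rejoin into one token."""
    return collect(remove_stopwords(tokenize(text)), separator)
```

Calling analyze("The Lord of the Rings") on this sketch yields the single token 
"lord rings", which could then be fed to something like the edge n-gram step. 
A real version would presumably be a TokenFilter that buffers the stream and 
emits one token at end of input.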


        Erik

On Jun 8, 2011, at 10:59, Matt Mitchell wrote:

> Hi,
> 
> I have an "autocomplete" fieldType that works really well, but because
> the KeywordTokenizerFactory (if I understand correctly) is emitting a
> single token, the stopword filter will not detect any stopwords.
> Anyone know of a way to strip out stopwords when using
> KeywordTokenizerFactory? I did try the reg-exp replace filter, but I'm
> not sure I want to add a bunch of reg-exps for replacing every
> stopword.
> 
> Thanks,
> Matt
> 
> Here's the fieldType definition:
> 
> <fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
>  <analyzer type="index">
>    <tokenizer class="solr.KeywordTokenizerFactory"/>
>    <filter class="solr.TrimFilterFactory"/>
>    <filter class="solr.LowerCaseFilterFactory"/>
>    <filter class="solr.ASCIIFoldingFilterFactory"/>
>    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
>  </analyzer>
>  <analyzer type="query">
>    <tokenizer class="solr.KeywordTokenizerFactory"/>
>    <filter class="solr.TrimFilterFactory"/>
>    <filter class="solr.LowerCaseFilterFactory"/>
>    <filter class="solr.ASCIIFoldingFilterFactory"/>
>  </analyzer>
> </fieldType>
