This seems like it deserves some kind of "collecting" TokenFilter(Factory) that slurps up all incoming tokens and glues them back together with a space (with the separator configurable). Hmmm.... surprised one of those doesn't already exist. With something like that you could run a standard tokenization chain, stop filter included, and put it all back together into a single token at the end.
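
Roughly what I have in mind, as a sketch only. The com.example.CollectingTokenFilterFactory class and its separator attribute are made up for illustration; nothing like it ships with Solr as far as I know, so it would have to be written. The stopwords.txt file name is also an assumption. Everything else is a stock factory:

```xml
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Tokenize into words so the stop filter sees individual tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- HYPOTHETICAL filter: joins the surviving tokens back into one token -->
    <filter class="com.example.CollectingTokenFilterFactory" separator=" "/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- HYPOTHETICAL filter, same as in the index chain -->
    <filter class="com.example.CollectingTokenFilterFactory" separator=" "/>
  </analyzer>
</fieldType>
```

One thing to watch: StandardTokenizerFactory also drops punctuation, so the re-joined token will not be byte-for-byte what KeywordTokenizerFactory would have produced.
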
Erik

On Jun 8, 2011, at 10:59, Matt Mitchell wrote:

> Hi,
>
> I have an "autocomplete" fieldType that works really well, but because
> the KeywordTokenizerFactory (if I understand correctly) is emitting a
> single token, the stopword filter will not detect any stopwords.
> Anyone know of a way to strip out stopwords when using
> KeywordTokenizerFactory? I did try the reg-exp replace filter, but I'm
> not sure I want to add a bunch of reg-exps for replacing every
> stopword.
>
> Thanks,
> Matt
>
> Here's the fieldType definition:
>
> <fieldType name="autocomplete" class="solr.TextField"
>     positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>         maxGramSize="50"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>   </analyzer>
> </fieldType>