On 4/16/2014 8:37 PM, Bob Laferriere wrote:
>> I am seeing odd behavior from WordDelimiterFilterFactory (WDFF) when
>> used in conjunction with StandardTokenizerFactory (STF).
<snip>
>> I see the following results for the document "wi-fi":
>>
>> Index: "wi", "fi"
>> Query: "wi", "fi", "wifi"
>>
>> The documentation seems to indicate that I should see the same results
>> in either case, as the WDFF is handling the generation of word parts.
>> But the concatenation of words does not seem to work with a
>> StandardTokenizer?

The StandardTokenizer splits the input on punctuation, so "wi-fi" reaches
WDFF as two separate tokens, "wi" and "fi", with the hyphen already gone.
Since WDFF never sees "wi-fi" as a single token, there is nothing for its
catenateWords option to concatenate. The following page links to the
Unicode document that explains how the tokenizer's segmentation works:

http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html

If you use the Analysis page in the Solr admin UI, you can see what each
component of the analysis chain produces at each step:

https://cwiki.apache.org/confluence/display/solr/Analysis+Screen

Thanks,
Shawn
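To illustrate the fix implied above, here is a minimal sketch of a field type that lets WDFF's catenation take effect, assuming a Solr 4.x schema; the field type name is illustrative, not from the original thread:

```xml
<!-- Illustrative sketch: use a tokenizer that does NOT split on punctuation,
     so the hyphenated token survives long enough for WDFF to work on it.
     WhitespaceTokenizer passes "wi-fi" through as a single token, and
     WordDelimiterFilter with catenateWords="1" can then emit "wifi"
     alongside the word parts "wi" and "fi". -->
<fieldType name="text_wdff" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, index and query analysis should agree, which you can confirm on the admin UI's Analysis page.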