There are about a zillion tokenizers; for what you're describing, WhitespaceTokenizerFactory is a good candidate.
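To illustrate why, here is a rough plain-Java sketch of whitespace-only splitting. It is not Lucene's actual code, and the class and method names are made up for the example. The lowercasing step is an assumption, standing in for a lowercase filter you might put after the tokenizer in the analyzer chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: mimics the effect of splitting on
// whitespace, which is the idea behind a whitespace tokenizer.
final class WhitespaceSplitSketch {

    // Splits on runs of whitespace and nothing else, then lowercases
    // each piece (assumed stand-in for a lowercase filter).
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The slash is not a split point, so AB/1234/5678 stays one token.
        System.out.println(tokenize("blah blahblah AB/1234/5678 blah"));
        // prints [blah, blahblah, ab/1234/5678, blah]
    }
}
```

One thing to watch for with this approach: punctuation that touches the token (say "AB/1234/5678," with a trailing comma) stays attached to it, so you may need extra handling if that occurs in your documents.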
See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for a partial list; it also links to the authoritative docs.

Best
Erick

On Wed, Nov 30, 2011 at 9:23 AM, Marian Steinbach <marian.steinb...@gmail.com> wrote:
> I have documents containing tokens of a certain format in arbitrary
> positions, like this:
>
> ... blah blahblah AB/1234/5678 blah blah blahblah ...
>
> I would like to enable "usual" keyword searching within these documents. In
> addition, I'd also like to enable users to find "AB/1234/5678", ideally
> without a need to quote this as a phrase. And match highlighting should
> highlight this term just as other term matches would be highlighted.
>
> BTW, it's *not* necessary to find this document by searching for parts of
> that token, like "ab", "1234" or "5678".
>
> As I understand, StandardTokenizerFactory considers the slash as a word
> delimiter and thus removes it.
>
> Is there a Tokenizer available that allows me to skip tokenizing on
> slashes in this case, but only in this case? Or how could I create one
> myself? Do I extend StandardTokenizerFactory in my own Java class?
>
> Thanks!
>
> Marian