Leaving certain tokens intact during indexing

Marian Steinbach Wed, 30 Nov 2011 06:21:26 -0800

I have documents containing tokens of a certain format in arbitrary
positions, like this:


    ... blah blahblah AB/1234/5678 blah blah blahblah ...

I would like to enable "usual" keyword searching within these documents. In
addition, I'd also like to enable users to find "AB/1234/5678", ideally
without a need to quote this as a phrase. And match highlighting should
highlight this term just as other term matches would be highlighted.

BTW, it's *not* necessary to find this document by searching for parts of
that token, like "ab", "1234" or "5678".

As I understand it, StandardTokenizerFactory treats the slash as a word
delimiter, so the token is split at each slash and the slashes are dropped.

Is there a Tokenizer available that allows me to skip tokenizing on
slashes in this case, but only in this case? Or how could I create one
myself? Do I extend StandardTokenizerFactory in my own Java class?
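
To make the desired behavior concrete, here is a minimal sketch in plain
Java. It does not use the Lucene/Solr API at all. The class name
SlashTokenSketch, the regular expression (two letters followed by two
slash-separated digit groups) and the lowercasing step are assumptions
made purely for illustration, not an existing tokenizer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: plain Java, not a Lucene/Solr Tokenizer.
public class SlashTokenSketch {

    // Assumed token format: two letters, then two slash-separated digit
    // groups (e.g. AB/1234/5678). Anything else falls back to a plain run
    // of letters and digits, so slashes elsewhere still split words.
    private static final Pattern TOKEN = Pattern.compile(
            "[A-Za-z]{2}/\\d+/\\d+|[\\p{L}\\p{N}]+");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Lowercasing is an assumption, mimicking a typical lowercase filter.
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("blah blahblah AB/1234/5678 blah and/or blah"));
    }
}
```

With this sketch the special token stays in one piece, while a slash in
ordinary text such as "and/or" still separates "and" from "or". That is the
behavior I would like to get from the real analysis chain.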

Thanks!

Marian
