StandardTokenizerFactory and WhitespaceTokenizerFactory

Tarala, Magesh Thu, 30 Jul 2015 08:11:28 -0700

I am indexing text that contains part numbers in various formats. The part
numbers contain hyphens/dashes and a few other special characters.


Here's the problem: If I use StandardTokenizerFactory, the hyphens etc. are
stripped, so I cannot search by the part number 222-333-4444. I can only
search for 222 or 333 or 4444.
If I use the WhitespaceTokenizerFactory instead, I can search by part number,
but I cannot find a word when it is followed by punctuation such as a comma or
period. Example: wheel,
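
To show the two behaviours side by side, here is a small Python sketch. It does
not call Solr or Lucene; both functions are simplified stand-ins that I made up
for illustration, and the real StandardTokenizerFactory follows Unicode word
break rules rather than this regular expression.

```python
import re


def whitespace_tokens(text):
    # Rough stand-in for WhitespaceTokenizerFactory: split on whitespace only,
    # so punctuation stays attached to the neighbouring word.
    return text.split()


def standard_like_tokens(text):
    # Crude approximation of StandardTokenizerFactory (an assumption, not the
    # real algorithm): keep runs of ASCII letters and digits, drop the rest.
    return re.findall(r"[A-Za-z0-9]+", text)


sample = "wheel, part 222-333-4444."
print(whitespace_tokens(sample))     # part number kept whole, "wheel," keeps its comma
print(standard_like_tokens(sample))  # "wheel" is clean, part number is split up
```

With the whitespace version the token is "wheel," (comma included), so a search
for wheel does not match. With the standard-like version the part number is
broken into three separate tokens.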

Should I use copy fields with different tokenizers and then, at search time,
choose which field to query based on the search string? Are there any other
options?
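
To make the copy-field idea concrete, this is roughly what I mean by picking the
field from the search string. It is only a sketch: the field names text_std and
text_ws are hypothetical, the part number pattern is a guess at my own data, and
the query string is built by hand without any escaping.

```python
import re

# Hypothetical field names: one copy of the text per tokenizer.
STANDARD_FIELD = "text_std"    # would be analyzed with StandardTokenizerFactory
WHITESPACE_FIELD = "text_ws"   # would be analyzed with WhitespaceTokenizerFactory

# Assumption: a part number is two or more alphanumeric groups joined by hyphens.
PART_NUMBER = re.compile(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+$")


def pick_field(term):
    # Send part-number-looking terms to the whitespace field, everything else
    # to the standard field.
    return WHITESPACE_FIELD if PART_NUMBER.match(term) else STANDARD_FIELD


def build_query(search_string):
    # Build one field-qualified clause per whitespace-separated term.
    clauses = [f'{pick_field(term)}:"{term}"' for term in search_string.split()]
    return " AND ".join(clauses)


print(build_query("wheel 222-333-4444"))
```

The drawback is that the client has to guess what a part number looks like,
which is why I am asking whether there is a better option.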
