: After 1/2 hour of regex hacking... I think I'll stick with a two step
: process: split then trim ;)
But regex hacking is FUN!!
I'm 99% certain this does waht you want...
<tokenizer class="solr.PatternTokenizerFactory"
pattern="((\A\s*)|\s*?(\s+-\s+|--|,|\(|\))|\s+)\s*\z?"
..if it doesn't send me an example string that it fails on and tell me
what hte desired output is.
Incidently, PatternTokenizerFactory seems to have the anoying limitation
of assuming there is a token prior to each match -- even if the match
explicitly matches on the start of the string (so it creates a 0 width
token) ... that seems like a bug right?
-Hoss