Chris Hostetter wrote:
: After 1/2 hour of regex hacking... I think I'll stick with a two-step
: process: split then trim ;)

But regex hacking is FUN!!

I'm 99% certain this does what you want...

        <tokenizer class="solr.PatternTokenizerFactory"
                   pattern="((\A\s*)|\s*?(\s+-\s+|--|,|\(|\))|\s+)\s*\z?" />
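A quick way to check what this pattern produces, without running Solr, is to feed it to plain Java. This is a minimal sketch that assumes the tokenizer splits exactly like `String.split` on the same regex and that you are on Java 8 or later. The class name `HossPatternDemo` and the method `tokens` are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Sketch: apply the tokenizer's regex with plain split semantics (Java 8+)
// to see which tokens it yields. Not Solr code; names are illustrative.
public class HossPatternDemo {
    // Same regex as the pattern attribute, with backslashes doubled for a Java string literal.
    static final Pattern SEPARATORS =
        Pattern.compile("((\\A\\s*)|\\s*?(\\s+-\\s+|--|,|\\(|\\))|\\s+)\\s*\\z?");

    // Pattern.split(input) behaves like input.split(regex) with limit 0,
    // so trailing empty strings are dropped.
    static String[] tokens(String input) {
        return SEPARATORS.split(input);
    }

    public static void main(String[] args) {
        String[] samples = {"foo - bar", "foo--bar", "foo, bar", "foo (bar) baz"};
        for (String s : samples) {
            System.out.println(s + " -> " + Arrays.toString(tokens(s)));
        }
    }
}
```

Running `main` should print `[foo, bar]` for the first three samples and `[foo, bar, baz]` for the last one.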


yup!  that does it.  thanks


...if it doesn't, send me an example string that it fails on and tell me
what the desired output is.

Incidentally, PatternTokenizerFactory seems to have the annoying limitation
of assuming there is a token prior to each match, even if the match
explicitly matches on the start of the string (so it creates a zero-length
token) ... that seems like a bug, right?
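To make the limitation concrete, here is a small Java sketch (Java 8 or later assumed; `LeadingMatchDemo` is a made-up name) that runs the same regex through `Pattern.split`, which `String.split` delegates to for a regex like this one. When the `\A\s*` branch matches zero characters at the start, split drops the leading empty string; when it consumes leading whitespace, split still emits an empty string before the match. JDKs before 8 emitted the leading empty string in both cases.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Sketch: how split (Java 8+) treats a separator match at the very start of the input.
public class LeadingMatchDemo {
    static final Pattern SEPARATORS =
        Pattern.compile("((\\A\\s*)|\\s*?(\\s+-\\s+|--|,|\\(|\\))|\\s+)\\s*\\z?");

    public static void main(String[] args) {
        // \A\s* matches zero characters at index 0, so no leading empty string.
        System.out.println(Arrays.toString(SEPARATORS.split("a, b")));  // [a, b]
        // \A\s* consumes the leading space, so split emits "" before that match.
        System.out.println(Arrays.toString(SEPARATORS.split(" a, b"))); // [, a, b]
    }
}
```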


How would you change it? I don't know regex well enough to see the limitation. My only criterion was that the output be the same as if you ran string.split( pattern );
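One possible change, sketched here as a hypothetical Java helper rather than anything in Solr itself, is to walk the matches with `java.util.regex.Matcher` and never emit a zero-length token. `SkipEmptySplit` and `tokens` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical split-style tokenizing that skips zero-length tokens.
// An illustration only, not Solr's implementation.
public class SkipEmptySplit {
    static List<String> tokens(Pattern separators, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = separators.matcher(input);
        int start = 0; // start of the current token
        while (m.find()) {
            addIfNonEmpty(out, input.substring(start, m.start()));
            start = m.end(); // the next token begins after this separator
        }
        addIfNonEmpty(out, input.substring(start)); // text after the last separator
        return out;
    }

    private static void addIfNonEmpty(List<String> out, String token) {
        if (!token.isEmpty()) {
            out.add(token);
        }
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("((\\A\\s*)|\\s*?(\\s+-\\s+|--|,|\\(|\\))|\\s+)\\s*\\z?");
        System.out.println(tokens(p, " a, b")); // [a, b]
    }
}
```

For any input this yields the same tokens as `string.split( pattern )` with the empty strings removed, so it departs from plain split only by never producing empty tokens.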


thanks again
ryan
