Chris Hostetter wrote:
: After 1/2 hour of regex hacking... I think I'll stick with a two step
: process: split then trim ;)

But regex hacking is FUN!! I'm 99% certain this does what you want...

<tokenizer class="solr.PatternTokenizerFactory" pattern="((\A\s*)|\s*?(\s+-\s+|--|,|\(|\))|\s+)\s*\z?"
yup! that does it. thanks
..if it doesn't, send me an example string that it fails on and tell me what the desired output is.

Incidentally, PatternTokenizerFactory seems to have the annoying limitation of assuming there is a token prior to each match -- even if the match explicitly matches on the start of the string (so it creates a 0 width token) ... that seems like a bug, right?
how would you change it? I don't know regex well enough to see the limitation. My only criterion was that the output is the same as if you send it to string.split( pattern );
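For reference, here is a rough sketch of the string.split( pattern ) behavior being matched, in plain Java with no Solr classes involved. It only illustrates that a delimiter matching at the very start of the input produces an empty first element, which looks like the 0 width token described above. The class name SplitDemo and the helper splitDroppingEmpty are made up for this illustration and are not part of Solr; dropping the empty tokens afterwards is just one possible workaround, not necessarily how PatternTokenizerFactory should be changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SplitDemo {
    // The pattern from this thread, escaped for a Java string literal.
    static final String THREAD_PATTERN =
        "((\\A\\s*)|\\s*?(\\s+-\\s+|--|,|\\(|\\))|\\s+)\\s*\\z?";

    // Hypothetical helper (not a Solr API): split the way String.split
    // does, then discard any empty tokens, such as the one produced when
    // the delimiter matches at the start of the input.
    static List<String> splitDroppingEmpty(String input, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(regex).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A delimiter at the very start yields an empty first element.
        System.out.println(Arrays.toString(",a,b".split(",")));  // prints [, a, b]

        // The helper filters that empty element out.
        System.out.println(splitDroppingEmpty(",a,b", ","));     // prints [a, b]

        // Same idea with the thread's pattern and leading whitespace.
        System.out.println(splitDroppingEmpty(" a, b", THREAD_PATTERN));
    }
}
```

Note that String.split already drops trailing empty strings on its own, so only the leading one needs special handling in this sketch.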
thanks again ryan
