I think you're absolutely right, Erick.
Thanks for the insight - that's the direction I'll be heading.
Cheers,
-D
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, July 13, 2012 8:53 AM
To: java-user@lucene.apache.org
Subject: Re: P
Sure, you can do it that way. But first I'd look over the zillion
tokenizers and filters that are available and string together the ones
that best suit your need. For instance, WhitespaceTokenizer and
PatternReplaceFilter might make your regex much easier since the
PatternReplaceFilter gets just th
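
To illustrate the idea of splitting on whitespace first and then applying a
regex to each token on its own, here is a minimal sketch in plain Java. It
uses only java.util.regex and is not the Lucene API: the class name
TokenThenReplace, the analyze method, and the trailing-punctuation pattern
are all hypothetical, chosen for illustration. Dropping tokens that become
empty is a choice made in this sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this does not use or reproduce Lucene classes.
public class TokenThenReplace {

    // Split the text on whitespace, then apply the pattern to each token separately.
    static List<String> analyze(String text, Pattern pattern, String replacement) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only when the input is blank
            }
            // The regex sees one token at a time, so it can stay simple.
            String cleaned = pattern.matcher(token).replaceAll(replacement);
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned); // this sketch drops tokens that end up empty
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed example pattern: strip punctuation at the end of a token.
        Pattern trailingPunct = Pattern.compile("[.,;:!?]+$");
        System.out.println(analyze("Hello, world! foo-bar.", trailingPunct, ""));
    }
}
```

Because the pattern is anchored to a single token, it does not have to
describe the whitespace or the neighbouring words, which is what keeps the
regex short.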