RE: Pattern Analyzer

2012-07-13 Thread Dave Seltzer
I think you're absolutely right, Erick. Thanks for the insight - that's the direction I'll be heading. Cheers, -D -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, July 13, 2012 8:53 AM To: java-user@lucene.apache.org Subject: Re: P

Re: Pattern Analyzer

2012-07-13 Thread Erick Erickson
Sure, you can do it that way. But first I'd look over the zillion tokenizers and filters that are available and string together the ones that best suit your need. For instance, WhitespaceTokenizer and PatternReplaceFilter might make your regex much easier since the PatternReplaceFilter gets just th
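
To illustrate the tokenize-first, then-regex-per-token idea, here is a rough sketch in plain Java. It uses only java.util.regex as a stand-in and does not call Lucene's actual WhitespaceTokenizer or PatternReplaceFilter classes. The class name, the analyze method, and the pattern are all made up for the example, and dropping empty tokens is a choice of this sketch, not a claim about how the Lucene filter behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TokenPipelineSketch {
    // Hypothetical per-token pattern: strip anything that is not an ASCII letter or digit.
    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]");

    // Splits on whitespace first, then applies the regex to each token on its own.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Step 1: whitespace tokenization (stand-in for a whitespace tokenizer).
        for (String raw : text.trim().split("\\s+")) {
            // Step 2: per-token replacement (stand-in for a pattern-replace filter).
            String cleaned = NON_ALNUM.matcher(raw).replaceAll("");
            // Assumption of this sketch: tokens that end up empty are dropped.
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("foo-bar  baz_1 (qux)"));
    }
}
```

Because the regex only ever sees one token at a time, it does not need to account for whitespace or token boundaries, which is probably why the pattern ends up simpler than a single regex over the whole input.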