There are a few regular expression implementations too, such as Jakarta ORO: http://jakarta.apache.org/oro/index.html
With a pattern like this:
(".+"|[\d\w]*)
you should get close to what you need (I think the above will return the space character as a token too, but I'm not sure).
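To make the idea concrete, here is a minimal sketch using java.util.regex rather than ORO. The class and method names are hypothetical, and the pattern is a tightened variant of the one above, not the same pattern: it uses [^"]* inside the quotes because a greedy .+ would run from the first quote to the last one when a query contains two quoted phrases, and it requires at least one character per word because a * quantifier can also match the empty string. Neither alternative matches whitespace, so spaces should not come back as tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
class SearchTokenizer {

    // Group 1: the text between a pair of double quotes.
    // Group 2: a run of characters containing no whitespace and no quote.
    private static final Pattern TOKEN =
            Pattern.compile("\"([^\"]*)\"|([^\\s\"]+)");

    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            String token = (m.group(1) != null) ? m.group(1) : m.group(2);
            if (!token.isEmpty()) { // skip empty quoted phrases such as ""
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Struts \"web presentation tier\""));
        System.out.println(tokenize("Struts web presentation tier"));
    }
}
```

This sketch assumes that an unbalanced quote should simply be dropped and the remaining words treated as ordinary tokens; adjust that if you would rather reject such input.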
And here is a program that lets you test out patterns to see how they match strings or how strings will be split:
http://www.weitz.de/regex-coach/
Robert Taylor wrote:
I did a Google search on this and didn't really come up with anything useful. Before I implement this myself, is there an existing implementation for parsing a search string that produces tokens similar to how Google or other search engines parse search strings?
For example, I would like to parse a search string into tokens where tokens are delimited by either a blank space or a quoted phrase.
So the string:
'Struts "web presentation tier"'
would return 2 tokens:
- Struts
- web presentation tier
but the string:
'Struts web presentation tier'
would return 4 tokens:
- Struts
- web
- presentation
- tier
Any help is appreciated.
robert
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
-- Jason Lea