Regular expressions should work.

There are a few implementations too, such as Jakarta ORO: http://jakarta.apache.org/oro/index.html

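With a pattern like this: (".+"|[\d\w]*)
you should get close to what you need. I have not tested it, so two caveats: ".+" is greedy, so with two quoted phrases in one string it will match from the first quote to the last ("[^"]+" avoids that), and because [\d\w]* can match the empty string you may get empty tokens at the spaces (\w+ avoids that, and \w already covers digits).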
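
Here is a rough sketch of how that could look with java.util.regex (JDK 1.4 and later) rather than ORO. The class name SearchTokenizer and the method tokenize are made up for illustration, and the pattern is a variation on the one above, not exactly the same: it takes everything between a pair of double quotes as one token, and otherwise any run of characters that are neither whitespace nor a quote. Treat it as a starting point, not a finished parser (it does not handle escaped quotes, for instance).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Struts or ORO.
public class SearchTokenizer {

    // Group 1: the text between a pair of double quotes.
    // Group 2: a run of characters that are neither whitespace nor a quote.
    private static final Pattern TOKEN =
            Pattern.compile("\"([^\"]*)\"|([^\\s\"]+)");

    public static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            // Exactly one of the two groups is non-null for each match.
            String token = m.group(1) != null ? m.group(1) : m.group(2);
            if (!token.isEmpty()) {   // skip empty quoted phrases like ""
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Struts \"web presentation tier\""));
        System.out.println(tokenize("Struts web presentation tier"));
    }
}
```

With the two example strings from the original question this should give 2 and 4 tokens respectively. An unmatched quote is simply skipped and the words after it come back as separate tokens.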


And here is a program that lets you test patterns to see how they match or split strings:
http://www.weitz.de/regex-coach/


Robert Taylor wrote:

I did a Google search on this and didn't really come up with anything useful.
Before I implement this myself, is there an existing implementation for parsing
a search string that produces tokens similar to how Google or other search
engines parse search strings?

For example, I would like to parse a search string into tokens where tokens are delimited by either a blank space or a quoted phrase.

So the string:

'Struts "web presentation tier"'

would return 2 tokens:
- Struts
- web presentation tier

but the string:

'Struts web presentation tier'

would return 4 tokens:
- Struts
- web
- presentation
- tier


Any help is appreciated.


robert







---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






--
Jason Lea


