Re: [OT] Search string tokenizer

Jason Lea Mon, 15 Mar 2004 14:32:16 -0800

Regular expressions should work for this.

There are a few implementations too, such as Jakarta ORO: http://jakarta.apache.org/oro/index.html

With a pattern like this: (".+"|[\d\w]*) you should get close to what you need (I think the above will also return the space character as a token, but I'm not sure).
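A couple of things to watch for with that pattern: `.+` is greedy, so a query with two quoted phrases such as `"a b" x "c d"` would come back as one token running from the first quote to the last; `[\d\w]*` can match the empty string, so a find() loop over a query also yields empty tokens at the gaps between words; and `\d` is already covered by `\w`. Below is a rough sketch using java.util.regex (in the JDK since 1.4, so ORO isn't strictly required), assuming a variant pattern where the quoted alternative is `"([^"]+)"` and the word alternative is `[^\s"]+`. The class and method names (`SearchTokenizer`, `tokenize`) are just made up for illustration, and it is only a starting point, not a full search-syntax parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class SearchTokenizer {
    // Group 1: text inside a pair of double quotes (the quotes themselves are not captured).
    // Group 2: a run of characters that are neither whitespace nor a double quote.
    // [^"]+ stops at the next quote, so separate quoted phrases are not merged
    // the way the greedy ".+" would merge them, and "+" never matches an empty token.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]+)\"|([^\\s\"]+)");

    public static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            // Exactly one of the two groups takes part in each match.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Struts \"web presentation tier\""));
        System.out.println(tokenize("Struts web presentation tier"));
    }
}
```

Run against Robert's two examples, this gives [Struts, web presentation tier] and [Struts, web, presentation, tier]. A stray unmatched quote is simply skipped, since neither alternative can match it on its own.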

Here is a program that lets you test patterns to see how they match strings or how a string will be split: http://www.weitz.de/regex-coach/
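If you would rather skip regular expressions and implement it yourself, a single pass over the characters is also short. This is only a sketch under a few assumptions of mine: a double quote always ends the current token and toggles "phrase mode", whitespace splits tokens only outside quotes, and an unterminated quote just runs to the end of the string. `QuoteAwareSplitter` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class QuoteAwareSplitter {
    // Splits on whitespace, except that text between double quotes
    // is kept together as one token (without the quotes).
    public static List<String> split(String query) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '"') {
                // A quote ends whatever token is in progress and toggles phrase mode.
                flush(current, tokens);
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                // Outside quotes, whitespace is a delimiter.
                flush(current, tokens);
            } else {
                // Inside quotes, whitespace is kept as part of the phrase.
                current.append(c);
            }
        }
        // Emit the last token; an unterminated quote runs to the end of the input.
        flush(current, tokens);
        return tokens;
    }

    // Adds the buffered text as a token (if any) and clears the buffer.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("Struts \"web presentation tier\""));
        System.out.println(split("Struts web presentation tier"));
    }
}
```

The hand-written version makes the edge cases (unbalanced quotes, a quote glued to a word) explicit in code instead of in the pattern, which can be easier to adjust later if you want to support things like a leading minus sign.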

Robert Taylor wrote:

I did a Google search on this and didn't really come up with anything useful. Before I implement this myself, is there an existing implementation for parsing a search string into tokens, similar to how Google or other search engines parse search strings?
For example, I would like to parse a search string into tokens delimited by either a blank space or a quoted phrase.

So the string:

'Struts "web presentation tier"'
would return 2 tokens:
- Struts
- web presentation tier
but the string:

'Struts web presentation tier'
would return 4 tokens:
- Struts
- web
- presentation
- tier
Any help is appreciated.

robert
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--
Jason Lea

