Greetings,

I have a few simple applications that query a database given a search string.
I need to parse the search string into tokens the way search engines do.
For example, if someone entered the following string in a search field:

Java's "my favorite" programming language!

I would like to get a collection of tokens like:

Java's, "my favorite", programming, language!


So I need to split up a string based on the following requirements:

.Tokens are separated by white space
.Text inside matching double quotes is kept together as a single token
 - If there is only a single double quote,
   then match from the first occurrence of the double quote
   to the end of the string.
.Ignore any metacharacters in the search string
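
To make the requirements above concrete, here is a rough sketch of a hand-written scanner in plain Java. It is only one possible reading of the rules: the class and method names (SearchTokenizer, tokenize) are made up for illustration, the quotes are kept as part of the phrase token (as in the example output above), and a double quote in the middle of a word is assumed to start a new token. Because no regular expression is involved, metacharacters in the search string are just ordinary text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class SearchTokenizer {

    // Splits a search string on whitespace, keeping double-quoted phrases
    // (quotes included) together as single tokens.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (c == '"') {
                int close = input.indexOf('"', i + 1);
                // Assumption: an unmatched quote runs to the end of the string.
                int end = (close < 0) ? n : close + 1;
                tokens.add(input.substring(i, end));
                i = end;
            } else {
                int start = i;
                // Consume until whitespace or the start of a quoted phrase.
                while (i < n && !Character.isWhitespace(input.charAt(i))
                        && input.charAt(i) != '"') {
                    i++;
                }
                tokens.add(input.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Java's, "my favorite", programming, language!]
        System.out.println(tokenize("Java's \"my favorite\" programming language!"));
    }
}
```

With an unmatched quote, e.g. foo "bar baz, this sketch returns two tokens: foo and "bar baz.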

I Googled and came across what seemed to be a partial solution here:
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=pan.2002.09.07.08.05.42.322748.1437%40kamelfreund.de&rnum=8&prev=/groups%3Fq%3DSplitting%2Ba%2Bstring%2Bregexp%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26selm%3Dpan.2002.09.07.08.05.42.322748.1437%2540kamelfreund.de%26rnum%3D8

I asked the Struts list, because it seems like a pretty common web application 
requirement.
http://www.mail-archive.com/[EMAIL PROTECTED]/msg96258.html

I received one suggestion to use Lucene and another to use
java.lang.String.split().

I looked into using Lucene, but I don't need anything that "heavy". I even tried to
see if part of its API would give me what I need, but couldn't find anything.

I tried using java.lang.String.split() and it doesn't support the OR operator which I 
think I will need.

So I downloaded ORO in the hopes that the Perl5Util.split() would do it for me.

To make matters worse, my regular expression knowledge is poor at best.


Here is what I've tried (but it doesn't cover escaping metacharacters which might be 
in the search string):

    /"(.*?)"|(\w+)/

Passing that pattern to Perl5Util.split() prints the following:
[, Java, ', s,  , my favorite,  , programming,  , language, !]
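
One thing that may be going wrong here is that split() treats the pattern as a delimiter, whereas the pattern above describes the tokens themselves. A possible alternative is to collect the matches rather than split on them. The sketch below uses java.util.regex from the JDK (1.4 and later) instead of ORO. The class MatchCollector, the method findAll and the TOKEN pattern are my own guesses for illustration, not something taken from the thread linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class MatchCollector {

    // Assumed pattern: a quoted phrase whose closing quote is optional
    // (so an unmatched quote runs to the end of the string), or a run of
    // characters that are neither whitespace nor a double quote.
    static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"?|[^\\s\"]+");

    // Collects every match of the pattern instead of splitting on it.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<String>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String query = "Java's \"my favorite\" programming language!";
        // The pattern from this message, used for matching rather than splitting:
        // prints [Java, s, "my favorite", programming, language]
        System.out.println(findAll(Pattern.compile("\"(.*?)\"|(\\w+)"), query));
        // prints [Java's, "my favorite", programming, language!]
        System.out.println(findAll(TOKEN, query));
    }
}
```

Since the search string is only ever the input being matched and never compiled as a pattern, metacharacters typed by the user should not need escaping with this approach.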


Any suggestions would be greatly appreciated.


robert




