I wrote some code to do this for an open-source project on sf.net an eon or two ago, before the regex packages matured. You can probably enhance what I wrote with regex, but it's at least a starting point...
http://cvs.sourceforge.net/viewcvs.py/omd/java/coruscant/omd/util/StringSearcher.java?rev=1.2&view=markup

Anyway, what the code does is split the input into lists (OK, so I used Vectors, I was still learning Java!) of 3 types: required present (i.e. +), required absent (i.e. -), and optional terms. In short, the Yahoo search style.

Usage:
1) call setCriteriaString (passing your user search input)
2) call compareString (passing the content to search/validate)

In your case, since you're going to pass the search criteria to SQL, you can probably just use the tokenizing logic and add some getters for the criteria lists...

David Hibbs, ACS
Staff Programmer / Analyst
American National Insurance Company

> -----Original Message-----
> From: Robert Taylor [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 15, 2004 3:20 PM
> To: [EMAIL PROTECTED]
> Subject: [OT] Search string tokenizer
>
> I did a Google search on this and didn't really come up with
> anything useful.
> Before I implement this myself, is there an existing
> implementation for parsing a search string that would produce
> tokens similar to how Google or other search engines parse
> search strings?
>
> For example, I would like to parse a search string into
> tokens, where tokens are delimited by either a blank space
> or a quoted phrase.
>
> So the string:
>
> 'Struts "web presentation tier"'
>
> would return 2 tokens:
> - Struts
> - web presentation tier
>
> but the string:
>
> 'Struts web presentation tier'
>
> would return 4 tokens:
> - Struts
> - web
> - presentation
> - tier
>
> Any help is appreciated.
>
> robert
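
For reference, here is a minimal sketch of the tokenizing step discussed in this thread: split on whitespace, keep a double-quoted phrase together as one token, and sort terms into required (+), excluded (-), and optional lists. It is not the StringSearcher code from the link; the class name SearchCriteria, its parse method, and the three field names are hypothetical, and the handling of an unterminated quote (take the rest of the input) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the linked StringSearcher class. */
class SearchCriteria {
    final List<String> required = new ArrayList<String>(); // terms prefixed with +
    final List<String> excluded = new ArrayList<String>(); // terms prefixed with -
    final List<String> optional = new ArrayList<String>(); // terms with no prefix

    /** Splits input on whitespace, treating "quoted phrases" as single tokens. */
    static SearchCriteria parse(String input) {
        SearchCriteria criteria = new SearchCriteria();
        int i = 0;
        int n = input.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            // A leading + or - counts as a prefix only if a term follows directly.
            char prefix = 0;
            char ch = input.charAt(i);
            if ((ch == '+' || ch == '-') && i + 1 < n
                    && !Character.isWhitespace(input.charAt(i + 1))) {
                prefix = ch;
                i++;
            }
            String term;
            if (input.charAt(i) == '"') {
                // Quoted phrase: read up to the closing quote.
                // Assumption: an unterminated quote runs to the end of the input.
                int close = input.indexOf('"', i + 1);
                if (close < 0) {
                    close = n;
                }
                term = input.substring(i + 1, close).trim();
                i = Math.min(close + 1, n);
            } else {
                // Bare word: read up to the next whitespace character.
                int start = i;
                while (i < n && !Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                term = input.substring(start, i);
            }
            if (term.isEmpty()) {
                continue; // e.g. an empty pair of quotes
            }
            if (prefix == '+') {
                criteria.required.add(term);
            } else if (prefix == '-') {
                criteria.excluded.add(term);
            } else {
                criteria.optional.add(term);
            }
        }
        return criteria;
    }
}
```

With this sketch, SearchCriteria.parse("Struts \"web presentation tier\"") puts two terms in the optional list (Struts, web presentation tier), and the unquoted version puts four. Since the criteria are headed for SQL, the three lists can be read directly to build the WHERE clause, preferably as bound parameters rather than concatenated strings.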