I wrote some code to do this for an open-source project on sf.net an eon or
two ago, before the regex packages matured.  You can probably enhance what I
wrote with regex, but it's at least a starting point... 

http://cvs.sourceforge.net/viewcvs.py/omd/java/coruscant/omd/util/StringSearcher.java?rev=1.2&view=markup

Anyway, what the code does is split the input into lists (OK, so I used
Vectors, I was still learning Java!) of 3 types: required present (i.e. +),
required absent (i.e. -), and optional terms.  In short, the Yahoo search
style.  Usage:
1) call setCriteriaString (passing your user search input)
2) call compareString (passing the content to search/validate)
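
For readers who can't reach the CVS link, the usage above might look roughly
like the sketch below. This is not the StringSearcher source: the class name
CriteriaSketch, the case-insensitive substring matching, and the rule that
optional terms only matter when there are no required terms are all
assumptions made for illustration. Only the two method names come from the
description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; NOT the original StringSearcher code.
class CriteriaSketch {
    private final List<String> required = new ArrayList<>(); // "+term"
    private final List<String> excluded = new ArrayList<>(); // "-term"
    private final List<String> optional = new ArrayList<>(); // bare term

    // Split the user's input on whitespace and sort each term into a list.
    // A lone "+" or "-" has no term attached, so it is kept as an optional term.
    public void setCriteriaString(String criteria) {
        required.clear();
        excluded.clear();
        optional.clear();
        for (String term : criteria.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input yields one empty token
            }
            if (term.startsWith("+") && term.length() > 1) {
                required.add(term.substring(1));
            } else if (term.startsWith("-") && term.length() > 1) {
                excluded.add(term.substring(1));
            } else {
                optional.add(term);
            }
        }
    }

    // Assumed rule: every required term must occur, no excluded term may occur,
    // and if there are no required terms at least one optional term must occur.
    public boolean compareString(String content) {
        String text = content.toLowerCase(Locale.ROOT);
        for (String term : required) {
            if (!text.contains(term)) {
                return false;
            }
        }
        for (String term : excluded) {
            if (text.contains(term)) {
                return false;
            }
        }
        if (!required.isEmpty() || optional.isEmpty()) {
            return true;
        }
        for (String term : optional) {
            if (text.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that this version splits on whitespace only, so it does not handle
quoted phrases.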

In your case, since you're going to pass the search criteria to SQL, you can
probably just use the tokenizing logic and add some getters for the criteria
lists...
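
As a rough idea of what "tokenizing logic plus getters" could look like with
java.util.regex, here is a minimal sketch that also keeps quoted phrases
together, as in the original question. The class name SearchTokens, the getter
names, and the regex itself are assumptions for illustration and are not taken
from StringSearcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: tokenizes a search string, keeping "quoted phrases" whole.
class SearchTokens {
    // Optional +/- prefix, then either a "quoted phrase" or a run of non-space characters.
    private static final Pattern TERM =
            Pattern.compile("([+-]?)(?:\"([^\"]*)\"|(\\S+))");

    private final List<String> required = new ArrayList<>();
    private final List<String> excluded = new ArrayList<>();
    private final List<String> optional = new ArrayList<>();

    SearchTokens(String input) {
        Matcher m = TERM.matcher(input);
        while (m.find()) {
            // Group 2 is the text inside quotes, group 3 is a bare word.
            String text = m.group(2) != null ? m.group(2) : m.group(3);
            if (text.isEmpty()) {
                continue; // skip empty quotes such as ""
            }
            String prefix = m.group(1);
            if (prefix.equals("+")) {
                required.add(text);
            } else if (prefix.equals("-")) {
                excluded.add(text);
            } else {
                optional.add(text);
            }
        }
    }

    // Read-only views, so callers can build their own query from the terms.
    public List<String> getRequired() { return Collections.unmodifiableList(required); }
    public List<String> getExcluded() { return Collections.unmodifiableList(excluded); }
    public List<String> getOptional() { return Collections.unmodifiableList(optional); }
}
```

With this sketch, the input 'Struts "web presentation tier"' yields two
optional terms (Struts, and web presentation tier), while the unquoted version
yields four. If the terms end up in SQL, binding them as parameters of a
prepared statement rather than concatenating them into the query text is the
safer route.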

David Hibbs, ACS
Staff Programmer / Analyst
American National Insurance Company

> -----Original Message-----
> From: Robert Taylor [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 15, 2004 3:20 PM
> To: [EMAIL PROTECTED]
> Subject: [OT] Search string tokenizer
> 
> 
> I did a Google search on this and didn't really come up with anything
> useful. Before I implement this myself, is there an existing
> implementation that parses a search string into tokens the way Google
> or other search engines do?
> 
> For example, I would like to parse a search string into tokens, where
> tokens are delimited by blank space unless they are part of a quoted
> phrase.
> 
> So the string:
> 
> 'Struts "web presentation tier"' 
> 
> would return 2 tokens:
>  - Struts
>  - web presentation tier
> 
> but the string:
> 
> 'Struts web presentation tier'
> 
> would return 4 tokens:
>  - Struts
>  - web
>  - presentation
>  - tier
> 
> 
> Any help is appreciated.
> 
> robert
