[Just some background, while I'm thinking about it.]

Google has placeholder syntax: if you search for 'full * search', the first hit is a Wikipedia article on 'full text search'. It's not the same thing as a proximity search, but it is in a similar ballpark. For instance, you might have 'full */10 search' (Google doesn't, but it might be a reasonable extension). Matching at a fixed distance is easier than allowing for a variable distance.

Lucene (*) has '"full search"~10' as a proximity search for "full" and "search" within 10 words of each other. Without more research, I'm not clear on whether this has ordering implications, and I'm not entirely certain how you would compose a sequence of such things.

Xapian (**) has two variants, "full NEAR/10 search" and "full ADJ/10 search"; the latter enforces the order given.

I think that being able to do things like "full NEAR/5 text NEAR/5 search" feels more approachable than '"full text search"~5'. I won't say it's more understandable, because I think any of these kinds of search may have non-obvious subtleties, but I think the infix version might make it easier to generate useful results for ad-hoc queries without having to step back too far.

One thing I'll think on in the background as a how-to-integrate question is the balance between sophistication for query experts and approachability for non-experts. For some systems, having things like proximity queries complicates the query language to no particular end, while in other systems proximity queries might be essential. Insofar as more sophisticated query forms don't interfere with simpler forms, they can just be ignored, but it would be nice if they didn't crop up as warts, like unexpected results for the search 'stoplight near krispy kreme', which would no longer find documents where stoplight is more than 10 terms away from krispy.

We've discussed having the ability to express both more ad-hoc and more stylized queries; maybe this is something to think about along those lines.

-scott

(*) http://lucene.apache.org/java/docs/queryparsersyntax.html#Proximity%20Searches
(**) http://www.xapian.org/docs/queryparser.html

On 9/14/07, Mike Marshall <[EMAIL PROTECTED]> wrote:
> Thanks for the quick response, much appreciated. Guess I better go and look
> at the query parser.
>
> Thanks again
>
> Mike
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: 14 September 2007 19:22
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Adding additional operators to FTS3
>
> "Mike Marshall" <[EMAIL PROTECTED]> wrote:
> >
> > 1) We need to be able to index items such as AT&T, this seems like
> > it's a case of replacing the default tokeniser with our own implementation
>
> Correct.
>
> >
> > 2) A NEAR query operator so we can do things like 'foo NEAR10 bar'
> > which will bring back all documents that have bar within 10 words of foo
> > (either direction). This is the one that I'm really not sure on and having
> > looked at the code don't really have a clue where to start.
> >
>
> A NEAR operator is just a generalization of a phrase
> search. A phrase search is when you put two keywords in
> doublequotes: '"foo bar"'. FTS looks for documents that
> contain the words foo and bar such that bar occurs immediately
> after foo. FTS records the index of each word in each document,
> so what phrase search is really doing is looking for instances
> of foo and bar where the index of bar is exactly one more than the
> index of foo. To implement NEAR10 you just have to look for
> instances of bar with an index that is not more than 10 different
> from the index on foo. Not such a big change, really. The
> hard part will be parsing out the NEAR10 operator.
>
> --
> D. Richard Hipp <[EMAIL PROTECTED]>

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------
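
As a rough illustration of the position-index idea in the quoted reply, here is a small Python sketch. It is not FTS3 code and says nothing about how FTS3 stores its index; the function names (index_positions, near, phrase) and the whitespace tokenizer are made up for the example. It only shows that a phrase search is the ordered, distance-1 case of a more general proximity check, and that an ordered flag gives the Xapian-style NEAR versus ADJ distinction.

```python
from collections import defaultdict


def index_positions(text):
    """Map each lowercased token to the list of word positions where it occurs.

    Hypothetical helper: tokenizes on whitespace only, which is far cruder
    than a real full-text tokenizer.
    """
    positions = defaultdict(list)
    for i, token in enumerate(text.lower().split()):
        positions[token].append(i)
    return positions


def near(positions, first, second, distance, ordered=False):
    """Return True if `second` occurs within `distance` words of `first`.

    With ordered=False the match may be in either direction (NEAR-like).
    With ordered=True `second` must come after `first` (ADJ-like).
    """
    for p in positions.get(first, []):
        for q in positions.get(second, []):
            gap = q - p
            if ordered and 0 < gap <= distance:
                return True
            if not ordered and 0 < abs(gap) <= distance:
                return True
    return False


def phrase(positions, first, second):
    """A two-word phrase is the ordered case with a distance of exactly one."""
    return near(positions, first, second, 1, ordered=True)


doc = index_positions("full text search is not the same as a full table scan")
print(phrase(doc, "full", "text"))                 # adjacent, in order
print(near(doc, "search", "full", 2))              # either direction
print(near(doc, "search", "full", 2, ordered=True))
```

The nested loop is quadratic in the number of occurrences, which is fine for a toy; since both position lists are sorted, a real implementation would more likely walk them in a single merge pass.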