"Mike Marshall" <[EMAIL PROTECTED]> wrote:
> 
> 1)       We need to be able to index items such as AT&T, this seems like
> it's a case of replacing the default tokeniser with our own implementation

Correct.
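
To make the tokenizing rule concrete, here is a rough sketch in
Python.  It only illustrates the splitting rule; it is not the FTS
tokenizer interface, and a real replacement tokenizer would have to
be written against that interface.  The name tokenize() and the
exact rule (keep '&' only when it sits between letters or digits)
are assumptions made for the example.

```python
import re

# Hypothetical rule: a token is a run of letters/digits, optionally
# joined to further runs by a single '&' (so "AT&T" stays one token).
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:&[A-Za-z0-9]+)*")


def tokenize(text):
    """Split text into lowercase tokens, keeping terms such as AT&T whole."""
    return [match.group(0).lower() for match in _TOKEN.finditer(text)]


if __name__ == "__main__":
    print(tokenize("We index AT&T, R&D & more"))
```

A '&' surrounded by spaces is still treated as a separator, so only
terms like AT&T or R&D survive as single tokens.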

> 
> 2)       A NEAR query operator  so we can do things like 'foo NEAR10 bar'
> which will bring back all documents that have bar within 10 words of foo
> (either direction).  This is the one that I'm really not sure on and having
> looked at the code don't really have a clue where to start.
> 

A NEAR operator is just a generalization of a phrase
search.  A phrase search is when you put two keywords in
double quotes:  '"foo bar"'.  FTS looks for documents that
contain the words foo and bar such that bar occurs immediately
after foo.  FTS records the index of each word in each document,
so what a phrase search is really doing is looking for instances
of foo and bar where the index of bar is exactly one more than the
index of foo.  To implement NEAR10 you just have to look for
instances of bar with an index that differs by no more than 10
from the index of foo, in either direction.  Not such a big
change, really.  The hard part will be parsing out the NEAR10
operator.
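
The index comparison can be sketched in a few lines of Python.
This is only an illustration of the idea, not the FTS code: the
function names, the plain lists of word positions, and the regular
expression used to pick apart 'foo NEAR10 bar' are all assumptions
made for the example.

```python
import re
from bisect import bisect_left


def word_positions(text, word):
    """Return the 0-based word indexes at which `word` occurs in `text`."""
    return [i for i, w in enumerate(text.lower().split()) if w == word]


def phrase(positions_a, positions_b):
    """Phrase match: some b occurs at exactly one index past some a."""
    following = set(positions_b)
    return any(pos + 1 in following for pos in positions_a)


def near(positions_a, positions_b, max_distance):
    """NEAR match: some a and some b differ by at most max_distance,
    in either direction."""
    sorted_b = sorted(positions_b)
    for pos in positions_a:
        # First b position that is not too far to the left of pos.
        i = bisect_left(sorted_b, pos - max_distance)
        if i < len(sorted_b) and sorted_b[i] <= pos + max_distance:
            return True
    return False


def parse_near(query):
    """Parse a hypothetical 'left NEARn right' query.

    Returns (left, n, right), or None if the query has another form.
    """
    m = re.fullmatch(r"\s*(\S+)\s+NEAR(\d+)\s+(\S+)\s*", query)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


if __name__ == "__main__":
    doc = "foo one two three bar"
    left, distance, right = parse_near("foo NEAR10 bar")
    print(near(word_positions(doc, left),
               word_positions(doc, right),
               distance))
```

With max_distance set to 1 and the order of the two words fixed,
near() reduces to the phrase() test, which is the sense in which
NEAR generalizes a phrase search.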

--
D. Richard Hipp <[EMAIL PROTECTED]>

