1. What is the meaning of such a query operator?

foo #5 bar -> true if the document has the word "foo" followed by "bar"
exactly 5 positions later.

foo #<5 bar -> true if the document has the word "foo" followed by "bar"
within 5 positions.

foo #>5 bar -> true if the document has the word "foo" followed by "bar"
more than 5 positions later.

Sounds good, but maybe it's overkill.

etc.
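The three proposed operators differ only in the condition placed on the distance between the two words. A minimal sketch of that check in C, assuming that "within N" means 1 to N positions later and "after N" means more than N positions later. The enum and function names are hypothetical, not existing PostgreSQL code; the position arrays stand in for the word positions stored in a tsvector.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical distance modes for the proposed #N, #<N and #>N operators. */
typedef enum
{
    DIST_EXACT,   /* foo #N bar:  bar exactly N positions after foo   */
    DIST_WITHIN,  /* foo #<N bar: bar 1..N positions after foo        */
    DIST_BEYOND   /* foo #>N bar: bar more than N positions after foo */
} DistMode;

/*
 * Returns true if some occurrence of "bar" follows some occurrence of
 * "foo" at a distance d = bar_pos - foo_pos that satisfies mode and n.
 * Positions are word positions as stored in a tsvector (1-based).
 */
static bool
phrase_match(const int *foo_pos, int n_foo,
             const int *bar_pos, int n_bar,
             DistMode mode, int n)
{
    assert(n > 0);

    for (int i = 0; i < n_foo; i++)
    {
        for (int j = 0; j < n_bar; j++)
        {
            int d = bar_pos[j] - foo_pos[i];

            if (d <= 0)
                continue;       /* "bar" must come after "foo" */

            if ((mode == DIST_EXACT && d == n) ||
                (mode == DIST_WITHIN && d <= n) ||
                (mode == DIST_BEYOND && d > n))
                return true;
        }
    }
    return false;
}
```

With "foo" at positions 1 and 10 and "bar" at position 6, "foo #5 bar" and "foo #<5 bar" match while "foo #>5 bar" does not. Since a tsvector keeps each lexeme's positions sorted, a merge-style scan could replace the nested loop.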

2. How to implement such query operators?

Should we modify QueryItem to include additional distance information, or
is there another way to accomplish it?

Is the following list sufficient to accomplish this?
a. Modify to_tsquery
b. Modify TS_execute in tsvector_op.c to check the new operator
Exactly
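For step (a), to_tsquery would have to recognize the new operator tokens and store the mode and distance with the operator. A minimal sketch of such a token parser, assuming the syntax "#N", "#<N" and "#>N" from question 1. The PhraseDist struct is a hypothetical stand-in for the extra fields an operator node would need; it is not the layout of PostgreSQL's QueryItem, and the function does not correspond to any existing to_tsquery internals.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum { DIST_EXACT, DIST_WITHIN, DIST_BEYOND } DistMode;

/*
 * Hypothetical distance information an operator node would carry.
 * NOT the layout of PostgreSQL's QueryItem.
 */
typedef struct
{
    DistMode mode;      /* #N, #<N or #>N      */
    int      distance;  /* the N in the token  */
} PhraseDist;

/*
 * Parses an operator token "#N", "#<N" or "#>N" at s.
 * On success fills *out, sets *end to the first character after the
 * token and returns true; returns false on malformed input.
 */
static bool
parse_phrase_op(const char *s, PhraseDist *out, const char **end)
{
    char *num_end;
    long  v;

    assert(s != NULL && out != NULL && end != NULL);

    if (*s != '#')
        return false;
    s++;

    out->mode = DIST_EXACT;
    if (*s == '<')
    {
        out->mode = DIST_WITHIN;
        s++;
    }
    else if (*s == '>')
    {
        out->mode = DIST_BEYOND;
        s++;
    }

    if (!isdigit((unsigned char) *s))
        return false;           /* a distance is required */

    v = strtol(s, &num_end, 10);
    if (v <= 0 || v > INT_MAX)
        return false;           /* reject zero and out-of-range distances */

    out->distance = (int) v;
    *end = num_end;
    return true;
}
```

Parsing "#<5 bar" yields mode DIST_WITHIN, distance 5, and leaves *end pointing at " bar", so the caller can continue tokenizing the right-hand operand. Step (b) would then evaluate the stored mode and distance against the positions in the tsvector.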


Is anything needed in the rewrite subsystem?
Yes, of course; the rewrite system should support that operation.


3. Are these valid uses of the operators, and if so, what would they
mean?

foo #5 (bar & cup)
It must be supported, because lexize might return a sub-tsquery. For example, Russian ispell can return several lexemes: "adfg" can become 'adf | adfs | ad'. Norwegian and German are more complicated: "abc" -> '(ab & c) | (a & bc) | abc'.
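One way to give "foo #5 (bar & cup)" a meaning when the right operand is a sub-tsquery from lexize is to distribute the distance operator over the subtree. A minimal sketch in C, under two stated assumptions: OR distributes as foo #n (x | y) == (foo #n x) | (foo #n y), and AND is read as foo #n (x & y) == (foo #n x) & (foo #n y). The AND rule is a design assumption, not something settled in this thread; it fits compound splitting if the split parts keep the position of the original word. The Node type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { N_LEXEME, N_AND, N_OR } NodeType;

/* Toy query tree node; hypothetical, not PostgreSQL's QueryItem. */
typedef struct Node
{
    NodeType           type;
    const int         *pos;    /* N_LEXEME: positions of the lexeme */
    int                npos;
    const struct Node *left;   /* N_AND / N_OR: children */
    const struct Node *right;
} Node;

/* True if some position in b is exactly n positions after some position in a. */
static bool
exact_after(const int *a, int na, const int *b, int nb, int n)
{
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (b[j] - a[i] == n)
                return true;
    return false;
}

/*
 * Evaluates "foo #n sub", where sub may be a sub-tsquery from lexize.
 * OR: either alternative may match.
 * AND (assumption): every part must be found exactly n positions after
 * an occurrence of foo.
 */
static bool
phrase_eval(const int *foo, int nfoo, const Node *sub, int n)
{
    assert(sub != NULL);

    switch (sub->type)
    {
        case N_LEXEME:
            return exact_after(foo, nfoo, sub->pos, sub->npos, n);
        case N_OR:
            return phrase_eval(foo, nfoo, sub->left, n) ||
                   phrase_eval(foo, nfoo, sub->right, n);
        case N_AND:
            return phrase_eval(foo, nfoo, sub->left, n) &&
                   phrase_eval(foo, nfoo, sub->right, n);
    }
    return false;
}
```

With "foo" at position 1, a split compound (ab & c) matches "foo #5" only if both ab and c are at position 6; an OR of alternatives matches as soon as one alternative does.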


4. If the operator only applies to two query items, can we create an
index such that (foo, bar) -> documents[min distance, max distance]?
How difficult is it to implement an index like this?
No. The index should execute the query 'foo & bar' and set the recheck flag to true, so that 'foo #<5 bar' is then executed on the original tsvector from the table.
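The answer describes a lossy index step followed by an exact recheck. A minimal sketch of that two-phase evaluation in C, using a toy document structure in place of a real tsvector. All names here are hypothetical; index_consistent only loosely mirrors the idea of an index support function that reports a recheck flag. In a real index the first step would consult posting lists rather than the document itself; here both steps read the same toy structure for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for a tsvector: one (lexeme, position) pair per entry. */
typedef struct { const char *lexeme; int pos; } Entry;
typedef struct { const Entry *e; int n; } Doc;

static bool
has_lexeme(const Doc *d, const char *lx)
{
    for (int i = 0; i < d->n; i++)
        if (strcmp(d->e[i].lexeme, lx) == 0)
            return true;
    return false;
}

/* Index step: evaluates only "a & b" and always requests a recheck. */
static bool
index_consistent(const Doc *d, const char *a, const char *b, bool *recheck)
{
    *recheck = true;            /* distances are not indexed: result is lossy */
    return has_lexeme(d, a) && has_lexeme(d, b);
}

/* Recheck step: "a #<n b" evaluated against the full document. */
static bool
recheck_within(const Doc *d, const char *a, const char *b, int n)
{
    for (int i = 0; i < d->n; i++)
    {
        if (strcmp(d->e[i].lexeme, a) != 0)
            continue;
        for (int j = 0; j < d->n; j++)
        {
            int dist = d->e[j].pos - d->e[i].pos;

            if (strcmp(d->e[j].lexeme, b) == 0 && dist > 0 && dist <= n)
                return true;
        }
    }
    return false;
}

/* Full evaluation of "a #<n b": cheap index filter first, then recheck. */
static bool
match_within(const Doc *d, const char *a, const char *b, int n)
{
    bool recheck = false;

    assert(n > 0);
    if (!index_consistent(d, a, b, &recheck))
        return false;           /* "a & b" fails, so "a #<n b" fails too */
    return recheck ? recheck_within(d, a, b, n) : true;
}
```

The design point is that a failed "foo & bar" check already rules a document out, so only candidates that contain both words pay for the positional recheck.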

--
Teodor Sigaev                                   E-mail: [EMAIL PROTECTED]
                                                   WWW: http://www.sigaev.ru/
