(I started with this addressed to the -general list, but it just doesn't seem like a general usage topic. If -hackers is the wrong place, please point me in the right direction).

I've been working on a custom index type (access method) that allows postgres to do fulltext searching (including phrase and proximity) that I believe is more flexible and natural than "tsearch2" (no offense to the tsearch2 guys... it's been a source of inspiration). I also plan on providing "term browsing" and hit-position information.

I'm using Xapian (www.xapian.org) as the backend fulltext engine, and I'm trying to learn more of the innards of postgres as I go. I've only spent about 18-24 hours on this, so it's nowhere near complete, and I really hesitate to even mention it in a public forum. But if in the end it doesn't suck it might make a great contrib/ package. *shrug* who knows?

In a nutshell, I've implemented all the necessary access method functions to teach postgres about a new index type named "xapian", and so far, everything is really great. It works just like any other index type would:

        create table test (stuff varchar(255), more_stuff text);
        create index idxstuff on test using xapian (stuff);
        create index idxmore_stuff on test using xapian (more_stuff);

insert into test (stuff, more_stuff) values ('this is stuff', 'this is more stuff');
insert into test (stuff, more_stuff) values ('my name is eric ridge', 'i like to drink beer');


select * from test where stuff => 'stuff' or more_stuff => '"drink beer"';
stuff | more_stuff
-----------------------+----------------------
this is stuff | this is more stuff
my name is eric ridge | i like to drink beer
(2 rows)


All this aside, I've got some questions related to indexes and query plans.

1) Is it possible for an access method to receive some kind of "DROP INDEX" notification? As far as I can tell, the answer is "no", but it can't hurt to ask.

2) When does the query planner decide that it needs to "Filter" results, and is there any way to turn this off or otherwise fool the query planner into NOT doing Filters (even if only for certain operators)?

For example, this query:
select * from test where stuff => 'stuff' AND NOT more_stuff => '"drink beer"';
has this plan:
Index Scan using idxstuff on test (cost=0.00..-0.98 rows=250 width=177)
Index Cond: ((stuff)::text => 'stuff'::text)
Filter: (NOT (more_stuff => '"drink beer"'::text))


In this case, postgres is forced to re-parse the contents of the "more_stuff" field (via the defined procedure for the => operator) for every row returned by the previous index scan, just so it can determine if the field contains the phrase 'drink beer' or not. Since so much overhead is involved here, it would be cool if postgres could somehow do another index scan.

Maybe there's some way for the operator function to know not only the Datum value, but the actual field (and ItemPointer)?

3) How does one get the $PGDATA directory? getenv() doesn't seem ideal since PGDATA can be specified as a command-line argument.

4) Can a function be registered as part of a transaction, pre-commit -- so the function can have an opportunity to abort the transaction. I've seen RegisterEOXactCallback, but it's not quite what I'm looking for.

5) are there discussions in the archives of this list (or other pgsql- lists) that discuss fulltext searching that y'all think are worth reading?

thanks in advance for your time and input!

eric


---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Reply via email to