[HACKERS] Proposal: Create index on foreign table

Etsuro Fujita Fri, 16 Mar 2012 01:42:27 -0700

I have a plan to support 'Create index on foreign table' for 9.3.  Here
is my plan.


The index creation is supported for a flat file such as CSV and a remote
table on a RDB e.g., Postgres using CREATE INDEX.  (I thought using a
new statement, CREATE FOREIGN INDEX, at first, but I think that CREATE
INDEX would be sufficient to define an index for the foreign table.)
For a flat file, CREATE INDEX constructs an index in the same way as an
index for a regular table.  On the other hand, for a remote table,
CREATE INDEX collects information about the index on the specified
column(s) for the specified table that was created on the remote table.
 An index created is stored in pg_class and pg_index like an index for a
regular table.  It depends on the wrappers implementation whether it
supports the options such as UNIQUE or WHERE predicates, though I think
that CONCURRENTLY is not supported in common for the foreign tables.
For a flat file, I plan that the user can specify all the options
excluding CONCURRENTLY and UNIQUE.  On the other hand, for a remote
table, I think that the user can specify only the names of the foreign
table and its column(s), using which the wrapper collects information
about all the related indexes created on the remote table.  To do so,
I'd like to propose new FDW callback routines:

CreateIndex(): This is called maybe from DefineIndex(), and does the
similar task to index_create().  For a flat file, this function makes
the catalog entries for the index and actually build the index, while
for a remote table, it just stores the catalog entries collected from
the remote end.
DropIndex(): This is called at DROP INDEX, and does the similar task to
index_drop().

I'd like to build the index physical data file for a flat file using the
index access method of regular tables (ie btree, hash, gin, gist, and
spgist) based on the following transformation between the TID and the
file offset to some data in the file:

    block_number  = file_offset_to_some_data / BLCKSZ
    offset_number = file_offset_to_some_data % BLCKSZ

I plan to make use of the above index for better query optimization.
For a flat file, I'd like to realize index scans, index-only scans,
bitmap (like) scans and parametrized scans on the file in the same way
as those on a regular table utilizing the currently revised FDW
infrastructure.  For a remote table, I have to admit that I don't have
any clear idea to make use of the index information stored in the system
catalogs for better query optimization, but I believe that it's useful
for the ORDER BY push down and/or nestloop-with-inner-parametrized-scan
join optimization.

Thoughts?

Best regards,
Etsuro Fujita

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Proposal: Create index on foreign table

Reply via email to