(2012/03/16 18:58), Heikki Linnakangas wrote: > On 16.03.2012 10:44, Etsuro Fujita wrote: >> I have a plan to support 'Create index on foreign table' for 9.3. Here >> is my plan. >> >> The index creation is supported for a flat file such as CSV and a remote >> table on a RDB e.g., Postgres using CREATE INDEX. (I thought using a >> new statement, CREATE FOREIGN INDEX, at first, but I think that CREATE >> INDEX would be sufficient to define an index for the foreign table.) >> For a flat file, CREATE INDEX constructs an index in the same way as an >> index for a regular table. > > I think this belongs completely in the remote data source. If you want > to index flat files, create an extra file for it or something, and > enhance the wrapper so that it can take advantage of it. Keeping the > index inside the database while the data is somewhere else creates a > whole new class of problems. For starters, how would you keep the index > up-to-date when the flat file is modified?
Index update is outside the scope at least until foreign table update is supported. It is required for the user to do DROP INDEX and then do CREATE INDEX again when the file has been modified. I plan to implement the GetForeignPaths callback routine of file_fdw to just give up creating index paths if the file's checksum or timestamp or something has changed. I think the index of the file is something similar to the stats of it in a certain sense. >> On the other hand, for a remote table, >> CREATE INDEX collects information about the index on the specified >> column(s) for the specified table that was created on the remote table. > > I don't see the point of this either. The planner asks the FDW for cost > estimates, and if the FDW knows about indexes in the remote server, it > can certainly adjust the estimates accordingly. But that's all internal > to the FDW. It might want delegate the whole cost estimation to the > remote server by running EXPLAIN there, or it could use its knowledge of > indexes that exist there, but I don't see why the rest of the system > would need to know what indexes there are in the remote system. In the case of joining A on the local system and B on the remote system, for example, it is required for the local system to know what indexes there are on B in the remote system in order to consider index paths parametrized by A for nestloop-with-inner-parametrized-scan paths. > If the > FDW needs that information, it can query the remote server for it on > first access, and cache the information for the lifetime of the session. I think that is one of the choices. However, it seems convenient to me to utilize the existing framework for index information because this approach has the possibility for the FDWs to share e.g., the path creation processing with Postgres core to some extent to create index paths, of course which requires to refactor existing query optimization code. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers