Hi,

On 2021-02-22 08:33:21 +0100, Mats Kindahl wrote:
> I started to experiment with the table access method interface to see if it
> can be used for some ideas I have.

Cool.


> The `relation_set_new_filenode` is indirectly called from
> `heap_create_with_catalog`, but there is no corresponding callback from
> `heap_drop_with_catalog`. It also seems like the intention is that the
> callback should call `RelationCreateStorage` itself (makes sense, since the
> access method knows how to use the storage), so it seems natural to
> add a `relation_reset_filenode` to the table AM that is called from
> `heap_drop_with_catalog` for tables and add that to the heap implementation
> (see the attached patch).

I don't think that's quite right. It's not exactly obvious from the
name, but RelationDropStorage() does not actually drop storage. Instead
it *schedules* the storage to be dropped upon commit.

The reason for deferring the dropping of table storage is that DDL in
postgres is transactional. We therefore cannot remove the storage at the
moment the DROP TABLE is executed - only when the transaction that
performed the DDL commits. Hence just providing you with a callback
that runs in heap_drop_with_catalog() doesn't really achieve much -
you'd have no way to execute the "actual" dropping of the relation at
that later stage.
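
To make that concrete, here's roughly what the machinery in
src/backend/catalog/storage.c does (a condensed sketch, not the
verbatim source - subtransaction handling, pending syncs and error
paths are omitted):

/* Condensed sketch of the deferred-drop pattern in storage.c. */
typedef struct PendingRelDelete
{
    RelFileNode relnode;        /* relation that may need to be deleted */
    BackendId   backend;        /* InvalidBackendId if not a temp rel */
    bool        atCommit;       /* T=delete at commit; F=delete at abort */
    int         nestLevel;      /* xact nesting level of request */
    struct PendingRelDelete *next;  /* linked-list link */
} PendingRelDelete;

static PendingRelDelete *pendingDeletes = NULL;

void
RelationDropStorage(Relation rel)
{
    PendingRelDelete *pending;

    /* Merely *schedule* the unlink - nothing is removed here. */
    pending = (PendingRelDelete *)
        MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
    pending->relnode = rel->rd_node;
    pending->backend = rel->rd_backend;
    pending->atCommit = true;   /* delete if commit */
    pending->nestLevel = GetCurrentTransactionNestLevel();
    pending->next = pendingDeletes;
    pendingDeletes = pending;
}

/* Called during COMMIT/ABORT; only here do the files actually go away. */
void
smgrDoPendingDeletes(bool isCommit)
{
    PendingRelDelete *pending;

    for (pending = pendingDeletes; pending != NULL; pending = pending->next)
    {
        if (pending->atCommit == isCommit)
        {
            SMgrRelation srel = smgropen(pending->relnode, pending->backend);

            smgrdounlinkall(&srel, 1, false);   /* remove the on-disk files */
            smgrclose(srel);
        }
    }
    pendingDeletes = NULL;      /* the real code also pfree()s the entries */
}

So a table AM that stores its data outside the relfilenode-addressed
files would need an equivalent commit/abort-time hook of its own to do
the "actual" dropping.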


> Creating new blocks for a table is straightforward to implement by using
> the `relation_set_new_filenode` callback where you can create new memory
> blocks for a relation, but I cannot find a way to clean up those blocks
> when the table is dropped nor a way to handle a change of the schema for a
> table.

What precisely do you mean by the "handle a change of the schema" bit?
I.e. what would you like to do, and what do you think is preventing you
from doing it? But before you answer, see my next point below.


> Altering the schema does not seem to be covered at all, but this is
> something that table access methods need to know about, since they might
> to optimize the internal storage when the schema changes. I have not been
> able to find any discussions around this, but it seems like a natural thing
> to do with a table. Have I misunderstood how this works?

Due to postgres' transactional DDL you cannot really change the storage
layout of *existing data* when that DDL command is executed - the data
still needs to be interpretable in case the DDL is rolled back
(including after a crash).

Before I explain some more: Could you describe in a bit more detail what
kind of optimization you'd like to make?

Back to schema change handling:

For some schema changes postgres assumes that they can be done
"in-place", e.g. adding a column to a table.

Other changes, e.g. changing the type of a column "sufficiently", will
cause a so-called table rewrite. That means a new relation is created
(including a call to relation_set_new_filenode()), all the data is
inserted into that new relation, and then pg_class->relfilenode for the
"original" relation is changed to point to the "rewritten" storage
(there are two variants of this, one for rewrites due to ALTER TABLE
and a separate one for VACUUM FULL/CLUSTER).
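
For reference, the heap AM's side of this looks roughly as follows (a
condensed sketch of heapam_relation_set_new_filenode() from
src/backend/access/heap/heapam_handler.c; the init-fork handling for
unlogged relations is elided):

static void
heapam_relation_set_new_filenode(Relation rel,
                                 const RelFileNode *newrnode,
                                 char persistence,
                                 TransactionId *freezeXid,
                                 MultiXactId *minmulti)
{
    SMgrRelation srel;

    /* Report the xid/multixact horizons of the new, empty storage. */
    *freezeXid = RecentXmin;
    *minmulti = GetOldestMultiXactId();

    /*
     * Create the physical storage. Symmetrically to the drop case, this
     * also schedules the new relfilenode to be removed again should the
     * transaction abort.
     */
    srel = RelationCreateStorage(*newrnode, persistence);

    smgrclose(srel);
}

Note that the callback's signature carries no indication that the call
happens as part of a rewrite of some pre-existing relation - the AM
just sees a new relfilenode.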

When the transaction containing such a rewrite commits, that
->relfilenode change becomes visible to everyone, and the old
relfilenode will be deleted.


This means that right now there's no easy way to store the data anywhere
but in the file referenced by pg_class.relfilenode. I don't think
anybody would object on principle to making the necessary infrastructure
changes to support storing data elsewhere - but I think it'll also not
be quite as simple as the change you suggested :(.

Greetings,

Andres Freund

