Hi hackers,

If you drop a column from a partitioned table then it has a TupleDesc that matches existing partitions, but new partitions created after that have non-same TupleDescs (according to convert_tuples_by_name) because they don't have the dropped column. That means that inserts to partitions created later need to go via the deform->remap->form code path in tupconvert.c. If you're using a time-based partitioning scheme where you add a new partition for each month and mostly insert into the current month, as is very common, then after dropping a column you'll eventually finish up sending ALL your inserts through tupconvert.c for the rest of time.
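To make the comparison concrete, here is a rough toy model in Python of the kind of check convert_tuples_by_name makes. It is only a sketch under simplifying assumptions, not the code in tupconvert.c: Attr, build_attr_map, needs_conversion and convert are hypothetical names, a TupleDesc is modelled as a plain list of columns, and the real code checks more than this (column types, for one).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for one pg_attribute entry (hypothetical, not PostgreSQL's struct)."""
    name: str
    dropped: bool = False


def build_attr_map(indesc: List[Attr], outdesc: List[Attr]) -> List[int]:
    """For each output column, the 1-based position of the live input
    column with the same name, or 0 if there is none."""
    attr_map = []
    for out in outdesc:
        position = 0
        if not out.dropped:
            for i, inp in enumerate(indesc):
                if not inp.dropped and inp.name == out.name:
                    position = i + 1
                    break
        attr_map.append(position)
    return attr_map


def needs_conversion(indesc: List[Attr], outdesc: List[Attr]) -> bool:
    """True if tuples must be deformed, remapped and re-formed."""
    if len(indesc) != len(outdesc):
        return True
    attr_map = build_attr_map(indesc, outdesc)
    for i, (inp, out) in enumerate(zip(indesc, outdesc)):
        if attr_map[i] == i + 1:
            continue  # same column in the same position
        if attr_map[i] == 0 and inp.dropped and out.dropped:
            continue  # both sides have a dropped column here
        return True
    return False


def convert(values: List[Optional[int]], indesc: List[Attr],
            outdesc: List[Attr]) -> List[Optional[int]]:
    """Remap a deformed tuple from the input layout to the output layout."""
    return [values[j - 1] if j else None
            for j in build_attr_map(indesc, outdesc)]


# The parent after "drop column a", a partition created before the drop,
# and a partition created after it.
parent = [Attr("a", dropped=True), Attr("b")]
child1 = [Attr("a", dropped=True), Attr("b")]
child2 = [Attr("b")]

print("child1 needs map:", needs_conversion(parent, child1))
print("child2 needs map:", needs_conversion(parent, child2))
```

In this model the partition created before the drop lines up with the parent column for column, while the one created afterwards has a different number of attributes, so every tuple routed to it has to go through convert.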

For example, having hacked my tree to print out a message to tell me if it had to convert a tuple:

postgres=# create table parent (a int, b int) partition by list (b);
CREATE TABLE
postgres=# create table child1 partition of parent for values in (1);
CREATE TABLE
postgres=# alter table parent drop column a;
ALTER TABLE
postgres=# create table child2 partition of parent for values in (2);
CREATE TABLE
postgres=# insert into parent values (1);
NOTICE:  no map
INSERT 0 1
postgres=# insert into parent values (2);
NOTICE:  map!
INSERT 0 1

Of course there are other usage patterns where you might prefer it this way, because you'll mostly be inserting into partitions created before the change. In general, would it be better for the partitioned table's TupleDesc to match partitions created before or after a change?

Since partitioned tables have no storage themselves, is there any technical reason we couldn't remove a partitioned table's dropped pg_attribute so that its TupleDesc matches partitions created later?

Is there some way that tupconvert.c could make this type of difference moot?

--
Thomas Munro
http://www.enterprisedb.com