Tom Lane wrote:
The only alternative I can see is the one Heikki suggested: don't truncate clog until the freeze horizon. That's safe (given the planned change to WAL-log tuple freezing) and clean and simple, but a permanent requirement of 250MB+ for pg_clog would put the final nail in the coffin of PG's usability in small-disk-footprint environments. So I don't like it much. I suppose it could be made more tolerable by reducing the freeze horizon, say to 100M instead of 1G transactions. Anyone for a GUC parameter? In a high-volume DB you'd want the larger setting to minimize the amount of tuple freezing work. OTOH it seems like making this configurable creates a nasty risk for PITR situations: a slave that's configured with a smaller freeze window than the master is probably not safe.
If we go down that route, we really should make it a GUC parameter, and reduce the default at least for 8_1_STABLE.
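For a rough feel of what a smaller freeze horizon buys, here is a back-of-the-envelope sketch. It assumes clog stores 2 status bits per transaction, and the function name `clog_bytes` is made up for illustration. It is not taken from the PostgreSQL source.

```python
# Assumption: pg_clog keeps 2 status bits per transaction ID.
CLOG_BITS_PER_XACT = 2


def clog_bytes(xids_retained: int) -> int:
    """Bytes of clog needed to keep status for `xids_retained` transactions."""
    return xids_retained * CLOG_BITS_PER_XACT // 8


# Compare the two freeze horizons mentioned above (decimal megabytes).
for label, horizon in (("1G", 1_000_000_000), ("100M", 100_000_000)):
    print(f"{label} horizon: {clog_bytes(horizon) / 1e6:.0f} MB of clog")
```

Under that assumption the 1G horizon comes to about 250 MB and the 100M horizon to about 25 MB, which matches the figure quoted above.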
I have another idea. If we make sure that vacuum removes every aborted xid older than OldestXmin from the table, we can safely assume that any xid below the clog truncation point that we might still need to look up is committed. Vacuum already removes any tuple with an aborted xmin. If we also set any aborted xmax (and xvac) to InvalidXid, and WAL-log that, we would know that once vacuum commits, every xid < relvacuumxid in the vacuumed table is committed, regardless of the hint bits. We could then safely truncate the clog without flushing anything. This also seems safe for PITR.
The only performance hit would be the clearing of xmax values of aborted transactions, but that doesn't seem too bad to me because most transactions commit.
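To make the idea concrete, here is a toy model in Python, not actual backend code. Everything in it (`Row`, `vacuum`, `xid_status`, the dict used as a clog) is a hypothetical stand-in, and it ignores xid wraparound, in-progress transactions, xvac, WAL logging and the removal of ordinary dead tuples. It only shows why, after such a vacuum, an xid below the truncation point can be treated as committed without a clog lookup.

```python
from dataclasses import dataclass

INVALID_XID = 0  # stand-in for "no transaction"
COMMITTED, ABORTED = "committed", "aborted"


@dataclass
class Row:
    xmin: int                 # inserting transaction
    xmax: int = INVALID_XID   # deleting transaction, if any


def vacuum(table, clog, oldest_xmin):
    """Drop rows with an aborted xmin and clear aborted xmax values."""
    kept = []
    for row in table:
        if row.xmin < oldest_xmin and clog[row.xmin] == ABORTED:
            continue  # the insert never happened; remove the row
        if (row.xmax != INVALID_XID and row.xmax < oldest_xmin
                and clog[row.xmax] == ABORTED):
            row.xmax = INVALID_XID  # the delete never happened; clear it
        kept.append(row)
    return kept


def xid_status(xid, clog, truncation_point):
    """Look up an xid, assuming anything below the truncation point committed."""
    if xid < truncation_point:
        return COMMITTED
    return clog[xid]


full_clog = {100: COMMITTED, 101: ABORTED, 102: COMMITTED,
             103: ABORTED, 104: COMMITTED}
table = [Row(100), Row(101), Row(102, xmax=103), Row(100, xmax=104)]

oldest_xmin = 104
table = vacuum(table, full_clog, oldest_xmin)

# After vacuum, truncate the clog below relvacuumxid.
relvacuumxid = oldest_xmin
truncated_clog = {x: s for x, s in full_clog.items() if x >= relvacuumxid}

for row in table:
    print(row, xid_status(row.xmin, truncated_clog, relvacuumxid))
```

In this model the row inserted by aborted xid 101 disappears and the aborted delete by xid 103 is cleared, so no xid below relvacuumxid that is still present in the table was aborted.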
-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com