On Mon, 2006-10-30 at 16:58 -0500, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > ISTM we only need to flush iff the clog would be truncated when we
> > update relminxid.
>
> Wrong :-(  If the relvacuumxid change (not relminxid ... as I said, these
> names aren't very transparent) makes it to disk but not all the hint
> bits do, you're at risk.  Crash, restart, vacuum some other table, and
> *now* the global min vacuumxid advances.  The fact that we're
> WAL-logging the relvacuumxid change makes this scenario exceedingly
> probable, if no action is taken to force out the hint bits.

I don't agree:

If the truncation points are at 1 million, 2 million etc, and we
advance the relvacuumxid from 1.2 million to 1.5 million and then crash,
the hint bits for that last vacuum are lost. Sounds bad, but we have not
truncated clog, so there is no danger. In order to truncate up to 2
million we need to re-vacuum; at that point we discover that the 1.5
million setting was wrong, realise it should have been 1.2 million, but
don't care because we now set it to 1.8 million etc. No problem, even
with repeated crashes. We only flush when we move the counter past a
truncation point.

If you look at this another way, maybe you'll see what I'm saying: only
update relvacuumxid if the update would allow us to truncate the clog.
That way we leap forwards in 1 million Xid chunks, rounded down. No
change to clog => no update => no danger that we need to flush to avoid.

> The only alternative I can see is the one Heikki suggested: don't
> truncate clog until the freeze horizon.  That's safe (given the planned
> change to WAL-log tuple freezing) and clean and simple, but a permanent
> requirement of 250MB+ for pg_clog would put the final nail in the coffin
> of PG's usability in small-disk-footprint environments.  So I don't like
> it much.  I suppose it could be made more tolerable by reducing the
> freeze horizon, say to 100M instead of 1G transactions.  Anyone for a
> GUC parameter?  In a high-volume DB you'd want the larger setting to
> minimize the amount of tuple freezing work.  OTOH it seems like making
> this configurable creates a nasty risk for PITR situations: a slave
> that's configured with a smaller freeze window than the master is
> probably not safe.

If we need to, just put the CLOG seg size in pg_config_manual.h

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com