On Wed, 2005-07-20 at 09:24 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > On Tue, 2005-07-19 at 22:24 -0400, Tom Lane wrote: > >> Simon Riggs <[EMAIL PROTECTED]> writes: > >>> Short patch enclosed to turn off writing of commit-status hint bits. > >> > >> Doesn't this entirely destroy the ability to truncate clog, and > >> therefore the ability to survive XID wraparound? > > > I hope not for all our sakes, since the hint bits are not WAL logged and > > anything that relies upon them would be fragile. > > We don't rely on any one write of them to work, but that doesn't mean > that we can indefinitely postpone writing them.
OK, I think I understand where you're coming from now. A table can't be migrated to read-only media until all of its tuples have an xmin of FrozenTransactionId; I said the following, which was wrong: "This should allow migration of older child tables to hierarchical storage when using a large historical table design." The patch doesn't directly contribute to that goal, though is of value in a large historical table design with many read only child tables (and other situations). But that comment wasn't the only inspiration for the patch. When VACUUM freezes the xid, it *does* make sense at that point to update the hint bits as a performance optimization. That isn't required though, and *can* be indefinitely postponed, AFAICS. All of the tqual routines will still work just as well without the hint bits set. If you know different, I'll need a deeper explanation before I understand. IMHO, the direction the patch is going in is still worthwhile because of these issues: 1. Any block read may attempt to set hint bits, which dirties the block and must be written out. So *reads* can result in heavier write activity at checkpoint time. That effects both OLTP and DW systems: Random read transactions against a large table will be worst effected, since we may end up writing the block once for each read. 2. A lazy vacuum may also dirty a block, even when it has done nothing else useful to that block. Worse, if we vacuum a table that is bigger than shared_buffers (or close), then we will end up having to evict dirty buffers that the vacuum itself has written in order to continue the vacuum. Since Vacuum is two-pass, we may end up writing a block *twice*, once where we set the hint bits and then again later where we remove the tuples and re-write. So this patch will allow a normal VACUUM to perform better on larger tables. (1) is a pain, but there's no point solving it without also solving (2). The patch would fail an Assert test during a VACUUM, since the info bits are not actually set if cache_txn_status_with_data = false during VACUUM. As a result of (2), perhaps we should remove all of the SetBufferCommitInfoNeedsSave calls in HeapTupleSatisfiesVacuum, and add a call to SetBufferCommitInfoNeedsSave that overrides cache_txn_status_with_data when we actually freeze a row. (Or perhaps that should be a VACUUM FAST command?) That way we would set the hint bits *only* when we freeze a row and not at any other time. If we further reduced the number of times we dirty the block *at all* on the first pass of a VACUUM, we would reduce the chance of writing twice. We could save the setting of frozen transactions until the second phase, i.e. only dirty the block if (pgchanged && vacrelstats->num_dead_tuples > prev_dead_count) The patch sets cache_txn_status_with_data as a USERSET, with the intention that particular read-only users would not wish to have their read-only transactions turn into write transactions. There was no intention to prevent VACUUM, not to avoid the optimisation of writing hint bits on a VACUUM FREEZE nor to set full read only status - which we discussed previously but is a much longer project. Sorry for any confusion caused in my initial patch submission. Does my longer explanation make sense of what the patch is trying to achieve. Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq