Re: [HACKERS] Vacuum dead tuples that are between transactions
On Wed, 2006-03-01 at 10:22 -0600, Kevin Grittner wrote:
> On Tue, Feb 28, 2006 at 7:22 am, in message [EMAIL PROTECTED],
> Simon Riggs [EMAIL PROTECTED] wrote:
> > OTOH a few hackers discussed this recently and found that nobody
> > used serializable transactions (ST) except during pg_dump.
>
> I've not been able to keep up with all messages on these lists, and I
> missed that discussion.

It was a verbal discussion, hence not recorded on-list. I should have said "nobody in that discussion"; I had no doubt somebody used them. My mention of it wasn't to add weight to the thought, just to note that a quick straw poll had been taken...

> We use serializable transactions heavily; our whole middle-tier
> architecture depends on having that transaction isolation level for
> all requests which modify data. (You probably don't want to hear the
> details.)

*I* would, but others may not. ;-)

> It would be OK (although a little disappointing) if VACUUM
> enhancements weren't as beneficial to us as a result; it would render
> PostgreSQL entirely unusable for us if the integrity of serializable
> transactions was broken, unless we added some other, non-standard
> steps to run them.

I would never suggest breaking STs; they are part of the SQL standard. I merely suggested an extra, optional API by which ST users could provide additional information that could help others avoid pessimal decisions taken in order to preserve correctness.

> We only use pg_dump for version upgrades and other special cases.
> PITR is our main backup technique.

Cool.

Best Regards,
Simon Riggs

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Vacuum dead tuples that are between transactions
On Tue, 2006-02-28 at 01:32 -0500, Tom Lane wrote:
> Paul Tillotson [EMAIL PROTECTED] writes:
> > The topic of improving vacuum for use in heavy-update environments
> > seems to come up frequently on the list. Has anyone weighed the
> > costs of allowing VACUUM to reclaim tuples that are not older than
> > the oldest transaction but are nonetheless invisible to all running
> > transactions? It seems that it's not that hard
>
> It's not that easy either --- you are assuming that every process
> advertises far more of its internal state than it actually does.
>
> > Suppose that in the system, we have a serializable transaction with
> > xid 1000 and a read committed transaction with xid 1001. Other than
> > these two, the oldest running xid is 2000. Suppose we consider a
> > tuple with xmin 1200 and xmax 1201. We will assume that xid 1201
> > committed before xid 2000 began to run. So:
> > (A) This tuple is invisible to the serializable transaction, since
> > its snapshot can't ever advance.
>
> Wrong --- you can't assume that simply from the transaction numbering,
> even assuming that you know that 1000 is serializable. 1000 might not
> have set its snapshot until quite some time after it started. (This is
> even pretty likely, if it waited for some locks before setting the
> snapshot.) You'd need access to the snapshot 1000 is actually using to
> be sure which later transactions are invisible to it.
>
> While advertising whole snapshots (rather than just xmin) in shared
> memory is at least theoretically possible, the costs of doing that
> seem nontrivial to me ... and they'd have to be paid whether any
> savings ensued or not.
>
> > (B) The read committed transaction might be able to see it. However,
> > if its current command started AFTER xid 1201 committed, it can't.
>
> Another issue is that there's not just one single snapshot to worry
> about per backend. Cursors for instance capture their own snaps. So a
> backend would have to somehow keep track of the oldest live snapshot
> it has internally.
>
> > The read committed ones are a more difficult matter, but I think you
> > can treat a tuple as dead if it was inserted after the read
> > committed transaction started to run AND the tuple was deleted
> > before the transaction's currently running command started to run.
>
> To do that requires not just that you have access to a backend's
> oldest snapshot, but that you have access to *all* its active
> snapshots; because such a transient tuple might be visible in some
> newer snap even though it's too new for the oldest snap. Doing that
> will create very significant problems of shared memory management, as
> well as performance and locking issues.
>
> There's been some talk of distinguishing global and within-database
> xmin values, so that a long-lived transaction wouldn't interfere with
> vacuuming tables in other databases that that xact couldn't possibly
> access. That seems doable to me, but I think any finer-grained
> analysis is probably going to be a net loss because of the distributed
> overhead it'd impose on every transaction.

Paul raises some thoughts that are worth considering, even with the usual minefield of difficulties.

Paul, you mention serializable transactions, but your root issue seems to be that VACUUM clears up fewer rows while pg_dump is running, yes? Have you tried using an on-line hot backup with archive_command set (PITR)? That doesn't suffer from the same issue and is faster too.

OTOH a few hackers discussed this recently and found that nobody used serializable transactions (ST) except during pg_dump. It seems a reasonable observation that *most* STs are pg_dumps, or at the very least that the longest-running STs are pg_dumps. So rather than changing all transaction modes, or even special-casing STs, why not put in some infrastructure to cope specifically with the problems that pg_dump can cause?

A general facility that would allow STs to identify which tables they would/would not touch again could be used by pg_dump to advertise useful information. That information could then be picked up by a VACUUM: when locking to get xmin it would see an ST, then retrieve the information to allow it to work out a per-table xmin. Non-VACUUM transactions would ignore any special ST information, causing very low overhead for normal operation (checking whether each current transaction was an ST, which would mostly be predicted correctly as "no" by the CPU).

You could take that further and get pg_dump to use a list file like pg_restore. You would then be able to, *where possible*, alter the sequence of data dumping so that heavily updated tables were dumped first, and the dumping ST could then advertise no further access to those tables. VACUUMs could then proceed as if the ST were not there at all.

Or maybe at least the idea of some special-case ST behaviour might be worthy of some thought. I've no intention of working on this myself, especially since PITR provides an alternate backup solution anyway (even in combination with other techniques), but the idea seems worth recording for others to discuss.

Best Regards,
Simon Riggs
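Simon's per-table xmin idea can be sketched in a few lines. This is a purely illustrative toy model: the shape of `running_xacts` and the notion of an ST advertising its set of `remaining_tables` are hypothetical, not real PostgreSQL structures or APIs.

```python
# Toy model: a serializable transaction advertises which tables it may
# still touch; VACUUM then computes a per-table xmin that ignores STs
# which have declared themselves done with the table being vacuumed.
# All names and structures here are hypothetical illustrations.

def per_table_xmin(table, running_xacts):
    """running_xacts: list of (xid, is_serializable, remaining_tables).

    remaining_tables is None for ordinary transactions (which may touch
    anything), or the set of tables the ST says it may still access.
    Returns the oldest xid VACUUM must respect for this table.
    """
    relevant = [
        xid for xid, _is_st, remaining in running_xacts
        if remaining is None or table in remaining
    ]
    return min(relevant, default=None)

xacts = [
    (1000, True, {"orders"}),   # pg_dump ST, already done with "hits"
    (2000, False, None),        # ordinary transaction
]
# VACUUM on the heavily updated "hits" table may use xmin 2000, not 1000:
assert per_table_xmin("hits", xacts) == 2000
assert per_table_xmin("orders", xacts) == 1000
```

Under this model, a table that pg_dump has finished with is vacuumed as if the dump were not running at all, which is exactly the benefit Simon describes for the list-file reordering.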
Re: [HACKERS] Vacuum dead tuples that are between transactions
On Tue, Feb 28, 2006 at 01:22:35PM +0000, Simon Riggs wrote:
> Paul, you mention serializable transactions, but your root issue seems
> to be that VACUUM clears up fewer rows while pg_dump is running, yes?
> Have you tried using an on-line hot backup with archive_command set
> (PITR)? That doesn't suffer from the same issue and is faster too.
>
> OTOH a few hackers discussed this recently and found that nobody used
> serializable transactions (ST) except during pg_dump. It seems a
> reasonable observation that *most* STs are pg_dumps, or at the very
> least that the longest-running STs are pg_dumps. So rather than
> changing all transaction modes, or even special-casing STs, why not
> put in some infrastructure to cope specifically with the problems that
> pg_dump can cause?

While it's not currently serialized, another big candidate IMO is vacuum itself. Vacuuming a large table in a database that also sees heavy update activity can be a real nightmare, because dead space piles up in the updated tables while the long vacuum is running. Although there are probably any number of ways this problem could be addressed, making vacuum a serialized transaction (which shouldn't be an issue, AFAIK) and creating a generic framework that optimizes for that case would win in more than one place.

Also, does this really only apply to serialized transactions? As the OP stated, if a row couldn't possibly be visible to a specific (old) transaction, it should be safe to vacuum it...

> A general facility that would allow STs to identify which tables they
> would/would not touch again could be used by pg_dump to advertise
> useful information. That information could then be picked up by a
> VACUUM: when locking to get xmin it would see an ST, then retrieve the
> information to allow it to work out a per-table xmin.
>
> Non-VACUUM transactions would ignore any special ST information,
> causing very low overhead for normal operation (checking whether each
> current transaction was an ST, which would mostly be predicted
> correctly as "no" by the CPU).
>
> You could take that further and get pg_dump to use a list file like
> pg_restore. You would then be able to *where possible* alter the
> sequence of data dumping so that heavily updated tables were dumped
> first so the dumping ST could then advertise no further access to
> particular tables. VACUUMs could then proceed as if the ST were not
> there at all.
>
> Or maybe at least the idea of some special case ST behaviour might be
> worthy of some thought. I've no intention of working on this myself,
> especially since PITR provides an alternate backup solution anyway
> (even in combination with other techniques), but the idea seems worth
> recording for others to discuss.

-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
Re: [HACKERS] Vacuum dead tuples that are between transactions
On Tue, Feb 28, 2006 at 01:32:19AM -0500, Tom Lane wrote:
> To do that requires not just that you have access to a backend's
> oldest snapshot, but that you have access to *all* its active
> snapshots; because such a transient tuple might be visible in some
> newer snap even though it's too new for the oldest snap. Doing that
> will create very significant problems of shared memory management, as
> well as performance and locking issues.
>
> There's been some talk of distinguishing global and within-database
> xmin values, so that a long-lived transaction wouldn't interfere with
> vacuuming tables in other databases that that xact couldn't possibly
> access. That seems doable to me, but I think any finer-grained
> analysis is probably going to be a net loss because of the distributed
> overhead it'd impose on every transaction.

True, but we don't need this for every transaction, only long-running ones. And in most cases it'd probably be safe to define "long-running" in terms of minutes. Presumably, a mechanism similar to statement_timeout could be used to "publish" the required state information after a given period of time.

-- 
Jim C. Nasby, Sr. Engineering Consultant
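Jim's timeout-based publishing could look something like the following toy model. Everything here is hypothetical, including the `PUBLISH_AFTER_SECS` threshold (imagined as a GUC analogous to statement_timeout); PostgreSQL has no such mechanism.

```python
# Sketch of the suggestion: a backend exposes its detailed snapshot
# state only once its transaction has run longer than a threshold, so
# short transactions pay no publishing overhead at all.
# Purely illustrative; not a real PostgreSQL mechanism.

import time

PUBLISH_AFTER_SECS = 60.0  # hypothetical GUC, like statement_timeout


class Backend:
    def __init__(self):
        self.xact_start = time.monotonic()
        self.snapshots = []              # private list of active snapshots
        self.published_snapshots = None  # stand-in for a shared-memory slot

    def maybe_publish(self):
        """Called periodically; only long-running xacts expose full state."""
        elapsed = time.monotonic() - self.xact_start
        if self.published_snapshots is None and elapsed > PUBLISH_AFTER_SECS:
            self.published_snapshots = list(self.snapshots)


b = Backend()
b.snapshots = ["snap1"]
b.maybe_publish()
assert b.published_snapshots is None   # young transaction: nothing published

b.xact_start -= 120                    # simulate two minutes of running time
b.maybe_publish()
assert b.published_snapshots == ["snap1"]
```

The point of the design is that VACUUM only needs fine-grained snapshot information about transactions old enough to matter, so the shared-memory traffic Tom worries about is paid only by the rare long-runner.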
[HACKERS] Vacuum dead tuples that are between transactions
The topic of improving vacuum for use in heavy-update environments seems to come up frequently on the list. Has anyone weighed the costs of allowing VACUUM to reclaim tuples that are not older than the oldest transaction but are nonetheless invisible to all running transactions? It seems that it's not that hard.

Currently, a tuple is not eligible to be reclaimed by vacuum unless it was deleted by a transaction that committed before the oldest currently running transaction started (i.e., its xmax is known to have committed before the oldest currently running xid began). Right?

However, it seems that under certain scenarios (heavy updates to small tables while a long-running transaction is in progress) there might be a lot of tuples that are invisible to all transactions but cannot be vacuumed under the current method. Example: updating a single row over and over again while pg_dump is running.

Suppose that in the system we have a serializable transaction with xid 1000 and a read committed transaction with xid 1001. Other than these two, the oldest running xid is 2000. Suppose we consider a tuple with xmin 1200 and xmax 1201. We will assume that xid 1201 committed before xid 2000 began to run. So:

(A) This tuple is invisible to the serializable transaction, since its snapshot can't ever advance.

(B) The read committed transaction might be able to see it. However, if its current command started AFTER xid 1201 committed, it can't.

Unless I'm missing something, it seems that when vacuuming you can leave serializable transactions (like pg_dump) out of the calculation of the oldest running transaction, so long as you keep a list of them and check each tuple T against each serializable transaction X to make sure that T's xmin is greater than X's xid, or else that T's xmax committed before X started to run.

Of course this is a lot of work, but it should mitigate the effect of long-running serializable transactions until such time as processor power becomes your limiting factor.

The read committed ones are a more difficult matter, but I think you can treat a tuple as dead if it was inserted after the read committed transaction started to run AND the tuple was deleted before the transaction's currently running command started to run. I suppose the major difficulty here is that currently a transaction has no way of knowing when another backend's command started to run?

Is this too difficult to do, or is it a good idea that no one has enough 'round tuits for?

Regards,
Paul Tillotson

---(end of broadcast)---
TIP 6: explain analyze is your friend
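For concreteness, the per-tuple test Paul proposes might be modelled as below. Everything here is an illustrative toy (logical "times" as plain integers, a flat list of serializable transactions), and Tom's reply in this thread explains why the `xmin > X` half of the test is unsound in practice, since it ignores when X actually took its snapshot.

```python
# Toy model of the proposed rule: a deleted tuple is reclaimable even if
# its xmax is newer than the oldest running xid, provided (a) the deleter
# committed before the oldest ordinary transaction started, and (b) every
# running serializable transaction either has an xid below the tuple's
# xmin or started after the deletion committed.
# Names and structures are illustrative only, not PostgreSQL internals.

def reclaimable(xmin, xmax_commit_time, oldest_ordinary_start,
                serializable_xacts):
    """Return True if a deleted tuple is invisible to every running xact.

    xmin                  -- xid that inserted the tuple
    xmax_commit_time      -- logical time the deleting xid committed
                             (None if it has not committed)
    oldest_ordinary_start -- start time of the oldest non-serializable xact
    serializable_xacts    -- list of (xid, start_time) for running STs
    """
    if xmax_commit_time is None:
        return False                      # deleter not committed: tuple live
    if xmax_commit_time >= oldest_ordinary_start:
        return False                      # some ordinary xact may see it
    for st_xid, st_start in serializable_xacts:
        # Per the proposal: the ST can never see the tuple if the tuple
        # was inserted after the ST's xid, or if the deletion committed
        # before the ST started to run.
        if not (xmin > st_xid or xmax_commit_time < st_start):
            return False
    return True

# Paul's example: ST with xid 1000, tuple (xmin=1200, xmax committed early).
assert reclaimable(1200, 5, 10, [(1000, 2)])
# A tuple the ST might still see (xmin below the ST's xid, deleted after
# the ST started) must be kept:
assert not reclaimable(900, 5, 10, [(1000, 2)])
```

Note how the check is per-tuple and per-ST, which is exactly the "lot of work" the paragraph above concedes: the cost scales with the number of dead tuples times the number of running serializable transactions.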
Re: [HACKERS] Vacuum dead tuples that are between transactions
Paul Tillotson [EMAIL PROTECTED] writes:
> The topic of improving vacuum for use in heavy-update environments
> seems to come up frequently on the list. Has anyone weighed the costs
> of allowing VACUUM to reclaim tuples that are not older than the
> oldest transaction but are nonetheless invisible to all running
> transactions? It seems that it's not that hard

It's not that easy either --- you are assuming that every process advertises far more of its internal state than it actually does.

> Suppose that in the system, we have a serializable transaction with
> xid 1000 and a read committed transaction with xid 1001. Other than
> these two, the oldest running xid is 2000. Suppose we consider a tuple
> with xmin 1200 and xmax 1201. We will assume that xid 1201 committed
> before xid 2000 began to run. So:
> (A) This tuple is invisible to the serializable transaction, since its
> snapshot can't ever advance.

Wrong --- you can't assume that simply from the transaction numbering, even assuming that you know that 1000 is serializable. 1000 might not have set its snapshot until quite some time after it started. (This is even pretty likely, if it waited for some locks before setting the snapshot.) You'd need access to the snapshot 1000 is actually using to be sure which later transactions are invisible to it.

While advertising whole snapshots (rather than just xmin) in shared memory is at least theoretically possible, the costs of doing that seem nontrivial to me ... and they'd have to be paid whether any savings ensued or not.

> (B) The read committed transaction might be able to see it. However,
> if its current command started AFTER xid 1201 committed, it can't.

Another issue is that there's not just one single snapshot to worry about per backend. Cursors for instance capture their own snaps. So a backend would have to somehow keep track of the oldest live snapshot it has internally.

> The read committed ones are a more difficult matter, but I think you
> can treat a tuple as dead if it was inserted after the read committed
> transaction started to run AND the tuple was deleted before the
> transaction's currently running command started to run.

To do that requires not just that you have access to a backend's oldest snapshot, but that you have access to *all* its active snapshots; because such a transient tuple might be visible in some newer snap even though it's too new for the oldest snap. Doing that will create very significant problems of shared memory management, as well as performance and locking issues.

There's been some talk of distinguishing global and within-database xmin values, so that a long-lived transaction wouldn't interfere with vacuuming tables in other databases that that xact couldn't possibly access. That seems doable to me, but I think any finer-grained analysis is probably going to be a net loss because of the distributed overhead it'd impose on every transaction.

			regards, tom lane
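Tom's point that visibility depends on the snapshot actually taken, not on transaction numbering, can be illustrated with a toy visibility check. The fields loosely echo the shape of a PostgreSQL snapshot (an xmin/xmax window plus an in-progress list), but this is a simplified model, not the real implementation.

```python
# Toy snapshot-visibility check: whether the effects of xid 1200 are
# visible to the serializable transaction depends entirely on when that
# transaction took its snapshot, not on its own xid (1000).
# Simplified model; field names only loosely follow PostgreSQL.

from dataclasses import dataclass, field

@dataclass
class Snapshot:
    xmin: int                 # every xid < xmin had finished at snap time
    xmax: int                 # every xid >= xmax is invisible
    xip: set = field(default_factory=set)  # in-progress xids in [xmin, xmax)

def xid_visible(xid: int, snap: Snapshot) -> bool:
    """Had `xid` already committed when this snapshot was taken?"""
    if xid >= snap.xmax:
        return False          # started after the snapshot
    if xid < snap.xmin:
        return True           # finished before the snapshot
    return xid not in snap.xip  # in the window: visible unless in progress

# If transaction 1000 set its snapshot late (after 1200 and 1201
# committed), it CAN see their effects despite its low xid:
late_snap = Snapshot(xmin=1202, xmax=2001)
assert xid_visible(1200, late_snap)

# A snapshot taken immediately at start could not:
early_snap = Snapshot(xmin=1000, xmax=1100)
assert not xid_visible(1200, early_snap)
```

The two snapshots belong to the same hypothetical transaction 1000; only the snapshot's contents decide visibility, which is why VACUUM would need the ST's actual snapshot advertised in shared memory rather than just its xid.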