Re: [HACKERS] Vacuum dead tuples that are between transactions

2006-03-01 Thread Simon Riggs
On Wed, 2006-03-01 at 10:22 -0600, Kevin Grittner wrote:
  On Tue, Feb 28, 2006 at  7:22 am, in message
 [EMAIL PROTECTED], Simon Riggs
 [EMAIL PROTECTED] wrote: 
  
  OTOH a few hackers discussed this recently and found that nobody
 used
  serializable transactions (ST) except during pg_dump.
 
 I've not been able to keep up with all messages on these lists, and I
 missed that discussion.

It was a verbal discussion, hence not recorded on-list. I should have
said nobody in that discussion used them; I had no doubt somebody did.
My mention of it wasn't to add weight to the thought, just to note that
a quick straw poll had been taken...

 We use serializable transactions heavily; our whole middle tier
 architecture depends on having that transaction isolation level for all
 requests which modify data.  (You probably don't want to hear the
 details.) 

*I* would, but others may not. ;-)

  It would be OK (although a little disappointing) if VACUUM
 enhancements weren't as beneficial to us as a result; it would render
 PostgreSQL entirely unusable for us if the integrity of serializable
 transactions was broken unless we added some other, non-standard steps
 to run them.

I would never suggest breaking STs; they are part of the SQL standard. I
merely suggested an extra, optional API by which ST users could provide
additional information, helping other backends avoid the pessimal
decisions currently required to preserve correctness.

 We only use pg_dump for version upgrades and other special cases.  PITR
 is our main backup technique.

Cool.

Best Regards, Simon Riggs


Re: [HACKERS] Vacuum dead tuples that are between transactions

2006-02-28 Thread Simon Riggs
On Tue, 2006-02-28 at 01:32 -0500, Tom Lane wrote:
 Paul Tillotson [EMAIL PROTECTED] writes:
  The topic of improving vacuum for use in heavy-update environments seems 
  to come up frequently on the list.  Has anyone weighed the costs of 
  allowing VACUUM to reclaim tuples that are not older than the oldest 
  transaction but are nonetheless invisible to all running transactions?  
  It seems that it's not that hard
 
 It's not that easy either --- you are assuming that every process
 advertises far more of its internal state than it actually does.
 
  Suppose that in the system, we have a serializable transaction with xid 
  1000 and a read committed transaction with xid 1001.  Other than these 
  two, the oldest running xid is 2000.
 
  Suppose we consider a tuple with xmin 1200 and xmax 1201.  We will 
  assume that xid 1201 committed before xid 2000 began to run.
 
  So:
 
  (A) This tuple is invisible to the serializable transaction, since its 
  snapshot can't ever advance.
 
 Wrong --- you can't assume that simply from the transaction numbering,
 even assuming that you know that 1000 is serializable.  1000 might not
 have set its snapshot until quite some time after it started.  (This is
 even pretty likely, if it waited for some locks before setting the
 snapshot.)  You'd need access to the snapshot 1000 is actually using to
 be sure which later transactions are invisible to it.
 
 While advertising whole snapshots (rather than just xmin) in shared
 memory is at least theoretically possible, the costs of doing that seem
 nontrivial to me ... and they'd have to be paid whether any savings
 ensued or not.
 
  (B) The read committed transaction might be able to see it.  However, if 
  its current command started AFTER xid 1201 committed, it can't. 
 
 Another issue is that there's not just one single snapshot to worry
 about per backend.  Cursors for instance capture their own snaps.
 So a backend would have to somehow keep track of the oldest live
 snapshot it has internally.
 
  The read committed ones are a more difficult matter, but I think you can 
  treat a tuple as dead if it was inserted after the read committed 
  transaction started to run AND the tuple was deleted before the 
  transaction's currently running command started to run.
 
 To do that requires not just that you have access to a backend's oldest
 snapshot, but that you have access to *all* its active snapshots;
 because such a transient tuple might be visible in some newer snap even
 though it's too new for the oldest snap.  Doing that will create very
 significant problems of shared memory management, as well as performance
 and locking issues.
 
 There's been some talk of distinguishing global and within database
 xmin values, so that a long-lived transaction wouldn't interfere with
 vacuuming tables in other databases that that xact couldn't possibly
 access.  That seems doable to me, but I think any finer-grained analysis
 is probably going to be a net loss because of the distributed overhead
 it'd impose on every transaction.

Paul raises some thoughts that are worth considering, even with the
usual minefield of difficulties.

Paul, you mention serializable transactions, but your root issue seems
to be that VACUUM cleans up fewer rows while pg_dump is running, yes? Have
you tried an on-line hot backup with archive_command set (PITR)? That
doesn't suffer from the same issue and is faster too.

OTOH a few hackers discussed this recently and found that nobody used
serializable transactions (ST) except during pg_dump. It seems a
reasonable observation that *most* STs are pg_dumps, or at the very
least that the longest-running STs are pg_dumps. So rather than changing
all transaction modes, or even special-casing STs, why not put in some
infrastructure to cope specifically with the problems that pg_dump can
cause?

A general facility that would allow STs to identify which tables they
would/would not touch again could be used by pg_dump to advertise useful
information. That information could then be picked up by a VACUUM: when
locking to get xmin it would see an ST, then retrieve the information to
allow it to work out a per-table xmin. Non-VACUUM transactions would
ignore any special ST information, causing very low overhead for normal
operation (checking whether each current transaction was an ST, which
mostly will be predicted correctly as no by the CPU).
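
As a purely hypothetical sketch of that idea (none of these structures or
names exist in the backend; ProcEntry, per_table_xmin and the table OIDs
are invented here for illustration only):

    /*
     * Hypothetical sketch only: nothing below exists in PostgreSQL.  It just
     * models the idea that a serializable transaction (ST) could advertise
     * tables it will not touch again, so that VACUUM can ignore it when
     * computing an xmin for those tables.
     */
    #include <stdio.h>
    #include <stdbool.h>

    typedef unsigned int Xid;
    typedef unsigned int TableOid;

    typedef struct
    {
        Xid      xid;
        bool     is_serializable;
        TableOid done_tables[8];    /* tables this ST promises not to touch
                                     * again; 0-terminated (invented field) */
    } ProcEntry;

    static bool
    st_is_done_with(const ProcEntry *p, TableOid rel)
    {
        for (int i = 0; p->done_tables[i] != 0; i++)
            if (p->done_tables[i] == rel)
                return true;
        return false;
    }

    /* xmin that VACUUM could use for one particular table */
    static Xid
    per_table_xmin(const ProcEntry *procs, int nprocs, TableOid rel)
    {
        Xid xmin = ~0u;             /* "no limit" */

        for (int i = 0; i < nprocs; i++)
        {
            if (procs[i].is_serializable && st_is_done_with(&procs[i], rel))
                continue;           /* this ST cannot hold back this table */
            if (procs[i].xid < xmin)
                xmin = procs[i].xid;
        }
        return xmin;
    }

    int
    main(void)
    {
        ProcEntry procs[] = {
            {1000, true, {16384, 0}},   /* pg_dump-like ST, done with 16384 */
            {2000, false, {0}},
        };

        printf("xmin for table 16384: %u\n", per_table_xmin(procs, 2, 16384));
        printf("xmin for table 16385: %u\n", per_table_xmin(procs, 2, 16385));
        return 0;
    }

In that toy example VACUUM could use xmin 2000 for the table the dump has
finished with, while still honouring xid 1000 everywhere else.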

You could take that further and get pg_dump to use a list file like
pg_restore. You would then be able to *where possible* alter the
sequence of data dumping so that heavily updated tables were dumped
first so the dumping ST could then advertise no further access to
particular tables. VACUUMs could then proceed as if the ST were not
there at all.
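
Again only as an illustration (the update counts and names are invented;
how pg_dump would learn the activity levels is left open), the reordering
could be as simple as:

    /*
     * Illustration only: if pg_dump knew roughly how heavily updated each
     * table is, it could dump the hottest tables first and advertise "no
     * further access" to each one as it finishes.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct
    {
        const char *name;
        long        updates_per_hour;   /* hypothetical activity measure */
    } DumpEntry;

    static int
    by_update_activity_desc(const void *a, const void *b)
    {
        const DumpEntry *ea = a;
        const DumpEntry *eb = b;

        if (ea->updates_per_hour == eb->updates_per_hour)
            return 0;
        return (ea->updates_per_hour < eb->updates_per_hour) ? 1 : -1;
    }

    int
    main(void)
    {
        DumpEntry   tables[] = {
            {"audit_log", 120},
            {"hot_queue", 90000},
            {"reference_data", 3},
        };
        int         n = sizeof(tables) / sizeof(tables[0]);

        qsort(tables, n, sizeof(DumpEntry), by_update_activity_desc);

        for (int i = 0; i < n; i++)
            printf("dump %s, then advertise no further access to it\n",
                   tables[i].name);
        return 0;
    }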

Or maybe at least the idea of some special case ST behaviour might be
worthy of some thought.

I've no intention of working on this myself, especially since PITR
provides an alternate backup solution anyway (even in combination with
other techniques), but the idea seems worth recording for others to
discuss.

Best Regards, Simon Riggs

Re: [HACKERS] Vacuum dead tuples that are between transactions

2006-02-28 Thread Jim C. Nasby
On Tue, Feb 28, 2006 at 01:22:35PM +, Simon Riggs wrote:
 Paul, you mention serializable transactions, but your root issue seems
 to be that VACUUM cleans up fewer rows while pg_dump is running, yes? Have
 you tried an on-line hot backup with archive_command set (PITR)? That
 doesn't suffer from the same issue and is faster too.
 
 OTOH a few hackers discussed this recently and found that nobody used
 serializable transactions (ST) except during pg_dump. It seems a
 reasonable observation that *most* STs are pg_dumps, or at the very
 least that the longest-running STs are pg_dumps. So rather than changing
 all transaction modes, or even special-casing STs, why not put in some
 infrastructure to cope specifically with the problems that pg_dump can
 cause?

While it's not currently serializable, another big candidate IMO is
vacuum itself. Vacuuming a large table in a database that also sees
heavy update activity can be a real nightmare, because dead space piles
up in the updated tables while the long vacuum is running. Although
there are probably any number of ways this problem could be addressed,
making vacuum a serializable transaction (which shouldn't be an issue,
AFAIK) and creating a generic framework that optimizes for that case
would be a win in more than one place.

Also, does this really only apply to serializable transactions? As the
OP stated, if a row couldn't possibly be visible to a specific (old)
transaction, it should be safe to vacuum it...

 A general facility that would allow STs to identify which tables they
 would/would not touch again could be used by pg_dump to advertise useful
 information. That information could then be picked up by a VACUUM: when
 locking to get xmin it would see an ST, then retrieve the information to
 allow it to work out a per-table xmin. Non-VACUUM transactions would
 ignore any special ST information, causing very low overhead for normal
 operation (checking whether each current transaction was an ST, which
 mostly will be predicted correctly as no by the CPU).
 
 You could take that further and get pg_dump to use a list file like
 pg_restore. You would then be able to *where possible* alter the
 sequence of data dumping so that heavily updated tables were dumped
 first so the dumping ST could then advertise no further access to
 particular tables. VACUUMs could then proceed as if the ST were not
 there at all.
 
 Or maybe at least the idea of some special case ST behaviour might be
 worthy of some thought.
 
 I've no intention of working on this myself, especially since PITR
 provides an alternate backup solution anyway (even in combination with
 other techniques), but the idea seems worth recording for others to
 discuss.
 
 Best Regards, Simon Riggs
 
 
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461


Re: [HACKERS] Vacuum dead tuples that are between transactions

2006-02-28 Thread Jim C. Nasby
On Tue, Feb 28, 2006 at 01:32:19AM -0500, Tom Lane wrote:
 To do that requires not just that you have access to a backend's oldest
 snapshot, but that you have access to *all* its active snapshots;
 because such a transient tuple might be visible in some newer snap even
 though it's too new for the oldest snap.  Doing that will create very
 significant problems of shared memory management, as well as performance
 and locking issues.
 
 There's been some talk of distinguishing global and within database
 xmin values, so that a long-lived transaction wouldn't interfere with
 vacuuming tables in other databases that that xact couldn't possibly
 access.  That seems doable to me, but I think any finer-grained analysis
 is probably going to be a net loss because of the distributed overhead
 it'd impose on every transaction.

True, but we don't need this for every transaction, only long-running
ones. And in most cases, it'd probably be safe to define 'long-running'
in terms of minutes. Presumably, a mechanism similar to
statement_timeout could be used to 'publish' the required state
information after a given period of time.
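
As a rough illustration of that idea (publish_snapshot_after and
everything else here is invented; the real mechanism would presumably
hang off the backend's existing timeout handling):

    /*
     * Invented names throughout: a backend whose transaction has run longer
     * than a hypothetical "publish_snapshot_after" threshold (similar in
     * spirit to statement_timeout) would copy its snapshot details into
     * shared memory where VACUUM could inspect them.
     */
    #include <stdio.h>
    #include <stdbool.h>
    #include <time.h>

    static const double publish_snapshot_after_secs = 300.0;   /* 5 minutes */

    static bool
    should_publish_snapshot(time_t xact_start, time_t now)
    {
        return difftime(now, xact_start) >= publish_snapshot_after_secs;
    }

    int
    main(void)
    {
        time_t now = time(NULL);

        printf("30 s old xact publishes?  %d\n",
               should_publish_snapshot(now - 30, now));
        printf("1 h old xact publishes?   %d\n",
               should_publish_snapshot(now - 3600, now));
        return 0;
    }

That way the extra bookkeeping is only paid by transactions that have
already proven themselves long-running.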
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461


[HACKERS] Vacuum dead tuples that are between transactions

2006-02-27 Thread Paul Tillotson
The topic of improving vacuum for use in heavy-update environments seems 
to come up frequently on the list.  Has anyone weighed the costs of 
allowing VACUUM to reclaim tuples that are not older than the oldest 
transaction but are nonetheless invisible to all running transactions?  
It seems that it's not that hard


Currently, a tuple is not eligible to be reclaimed by vacuum unless it 
was deleted by a transaction that committed before the oldest currently 
running transaction started (i.e., its xmax is known to have committed 
before the oldest currently running xid started).  Right?


However, it seems that under certain scenarios (heavy updates to small 
tables while a long-running transaction is in progress) there might be a 
lot of tuples that are invisible to all transactions but still cannot be 
vacuumed under the current method.  Example: updating a single row over 
and over again while pg_dump is running.


Suppose that in the system, we have a serializable transaction with xid 
1000 and a read committed transaction with xid 1001.  Other than these 
two, the oldest running xid is 2000.


Suppose we consider a tuple with xmin 1200 and xmax 1201.  We will 
assume that xid 1201 committed before xid 2000 began to run.


So:

(A) This tuple is invisible to the serializable transaction, since its 
snapshot can't ever advance.


(B) The read committed transaction might be able to see it.  However, if 
its current command started AFTER xid 1201 committed, it can't. 

Unless I'm missing something, it seems that when vacuuming you can leave 
serializable transactions (like pg_dump) out of the calculation of the 
oldest running transaction, so long as you keep a list of them and check 
each tuple T against each serializable transaction X to make sure that 
either T's xmin is greater than X's xid, or else T's xmax committed 
before X started to run.  Of course this is a lot of work, but it should 
mitigate the effect of long-running serializable transactions until such 
time as processor power becomes your limiting factor.
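
A toy model of that check (the "committed before X started" test is faked
with a hard-coded list rather than real clog/snapshot machinery, and all
of the names are invented for illustration):

    /*
     * Toy model of the per-ST check described above.
     */
    #include <stdio.h>
    #include <stdbool.h>

    typedef unsigned int Xid;

    typedef struct
    {
        Xid xmin;       /* inserting xid */
        Xid xmax;       /* deleting xid  */
    } DeadTuple;

    typedef struct
    {
        Xid xid;                          /* the serializable transaction X */
        Xid committed_before_start[8];    /* xids known committed before X
                                           * started; 0-terminated          */
    } SerializableXact;

    static bool
    committed_before(const SerializableXact *st, Xid xid)
    {
        for (int i = 0; st->committed_before_start[i] != 0; i++)
            if (st->committed_before_start[i] == xid)
                return true;
        return false;
    }

    /*
     * T can be reclaimed despite X if X can never have seen it (T.xmin > X)
     * or its deletion had already committed when X started.
     */
    static bool
    reclaimable_despite_st(const DeadTuple *t, const SerializableXact *st)
    {
        return t->xmin > st->xid || committed_before(st, t->xmax);
    }

    int
    main(void)
    {
        SerializableXact st = {1000, {950, 0}};
        DeadTuple   t1 = {1200, 1201};  /* created after X began            */
        DeadTuple   t2 = {900, 950};    /* deleted before X started         */
        DeadTuple   t3 = {900, 1100};   /* X might still see this one       */

        printf("t1: %d\n", reclaimable_despite_st(&t1, &st));
        printf("t2: %d\n", reclaimable_despite_st(&t2, &st));
        printf("t3: %d\n", reclaimable_despite_st(&t3, &st));
        return 0;
    }

(Tom's reply, quoted above and in full below, points out why the
"xmin greater than X" half of this test is not actually safe without
access to X's real snapshot.)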


The read committed ones are a more difficult matter, but I think you can 
treat a tuple as dead if it was inserted after the read committed 
transaction started to run AND the tuple was deleted before the 
transaction's currently running command started to run.  I suppose the 
major difficulty here is that currently a transaction has no way of 
knowing when another backend's command started to run?
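
In the same toy spirit, the read-committed rule might look like this (the
command-start cutoff is exactly the piece of information that, as noted
above, other backends currently have no way of knowing, so it is simply a
parameter here; all names are invented):

    /*
     * Toy model of the read-committed rule above: a dead tuple is invisible
     * to a read-committed backend if it was created after that backend's
     * transaction began AND deleted before its current command began.  Both
     * cutoffs are reduced to plain xid comparisons here, which glosses over
     * the snapshot details raised elsewhere in this thread.
     */
    #include <stdio.h>
    #include <stdbool.h>

    typedef unsigned int Xid;

    typedef struct
    {
        Xid xmin;
        Xid xmax;
    } DeadTuple;

    static bool
    dead_to_read_committed(const DeadTuple *t,
                           Xid rc_xact_start_cutoff,
                           Xid rc_command_start_cutoff)
    {
        return t->xmin > rc_xact_start_cutoff &&
               t->xmax < rc_command_start_cutoff;
    }

    int
    main(void)
    {
        DeadTuple   t = {1200, 1201};   /* the example tuple from this mail */

        /*
         * Read-committed xid 1001; pretend every xid below 1500 had already
         * committed when its current command started.
         */
        printf("dead to the read-committed backend: %d\n",
               dead_to_read_committed(&t, 1001, 1500));
        return 0;
    }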


Is this too difficult to do or is it a good idea that no one has enough 
'round tuits for?


Regards,

Paul Tillotson


Re: [HACKERS] Vacuum dead tuples that are between transactions

2006-02-27 Thread Tom Lane
Paul Tillotson [EMAIL PROTECTED] writes:
 The topic of improving vacuum for use in heavy-update environments seems 
 to come up frequently on the list.  Has anyone weighed the costs of 
 allowing VACUUM to reclaim tuples that are not older than the oldest 
 transaction but are nonetheless invisible to all running transactions?  
 It seems that it's not that hard

It's not that easy either --- you are assuming that every process
advertises far more of its internal state than it actually does.

 Suppose that in the system, we have a serializable transaction with xid 
 1000 and a read committed transaction with xid 1001.  Other than these 
 two, the oldest running xid is 2000.

 Suppose we consider a tuple with xmin 1200 and xmax 1201.  We will 
 assume that xid 1201 committed before xid 2000 began to run.

 So:

 (A) This tuple is invisible to the serializable transaction, since its 
 snapshot can't ever advance.

Wrong --- you can't assume that simply from the transaction numbering,
even assuming that you know that 1000 is serializable.  1000 might not
have set its snapshot until quite some time after it started.  (This is
even pretty likely, if it waited for some locks before setting the
snapshot.)  You'd need access to the snapshot 1000 is actually using to
be sure which later transactions are invisible to it.
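
A much-simplified model of what "access to the snapshot 1000 is actually
using" buys you (this only illustrates snapshot-based visibility in
general, not PostgreSQL's actual snapshot code):

    /*
     * Simplified illustration: visibility of a committed xid is decided by
     * the snapshot's xmin, xmax and in-progress list, not by comparing raw
     * xid values, so a late-set snapshot can see "higher" xids.
     */
    #include <stdio.h>
    #include <stdbool.h>

    typedef unsigned int Xid;

    typedef struct
    {
        Xid xmin;       /* all xids below this were finished at snap time  */
        Xid xmax;       /* this xid and above are invisible                */
        Xid xip[16];    /* xids still in progress when the snap was taken  */
        int xcnt;
    } Snapshot;

    static bool
    xid_visible_in_snapshot(Xid xid, const Snapshot *snap)
    {
        if (xid >= snap->xmax)
            return false;
        if (xid < snap->xmin)
            return true;
        for (int i = 0; i < snap->xcnt; i++)
            if (snap->xip[i] == xid)
                return false;   /* was still running, so not visible */
        return true;
    }

    int
    main(void)
    {
        /*
         * Transaction 1000 set its snapshot late: by then 1200 and 1201 had
         * committed and are not in its in-progress list, so it can see them
         * even though their xids are numerically higher than 1000.
         */
        Snapshot    late_snap = {1100, 1300, {1250}, 1};

        printf("xid 1201 visible to the late snapshot: %d\n",
               xid_visible_in_snapshot(1201, &late_snap));
        return 0;
    }

In that example xid 1201 is visible to the late-set snapshot even though
1201 is numerically higher than 1000, which is exactly why raw xid
comparisons are not enough.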

While advertising whole snapshots (rather than just xmin) in shared
memory is at least theoretically possible, the costs of doing that seem
nontrivial to me ... and they'd have to be paid whether any savings
ensued or not.

 (B) The read committed transaction might be able to see it.  However, if 
 its current command started AFTER xid 1201 committed, it can't. 

Another issue is that there's not just one single snapshot to worry
about per backend.  Cursors for instance capture their own snaps.
So a backend would have to somehow keep track of the oldest live
snapshot it has internally.
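
A trivial sketch of that bookkeeping (registered_xmins and everything
else here is invented; a real backend would have to track every snapshot
still live inside it, cursor snapshots included, and report the minimum):

    /*
     * Invented sketch: a backend keeps the xmin of every snapshot still
     * alive inside it (including cursor snapshots) and reports the oldest.
     */
    #include <stdio.h>

    typedef unsigned int Xid;

    static Xid
    oldest_live_snapshot_xmin(const Xid *registered_xmins, int count)
    {
        Xid oldest = ~0u;

        for (int i = 0; i < count; i++)
            if (registered_xmins[i] < oldest)
                oldest = registered_xmins[i];
        return oldest;
    }

    int
    main(void)
    {
        /* transaction snapshot, an older cursor snapshot, a newer one */
        Xid snaps[] = {1500, 1200, 1800};

        printf("oldest live snapshot xmin: %u\n",
               oldest_live_snapshot_xmin(snaps, 3));
        return 0;
    }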

 The read committed ones are a more difficult matter, but I think you can 
 treat a tuple as dead if it was inserted after the read committed 
 transaction started to run AND the tuple was deleted before the 
 transaction's currently running command started to run.

To do that requires not just that you have access to a backend's oldest
snapshot, but that you have access to *all* its active snapshots;
because such a transient tuple might be visible in some newer snap even
though it's too new for the oldest snap.  Doing that will create very
significant problems of shared memory management, as well as performance
and locking issues.

There's been some talk of distinguishing global and within database
xmin values, so that a long-lived transaction wouldn't interfere with
vacuuming tables in other databases that that xact couldn't possibly
access.  That seems doable to me, but I think any finer-grained analysis
is probably going to be a net loss because of the distributed overhead
it'd impose on every transaction.

regards, tom lane
