Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-27 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Uh, no, that's not very clear.  A long-running transaction will be a
  VACUUM bottleneck because of its own XID, never mind its xmin.
 
  Well, my secondary addition was to start MyProc-xmin with the current
  xid counter, rather than your own xid.
 
 Don't tell me you seriously believe that would work.

I do.  Please tell me why it will not.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-27 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 Well, my secondary addition was to start MyProc-xmin with the current
 xid counter, rather than your own xid.
 
 Don't tell me you seriously believe that would work.

 I do.  Please tell me why it will not.

You can't have GlobalXmin greater than your own xid, else log space
(particularly pg_subtrans) may be recycled too soon.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-27 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
  Well, my secondary addition was to start MyProc-xmin with the current
  xid counter, rather than your own xid.
  
  Don't tell me you seriously believe that would work.
 
  I do.  Please tell me why it will not.
 
 You can't have GlobalXmin greater than your own xid, else log space
 (particularly pg_subtrans) may be recycled too soon.

OK, so maybe GlobalXmin would have to be split into two new variables
that control log space and dead-row detection separately.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Simon Riggs
On Fri, 2007-03-23 at 17:35 -0400, Tom Lane wrote:
 To make intra-transaction advancing of xmin possible, we'd need to
 explicitly track all of the backend's live snapshots, probably by
 introducing a snapshot cache manager that gives out tracked
 refcounts
 as we do for some other structures like catcache entries.  This might
 have some other advantages (I think most of the current CopySnapshot
 operations could be replaced by refcount increments) 

Seems like we should do this for many reasons, whether or not this is
one of them. I seem to have butted heads and lost more than once with
not being able to tell which Snapshots exist.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Simon Riggs
On Sat, 2007-03-24 at 11:48 -0400, Tom Lane wrote:

 Also, at some point a long-running transaction becomes a bottleneck
 simply because its XID is itself the oldest thing visible in the
 ProcArray and is determining everyone's xmin.  How much daylight is
 there really between your xmin is old and your xid is old?

Hmm, yes. How often do we have an LRT that consists of multiple
statements of significant duration? Not often, I'd wager. 

How much does it cost to optimise for this case?

Did Heikki's patch to move the xmin forward during VACUUM get rejected?
That seems like it has a much wider use case.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Did Heikki's patch to move the xmin forward during VACUUM get rejected?
 That seems like it has a much wider use case.

It's still in the queue (which I have finally started to review in
earnest).

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Bruce Momjian
Simon Riggs wrote:
 On Sat, 2007-03-24 at 11:48 -0400, Tom Lane wrote:
 
  Also, at some point a long-running transaction becomes a bottleneck
  simply because its XID is itself the oldest thing visible in the
  ProcArray and is determining everyone's xmin.  How much daylight is
  there really between your xmin is old and your xid is old?
 
 Hmm, yes. How often do we have an LRT that consists of multiple
 statements of significant duration? Not often, I'd wager. 
 
 How much does it cost to optimise for this case?

Zero cost.  It involves just tracking if there are any old snapshots,
and if not, update the proc xmin rather than skipping the asignment,
like we do now.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Simon Riggs wrote:
 How much does it cost to optimise for this case?

 Zero cost.  It involves just tracking if there are any old snapshots,

I will be fascinated to hear how you define that as zero cost.  It might
be relatively low, but it's not zero, and for many simple cases (eg,
all single-statement transactions) the benefit will indeed be zero.
We need some serious consideration of the costs and benefits, not airy
dismissals.

I had originally thought that we could avoid CopySnapshot copying,
which might buy back more than enough to cover the cost of tracking
reference counts ... but in a quick look yesterday it seemed that the
high-use code paths would still need a copy, because they are always
copying off the static variables filled by GetTransactionSnapshot.
Changing that would come with a tremendous memory penalty, or else
a time penalty to scan the ProcArray twice to see how big the arrays
need to be.

[ thinks a bit... ]  The other way we might possibly tackle that is
to avoid copying by the expedient of just using those static snapshot
variables in-place.  SerializableSnapshot surely need never be copied
since it remains unchanged till end of xact, and no use of the snap
will survive longer than that.  In simple cases LatestSnapshot is not
overwritten until the prior value is no longer needed, either.  If
we could arrange to copy it only when setting up a cursor, or other
long-lived usage, we'd be ahead of the game.  But I'd certainly want
a management layer in there to ensure that we don't overwrite a value
that *is* still in use, and offhand I'm not sure what that layer would
need to look like.  Possibly some sort of double indirection so that
callers would have a Snapshot pointer that remained valid after we'd
copied off arrays that need to be updated?  We already do have double
indirection in the form of the Snapshot and xip[] pointers ... maybe
attach refcounts to the xip arrays not the Snapshots per se?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Bruce Momjian

This is an optimization so I was thinking of something simpler, like 
incrementing/decrementing a counter every time we allocate/free a
snapshot (like in the patch), and updating MyProc-xmin only if there
are no open snapshots.  I don't think it is worth tracking anything more
complicated than that.  Is that now possible to do cleanly?  I am having
trouble getting this to work in the attached patch.

---

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Simon Riggs wrote:
  How much does it cost to optimise for this case?
 
  Zero cost.  It involves just tracking if there are any old snapshots,
 
 I will be fascinated to hear how you define that as zero cost.  It might
 be relatively low, but it's not zero, and for many simple cases (eg,
 all single-statement transactions) the benefit will indeed be zero.
 We need some serious consideration of the costs and benefits, not airy
 dismissals.
 
 I had originally thought that we could avoid CopySnapshot copying,
 which might buy back more than enough to cover the cost of tracking
 reference counts ... but in a quick look yesterday it seemed that the
 high-use code paths would still need a copy, because they are always
 copying off the static variables filled by GetTransactionSnapshot.
 Changing that would come with a tremendous memory penalty, or else
 a time penalty to scan the ProcArray twice to see how big the arrays
 need to be.
 
 [ thinks a bit... ]  The other way we might possibly tackle that is
 to avoid copying by the expedient of just using those static snapshot
 variables in-place.  SerializableSnapshot surely need never be copied
 since it remains unchanged till end of xact, and no use of the snap
 will survive longer than that.  In simple cases LatestSnapshot is not
 overwritten until the prior value is no longer needed, either.  If
 we could arrange to copy it only when setting up a cursor, or other
 long-lived usage, we'd be ahead of the game.  But I'd certainly want
 a management layer in there to ensure that we don't overwrite a value
 that *is* still in use, and offhand I'm not sure what that layer would
 need to look like.  Possibly some sort of double indirection so that
 callers would have a Snapshot pointer that remained valid after we'd
 copied off arrays that need to be updated?  We already do have double
 indirection in the form of the Snapshot and xip[] pointers ... maybe
 attach refcounts to the xip arrays not the Snapshots per se?
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/catalog/index.c
===
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.281
diff -c -c -r1.281 index.c
*** src/backend/catalog/index.c	25 Mar 2007 19:45:14 -	1.281
--- src/backend/catalog/index.c	26 Mar 2007 21:06:30 -
***
*** 1358,1364 
  	ExprContext *econtext;
  	Snapshot	snapshot;
  	TransactionId OldestXmin;
! 
  	/*
  	 * sanity checks
  	 */
--- 1358,1364 
  	ExprContext *econtext;
  	Snapshot	snapshot;
  	TransactionId OldestXmin;
! 	
  	/*
  	 * sanity checks
  	 */
***
*** 1555,1561 
  	ExecDropSingleTupleTableSlot(slot);
  
  	FreeExecutorState(estate);
! 
  	/* These may have been pointing to the now-gone estate */
  	indexInfo-ii_ExpressionsState = NIL;
  	indexInfo-ii_PredicateState = NIL;
--- 1555,1563 
  	ExecDropSingleTupleTableSlot(slot);
  
  	FreeExecutorState(estate);
! 	if (snapshot != SnapshotNow)
! 		FreeSnapshot(snapshot);
! 		
  	/* These may have been pointing to the now-gone estate */
  	indexInfo-ii_ExpressionsState = NIL;
  	indexInfo-ii_PredicateState = NIL;
Index: src/backend/commands/cluster.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.157
diff -c -c -r1.157 cluster.c
*** src/backend/commands/cluster.c	13 Mar 2007 00:33:39 -	1.157
--- src/backend/commands/cluster.c	26 Mar 2007 21:06:30 -
***
*** 204,209 
--- 204,210 
  			/* Start a new transaction for each relation. */
  			StartTransactionCommand();
  			/* functions in indexes may want a snapshot set */
+ 			FreeSnapshot(ActiveSnapshot);
  			ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
  			cluster_rel(rvtc, true);
  			CommitTransactionCommand();
Index: src/backend/commands/indexcmds.c

Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 This is an optimization so I was thinking of something simpler, like 
 incrementing/decrementing a counter every time we allocate/free a
 snapshot (like in the patch), and updating MyProc-xmin only if there
 are no open snapshots.  I don't think it is worth tracking anything more
 complicated than that.  Is that now possible to do cleanly?  I am having
 trouble getting this to work in the attached patch.

No kidding.

Even if you get it to pass the regression tests, it'll be a constant
source of bugs.  IMHO the *only* way to do this reliably is to have
a centralized reference-count management module.

I've been thinking more about the avoid-copy idea and I think it will
indeed work, and be simpler than what we have now, to boot.  Ideas:

* There is only one (pair of) statically allocated, max-size XID arrays
for GetSnapshotData to read into.  When we want to acquire a new snap
from GetSnapshotData, we palloc a new SnapshotData struct (and no more),
make it point at the statically allocated arrays, fill and return it.

* There can be at most one SnapshotData struct using the statically
allocated arrays.  We remember its address in a static variable (which
goes NULL when there is no such struct).  If we have to acquire a new
snapshot before the old one's refcount has gone to zero, then at that
time we palloc arrays just big enough for the old one, copy the static
data to there, insert the palloc'd array pointers into the old one,
and clear the static variable (or more likely, immediately set it to
point to the new one).

* All the current CopySnapshot operations are replaced by refcount
increments.  We need explicit refcount-decrement calls added, and
ResourceOwner support to make sure they happen.  (I'm imagining that
snapshots will now live in a dedicated context, so they don't go away
implicitly but need explicit release operations.  The ResourceOwner
infrastructure will help us find any missed releases.)

* Decrementing the refcount on a SnapshotData pfree's it if the refcount
has gone to zero, and also clears the aforementioned static pointer
variable if it happens to point at that selfsame struct.  Note that for
a serializable transaction, or a simple read-committed transaction
that doesn't use more than one snap at a time, this will always happen
and so there will be zero copying cost.

* In this scheme the distinction between SerializableSnapshot and
LatestSnapshot vanishes, at least at the level of memory management.
Probably SerializableSnapshot would translate into a refcount held by
xact.c or some other upper-level module.

* We keep all the live MVCC snapshots in a linked list managed by
a snapshot cache module.  We can traverse this list to determine
the minimum xmin at any instant.  Freeing a snap whose xmin equals
the current MyProc-xmin causes us to traverse the list, determine
the new backend-wide xmin, and update MyProc-xmin.  Freeing the
last snap causes us to reset MyProc-xmin to 0.  (I'm not completely
sure, but I think that we can just set MyProc-xmin in these cases,
without bothering to lock the ProcArray --- any comments on that?)

* GetSnapshotData no longer cares about serializable vs nonserializable
snaps, either.  Its rule is just if MyProc-xmin is 0, then set it.
This can happen only when creating the first snap in a transaction that
currently has no live snaps.

* The above two rules clearly suffice to maintain the invariant
MyProc-xmin is the minimum of the xmins of the live snapshots in the
transaction, or zero when there are none.  Which is what we want.


On the whole though I think we should let this idea go till 8.4; we have
a lot to deal with for 8.3 and a definite shortage of evidence that
advancing xmin will buy much.  Mu gut feeling is that the above design
would save about enough in snapshot-copying costs to pay for its extra
management logic, but we won't come out ahead unless advancing xmin
intra-transaction really helps, and I'm not sure I believe that (except
in the special case of VACUUM, and we already have a solution for that,
which would be independent of this).

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Bruce Momjian
Tom Lane wrote:
 On the whole though I think we should let this idea go till 8.4; we have
 a lot to deal with for 8.3 and a definite shortage of evidence that
 advancing xmin will buy much.  Mu gut feeling is that the above design
 would save about enough in snapshot-copying costs to pay for its extra
 management logic, but we won't come out ahead unless advancing xmin
 intra-transaction really helps, and I'm not sure I believe that (except
 in the special case of VACUUM, and we already have a solution for that,
 which would be independent of this).

The improvement is going to be a win for multi-statement transactions
--- the only question is how often are they long-running.

It does seem best to put this on the TODO for 8.4, and I will do that
now.  The only thing that makes it tempting to get into 8.3 is that we
could advertise this release as a major space reuse release because of
HOT, autovacuum on by default, multiple autovacuum processes, and, if we
added it, improved VACUUM for multi-statement transactions.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Gregory Stark

I have a question about what would happen for a transaction running a command
like COPY FROM. Is it possible it would manage to arrange to have no live
snapshots at all? So it would have no impact on concurrent VACUUMs? What about
something running a large pg_restore?

Tom Lane [EMAIL PROTECTED] writes:

 On the whole though I think we should let this idea go till 8.4; 

I tend to agree but for a different reason. I think it's something that will
open the doors for a lot of new ideas. If we put it in CVS HEAD early in 8.4 I
think (or hope at any rate) we'll think of at least a few new things we can do
with the new more precise information it exposes.

Just as an example, if you find you have no live snapshots can you throw out
the combocid hash? Any tuple you find with a combocid that's been discarded
that way must predate your current scan and therefore is deleted for you.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Bruce Momjian
Gregory Stark wrote:
 
 I have a question about what would happen for a transaction running a command
 like COPY FROM. Is it possible it would manage to arrange to have no live
 snapshots at all? So it would have no impact on concurrent VACUUMs? What about
 something running a large pg_restore?

Interesting idea.  If the table had triggers, it would need a snapshot,
but if not, yea, that is certainly possible.

---


 
 Tom Lane [EMAIL PROTECTED] writes:
 
  On the whole though I think we should let this idea go till 8.4; 
 
 I tend to agree but for a different reason. I think it's something that will
 open the doors for a lot of new ideas. If we put it in CVS HEAD early in 8.4 I
 think (or hope at any rate) we'll think of at least a few new things we can do
 with the new more precise information it exposes.
 
 Just as an example, if you find you have no live snapshots can you throw out
 the combocid hash? Any tuple you find with a combocid that's been discarded
 that way must predate your current scan and therefore is deleted for you.
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
 
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 ... but we won't come out ahead unless advancing xmin
 intra-transaction really helps, and I'm not sure I believe that (except
 in the special case of VACUUM, and we already have a solution for that,
 which would be independent of this).

 The improvement is going to be a win for multi-statement transactions
 --- the only question is how often are they long-running.

Uh, no, that's not very clear.  A long-running transaction will be a
VACUUM bottleneck because of its own XID, never mind its xmin.  To make
this helpful, you have to posit a lot of overlapping long-running
transactions (such that the distance back to GlobalXmin might average
about twice the distance back to the oldest live XID).  That's not
impossible but I wonder whether it's not mostly a token of bad
application design.

 It does seem best to put this on the TODO for 8.4, and I will do that
 now.

Agreed.  Quite aside from the time needed for a reasonable
implementation, we'd really need to do more performance-testing than we
have time for now.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Gregory Stark wrote:
 I have a question about what would happen for a transaction running a command
 like COPY FROM. Is it possible it would manage to arrange to have no live
 snapshots at all? So it would have no impact on concurrent VACUUMs? What 
 about
 something running a large pg_restore?

 Interesting idea.

Indeed.  Currently, COPY forcibly sets a snapshot on the off chance
something will use it, but I could certainly see making that happen
lazily, ie not at all in the simple case.

pg_restore is probably a lost cause, at least if you are running it
in single-transaction mode.  I guess there'd be tradeoffs as to whether
to do that or not ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  ... but we won't come out ahead unless advancing xmin
  intra-transaction really helps, and I'm not sure I believe that (except
  in the special case of VACUUM, and we already have a solution for that,
  which would be independent of this).
 
  The improvement is going to be a win for multi-statement transactions
  --- the only question is how often are they long-running.
 
 Uh, no, that's not very clear.  A long-running transaction will be a
 VACUUM bottleneck because of its own XID, never mind its xmin.  To make
 this helpful, you have to posit a lot of overlapping long-running
 transactions (such that the distance back to GlobalXmin might average
 about twice the distance back to the oldest live XID).  That's not
 impossible but I wonder whether it's not mostly a token of bad
 application design.

Well, my secondary addition was to start MyProc-xmin with the current
xid counter, rather than your own xid.  This is because your xid is
already in progress and so will not be touched, so why not start with
the current xid counter.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Uh, no, that's not very clear.  A long-running transaction will be a
 VACUUM bottleneck because of its own XID, never mind its xmin.

 Well, my secondary addition was to start MyProc-xmin with the current
 xid counter, rather than your own xid.

Don't tell me you seriously believe that would work.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-25 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Well I think this would be the same infrastructure we would need to do
 the other discussed improvement to address pg_dump's impact. That
 would require us to publish the youngest xmax of the live
 snapshots. Vacuum could deduce that that xid cannot possibly see any
 transactions between the youngest extant xmax and the oldest
 in-progress transaction.

... and do what with the knowledge?  Not remove tuples, because any such
tuples would be part of RECENTLY_DEAD update chains that that xact might
be following now or in the future.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-25 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Well I think this would be the same infrastructure we would need to do
 the other discussed improvement to address pg_dump's impact. That
 would require us to publish the youngest xmax of the live
 snapshots. Vacuum could deduce that that xid cannot possibly see any
 transactions between the youngest extant xmax and the oldest
 in-progress transaction.

 ... and do what with the knowledge?  Not remove tuples, because any such
 tuples would be part of RECENTLY_DEAD update chains that that xact might
 be following now or in the future.

Well that just means it might require extra work, not that it would be
impossible. 

Firstly, some tuples would not be part of a chain and could be cleaned up
anyways. Others would be part of a HOT chain which might make it easier to
clean them up.

But even for tuples that are part of a chain there may be solutions. We could
truncate the tuple to just the MVCC information so subsequent transactions can
find the head. Or we might be able to go back and edit the forward link to
skip the dead intermediate tuples (and somehow deal with the race
conditions...)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-24 Thread Heikki Linnakangas

Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian [EMAIL PROTECTED] writes:

I have been thinking we could improve how quickly VACUUM can expire rows
if we update procArray.xmin more frequently for non-SERIALIZABLE
transactions.
The attached patch updates procArray.xmin in this manner.

This patch is incredibly broken.  Didn't you understand what I said
about how we don't track which snapshots are still alive?  You can't
advance procArray.xmin past the xmin of the oldest live snapshot in the
backend, and you can't assume that there are no live snapshots at the
places where this patch assumes that.  (Open cursors are one obvious
counterexample, but I think there are more.)

To make intra-transaction advancing of xmin possible, we'd need to
explicitly track all of the backend's live snapshots, probably by
introducing a snapshot cache manager that gives out tracked refcounts
as we do for some other structures like catcache entries.  This might
have some other advantages (I think most of the current CopySnapshot
operations could be replaced by refcount increments) but it's a whole
lot more complicated and invasive than what you've got here.


I updated the patch to save the MyProc-xid at the time the first cursor
is created, and not allow the MyProc-xid to be set lower than that
saved value in the current transaction.  It added only a few more lines
to the patch.


It seems to me a lot cleaner to do the reference counting like Tom 
suggested. Increase the refcount on CopySnapshot, and decrease it on 
FreeSnapshot. Assuming that all callers of CopySnapshot free the 
snapshot with FreeSnapshot when they're done with it.


BTW: I really like the idea of doing this. It's a relatively simple 
thing to do and gives some immediate benefit. And it opens the door for 
more tricks to vacuum more aggressively in the presence of long-running 
transactions. And it allows us to vacuum tuples that were inserted and 
deleted in the same transactions even while the transaction is still 
running, which helps some pathological cases where a transaction updates 
a counter column many times within a transaction. We could also use it 
to free entries in the new combocids hash table earlier (not that it's a 
problem as it is, though).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-24 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 It seems to me a lot cleaner to do the reference counting like Tom 
 suggested. Increase the refcount on CopySnapshot, and decrease it on 
 FreeSnapshot. Assuming that all callers of CopySnapshot free the 
 snapshot with FreeSnapshot when they're done with it.

I don't believe we bother at the moment; which is one of the reasons
it'd be a nontrivial patch.  I do think it might be worth doing though.
In the simple case where you're just issuing successive non-cursor
commands within a READ COMMITTED transaction, a refcounted
implementation would be able to recognize that there are *no* live
snapshots between commands and therefore reset MyProc-xmin to 0
whenever the backend is idle.

OTOH, do we have any evidence that this is worth bothering with at all?
I fear that the cases of long-running transactions that are problems
in the real world wouldn't be helped much --- for instance, pg_dump
wouldn't change behavior because it uses a serializable transaction.
Also, at some point a long-running transaction becomes a bottleneck
simply because its XID is itself the oldest thing visible in the
ProcArray and is determining everyone's xmin.  How much daylight is
there really between your xmin is old and your xid is old?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-24 Thread Bruce Momjian
Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  It seems to me a lot cleaner to do the reference counting like Tom 
  suggested. Increase the refcount on CopySnapshot, and decrease it on 
  FreeSnapshot. Assuming that all callers of CopySnapshot free the 
  snapshot with FreeSnapshot when they're done with it.
 
 I don't believe we bother at the moment; which is one of the reasons
 it'd be a nontrivial patch.  I do think it might be worth doing though.
 In the simple case where you're just issuing successive non-cursor
 commands within a READ COMMITTED transaction, a refcounted
 implementation would be able to recognize that there are *no* live
 snapshots between commands and therefore reset MyProc-xmin to 0
 whenever the backend is idle.

Attached is my current version of the patch.  It doesn't work now that I
tried to do reference count for Snapshots, but will stop now that Tom is
considering redesigning the snapshot mechanism.

 OTOH, do we have any evidence that this is worth bothering with at all?
 I fear that the cases of long-running transactions that are problems
 in the real world wouldn't be helped much --- for instance, pg_dump
 wouldn't change behavior because it uses a serializable transaction.
 Also, at some point a long-running transaction becomes a bottleneck
 simply because its XID is itself the oldest thing visible in the
 ProcArray and is determining everyone's xmin.  How much daylight is
 there really between your xmin is old and your xid is old?

Well, interesting you mention that, because I have a second idea on how
to improve things.  We start with MyProc-xmin equal to our own xid, and
then look for earlier transactions.  It should be possible to skip
considering our own xid for MyProc-xmin.  This would obviously help
VACUUM during long-running transactions.  While our transaction is
running, our xid isn't committed, so VACUUM isn't going to touch any of
our rows, and if other transactions complete before our
multi-transaction _statement_ starts, we can't see deleted rows from
them transaction, so why keep the deleted rows around?  Consider this
case:

Session #:
1   2   3
BEGIN;
SELECT 1;
CREATE TABLE test(x int);
INSERT INTO test VALUES (1);
DELETE FROM test;
SELECT 1;
VACUUM VERBOSE test;
(row can be reused)
COMMIT;
VACUUM VERBOSE test;
(normal row reuse)

As I understand it, in READ COMMITTED mode, we have to skip
transactions in progress when our _statement_ starts, but anything
committed before that we see and we don't see dead rows created by them.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/catalog/index.c
===
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.280
diff -c -c -r1.280 index.c
*** src/backend/catalog/index.c	3 Mar 2007 20:08:41 -	1.280
--- src/backend/catalog/index.c	24 Mar 2007 19:17:56 -
***
*** 1358,1364 
  	ExprContext *econtext;
  	Snapshot	snapshot;
  	TransactionId OldestXmin;
! 
  	/*
  	 * sanity checks
  	 */
--- 1358,1364 
  	ExprContext *econtext;
  	Snapshot	snapshot;
  	TransactionId OldestXmin;
! 	
  	/*
  	 * sanity checks
  	 */
***
*** 1555,1561 
  	ExecDropSingleTupleTableSlot(slot);
  
  	FreeExecutorState(estate);
! 
  	/* These may have been pointing to the now-gone estate */
  	indexInfo-ii_ExpressionsState = NIL;
  	indexInfo-ii_PredicateState = NIL;
--- 1555,1563 
  	ExecDropSingleTupleTableSlot(slot);
  
  	FreeExecutorState(estate);
! 	if (snapshot != SnapshotNow)
! 		FreeSnapshot(snapshot);
! 		
  	/* These may have been pointing to the now-gone estate */
  	indexInfo-ii_ExpressionsState = NIL;
  	indexInfo-ii_PredicateState = NIL;
Index: src/backend/commands/cluster.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.157
diff -c -c -r1.157 cluster.c
*** src/backend/commands/cluster.c	13 Mar 2007 00:33:39 -	1.157
--- src/backend/commands/cluster.c	24 Mar 2007 19:17:56 -
***
*** 204,209 
--- 204,210 
  			/* Start a new transaction for each relation. */
  			StartTransactionCommand();
  			/* functions in indexes may want a snapshot set */
+ 			FreeSnapshot(ActiveSnapshot);
  			ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
  	

Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-24 Thread Gregory Stark
 OTOH, do we have any evidence that this is worth bothering with at all?
 I fear that the cases of long-running transactions that are problems
 in the real world wouldn't be helped much --- for instance, pg_dump
 wouldn't change behavior because it uses a serializable transaction.
Well I think this would be the same infrastructure we would need to do the
other discussed improvement to address pg_dump's impact. That would require us
to publish the youngest xmax of the live snapshots. Vacuum could deduce that
that xid cannot possibly see any transactions between the youngest extant xmax
and the oldest in-progress transaction.
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-23 Thread Alvaro Herrera
Bruce Momjian wrote:

 The attached patch updates procArray.xmin in this manner.  Here is an
 example of how the patch improves dead row reuse:

I don't think this really works.  Consider what happens if I change
session 2 this way:

 Session #:
 1   2   3
 
 CREATE TABLE test(x int);
 INSERT INTO test VALUES (1);
 BEGIN;
 DELETE FROM test;
 BEGIN;
  DECLARE foo CURSOR FOR
  SELECT * FROM test;
 SELECT 1;
 COMMIT;
 VACUUM VERBOSE test;
 (row not reused)
 SELECT 1;
  FETCH * FROM foo;
 
 (At this point #2 doesn't see
  the test row anymore.  Patch 
  updates procArray.xmin.)
 
 VACUUM VERBOSE test;
 (row reused with 
 patch)
 COMMIT;
 VACUUM VERBOSE test;
 (normal row reuse)


-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-23 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  I have been thinking we could improve how quickly VACUUM can expire rows
  if we update procArray.xmin more frequently for non-SERIALIZABLE
  transactions.
  The attached patch updates procArray.xmin in this manner.
 
 This patch is incredibly broken.  Didn't you understand what I said
 about how we don't track which snapshots are still alive?  You can't
 advance procArray.xmin past the xmin of the oldest live snapshot in the
 backend, and you can't assume that there are no live snapshots at the
 places where this patch assumes that.  (Open cursors are one obvious
 counterexample, but I think there are more.)
 
 To make intra-transaction advancing of xmin possible, we'd need to
 explicitly track all of the backend's live snapshots, probably by
 introducing a snapshot cache manager that gives out tracked refcounts
 as we do for some other structures like catcache entries.  This might
 have some other advantages (I think most of the current CopySnapshot
 operations could be replaced by refcount increments) but it's a whole
 lot more complicated and invasive than what you've got here.

I updated the patch to save the MyProc-xid at the time the first cursor
is created, and not allow the MyProc-xid to be set lower than that
saved value in the current transaction.  It added only a few more lines
to the patch.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/access/transam/xact.c
===
RCS file: /cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.238
diff -c -c -r1.238 xact.c
*** src/backend/access/transam/xact.c	22 Mar 2007 19:55:04 -	1.238
--- src/backend/access/transam/xact.c	24 Mar 2007 02:18:41 -
***
*** 1563,1569 
  
  		LWLockRelease(ProcArrayLock);
  	}
! 
  	PG_TRACE1(transaction__commit, s-transactionId);
  
  	/*
--- 1563,1570 
  
  		LWLockRelease(ProcArrayLock);
  	}
! 	set_portal_min_xid(InvalidTransactionId);
! 	
  	PG_TRACE1(transaction__commit, s-transactionId);
  
  	/*
***
*** 1802,1807 
--- 1803,1810 
  
  	LWLockRelease(ProcArrayLock);
  
+ 	set_portal_min_xid(InvalidTransactionId);
+ 
  	/*
  	 * This is all post-transaction cleanup.  Note that if an error is raised
  	 * here, it's too late to abort the transaction.  This should be just
***
*** 1969,1974 
--- 1972,1979 
  		LWLockRelease(ProcArrayLock);
  	}
  
+ 	set_portal_min_xid(InvalidTransactionId);
+ 
  	PG_TRACE1(transaction__abort, s-transactionId);
  
  	/*
Index: src/backend/catalog/index.c
===
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.280
diff -c -c -r1.280 index.c
*** src/backend/catalog/index.c	3 Mar 2007 20:08:41 -	1.280
--- src/backend/catalog/index.c	24 Mar 2007 02:18:42 -
***
*** 1394,1400 
  	}
  	else if (indexInfo-ii_Concurrent)
  	{
! 		snapshot = CopySnapshot(GetTransactionSnapshot());
  		OldestXmin = InvalidTransactionId;		/* not used */
  	}
  	else
--- 1394,1400 
  	}
  	else if (indexInfo-ii_Concurrent)
  	{
! 		snapshot = CopySnapshot(GetTransactionSnapshot(false));
  		OldestXmin = InvalidTransactionId;		/* not used */
  	}
  	else
Index: src/backend/commands/cluster.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.157
diff -c -c -r1.157 cluster.c
*** src/backend/commands/cluster.c	13 Mar 2007 00:33:39 -	1.157
--- src/backend/commands/cluster.c	24 Mar 2007 02:18:42 -
***
*** 204,210 
  			/* Start a new transaction for each relation. */
  			StartTransactionCommand();
  			/* functions in indexes may want a snapshot set */
! 			ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
  			cluster_rel(rvtc, true);
  			CommitTransactionCommand();
  		}
--- 204,210 
  			/* Start a new transaction for each relation. */
  			StartTransactionCommand();
  			/* functions in indexes may want a snapshot set */
! 			ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true));
  			cluster_rel(rvtc, true);
  			CommitTransactionCommand();
  		}
Index: src/backend/commands/indexcmds.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v
retrieving revision 1.157
diff -c -c -r1.157 indexcmds.c
*** src/backend/commands/indexcmds.c	13 Mar 2007 00:33:39 -	1.157
--- src/backend/commands/indexcmds.c	24 Mar 2007 02:18:42 -
***
*** 493,499 
  	 * We also set ActiveSnapshot to this snap, since functions in indexes may
  	 * need a snapshot.
  	 */
! 	snapshot = CopySnapshot(GetTransactionSnapshot());