Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Uh, no, that's not very clear. A long-running transaction will be a VACUUM bottleneck because of its own XID, never mind its xmin. Well, my secondary addition was to start MyProc-xmin with the current xid counter, rather than your own xid. Don't tell me you seriously believe that would work. I do. Please tell me why it will not. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Well, my secondary addition was to start MyProc-xmin with the current xid counter, rather than your own xid. Don't tell me you seriously believe that would work. I do. Please tell me why it will not. You can't have GlobalXmin greater than your own xid, else log space (particularly pg_subtrans) may be recycled too soon. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Well, my secondary addition was to start MyProc-xmin with the current xid counter, rather than your own xid. Don't tell me you seriously believe that would work. I do. Please tell me why it will not. You can't have GlobalXmin greater than your own xid, else log space (particularly pg_subtrans) may be recycled too soon. OK, so maybe GlobalXmin would have to be split into two new variables that control log space and dead-row detection separately. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
On Fri, 2007-03-23 at 17:35 -0400, Tom Lane wrote: To make intra-transaction advancing of xmin possible, we'd need to explicitly track all of the backend's live snapshots, probably by introducing a snapshot cache manager that gives out tracked refcounts as we do for some other structures like catcache entries. This might have some other advantages (I think most of the current CopySnapshot operations could be replaced by refcount increments) Seems like we should do this for many reasons, whether or not this is one of them. I seem to have butted heads and lost more than once with not being able to tell which Snapshots exist. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
On Sat, 2007-03-24 at 11:48 -0400, Tom Lane wrote: Also, at some point a long-running transaction becomes a bottleneck simply because its XID is itself the oldest thing visible in the ProcArray and is determining everyone's xmin. How much daylight is there really between your xmin is old and your xid is old? Hmm, yes. How often do we have an LRT that consists of multiple statements of significant duration? Not often, I'd wager. How much does it cost to optimise for this case? Did Heikki's patch to move the xmin forward during VACUUM get rejected? That seems like it has a much wider use case. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Simon Riggs [EMAIL PROTECTED] writes: Did Heikki's patch to move the xmin forward during VACUUM get rejected? That seems like it has a much wider use case. It's still in the queue (which I have finally started to review in earnest). regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Simon Riggs wrote: On Sat, 2007-03-24 at 11:48 -0400, Tom Lane wrote: Also, at some point a long-running transaction becomes a bottleneck simply because its XID is itself the oldest thing visible in the ProcArray and is determining everyone's xmin. How much daylight is there really between your xmin is old and your xid is old? Hmm, yes. How often do we have an LRT that consists of multiple statements of significant duration? Not often, I'd wager. How much does it cost to optimise for this case? Zero cost. It involves just tracking if there are any old snapshots, and if not, update the proc xmin rather than skipping the asignment, like we do now. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Bruce Momjian [EMAIL PROTECTED] writes: Simon Riggs wrote: How much does it cost to optimise for this case? Zero cost. It involves just tracking if there are any old snapshots, I will be fascinated to hear how you define that as zero cost. It might be relatively low, but it's not zero, and for many simple cases (eg, all single-statement transactions) the benefit will indeed be zero. We need some serious consideration of the costs and benefits, not airy dismissals. I had originally thought that we could avoid CopySnapshot copying, which might buy back more than enough to cover the cost of tracking reference counts ... but in a quick look yesterday it seemed that the high-use code paths would still need a copy, because they are always copying off the static variables filled by GetTransactionSnapshot. Changing that would come with a tremendous memory penalty, or else a time penalty to scan the ProcArray twice to see how big the arrays need to be. [ thinks a bit... ] The other way we might possibly tackle that is to avoid copying by the expedient of just using those static snapshot variables in-place. SerializableSnapshot surely need never be copied since it remains unchanged till end of xact, and no use of the snap will survive longer than that. In simple cases LatestSnapshot is not overwritten until the prior value is no longer needed, either. If we could arrange to copy it only when setting up a cursor, or other long-lived usage, we'd be ahead of the game. But I'd certainly want a management layer in there to ensure that we don't overwrite a value that *is* still in use, and offhand I'm not sure what that layer would need to look like. Possibly some sort of double indirection so that callers would have a Snapshot pointer that remained valid after we'd copied off arrays that need to be updated? We already do have double indirection in the form of the Snapshot and xip[] pointers ... maybe attach refcounts to the xip arrays not the Snapshots per se? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
This is an optimization so I was thinking of something simpler, like incrementing/decrementing a counter every time we allocate/free a snapshot (like in the patch), and updating MyProc-xmin only if there are no open snapshots. I don't think it is worth tracking anything more complicated than that. Is that now possible to do cleanly? I am having trouble getting this to work in the attached patch. --- Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Simon Riggs wrote: How much does it cost to optimise for this case? Zero cost. It involves just tracking if there are any old snapshots, I will be fascinated to hear how you define that as zero cost. It might be relatively low, but it's not zero, and for many simple cases (eg, all single-statement transactions) the benefit will indeed be zero. We need some serious consideration of the costs and benefits, not airy dismissals. I had originally thought that we could avoid CopySnapshot copying, which might buy back more than enough to cover the cost of tracking reference counts ... but in a quick look yesterday it seemed that the high-use code paths would still need a copy, because they are always copying off the static variables filled by GetTransactionSnapshot. Changing that would come with a tremendous memory penalty, or else a time penalty to scan the ProcArray twice to see how big the arrays need to be. [ thinks a bit... ] The other way we might possibly tackle that is to avoid copying by the expedient of just using those static snapshot variables in-place. SerializableSnapshot surely need never be copied since it remains unchanged till end of xact, and no use of the snap will survive longer than that. In simple cases LatestSnapshot is not overwritten until the prior value is no longer needed, either. If we could arrange to copy it only when setting up a cursor, or other long-lived usage, we'd be ahead of the game. But I'd certainly want a management layer in there to ensure that we don't overwrite a value that *is* still in use, and offhand I'm not sure what that layer would need to look like. Possibly some sort of double indirection so that callers would have a Snapshot pointer that remained valid after we'd copied off arrays that need to be updated? We already do have double indirection in the form of the Snapshot and xip[] pointers ... maybe attach refcounts to the xip arrays not the Snapshots per se? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: src/backend/catalog/index.c === RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v retrieving revision 1.281 diff -c -c -r1.281 index.c *** src/backend/catalog/index.c 25 Mar 2007 19:45:14 - 1.281 --- src/backend/catalog/index.c 26 Mar 2007 21:06:30 - *** *** 1358,1364 ExprContext *econtext; Snapshot snapshot; TransactionId OldestXmin; ! /* * sanity checks */ --- 1358,1364 ExprContext *econtext; Snapshot snapshot; TransactionId OldestXmin; ! /* * sanity checks */ *** *** 1555,1561 ExecDropSingleTupleTableSlot(slot); FreeExecutorState(estate); ! /* These may have been pointing to the now-gone estate */ indexInfo-ii_ExpressionsState = NIL; indexInfo-ii_PredicateState = NIL; --- 1555,1563 ExecDropSingleTupleTableSlot(slot); FreeExecutorState(estate); ! if (snapshot != SnapshotNow) ! FreeSnapshot(snapshot); ! /* These may have been pointing to the now-gone estate */ indexInfo-ii_ExpressionsState = NIL; indexInfo-ii_PredicateState = NIL; Index: src/backend/commands/cluster.c === RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v retrieving revision 1.157 diff -c -c -r1.157 cluster.c *** src/backend/commands/cluster.c 13 Mar 2007 00:33:39 - 1.157 --- src/backend/commands/cluster.c 26 Mar 2007 21:06:30 - *** *** 204,209 --- 204,210 /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); cluster_rel(rvtc, true); CommitTransactionCommand(); Index: src/backend/commands/indexcmds.c
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Bruce Momjian [EMAIL PROTECTED] writes: This is an optimization so I was thinking of something simpler, like incrementing/decrementing a counter every time we allocate/free a snapshot (like in the patch), and updating MyProc-xmin only if there are no open snapshots. I don't think it is worth tracking anything more complicated than that. Is that now possible to do cleanly? I am having trouble getting this to work in the attached patch. No kidding. Even if you get it to pass the regression tests, it'll be a constant source of bugs. IMHO the *only* way to do this reliably is to have a centralized reference-count management module. I've been thinking more about the avoid-copy idea and I think it will indeed work, and be simpler than what we have now, to boot. Ideas: * There is only one (pair of) statically allocated, max-size XID arrays for GetSnapshotData to read into. When we want to acquire a new snap from GetSnapshotData, we palloc a new SnapshotData struct (and no more), make it point at the statically allocated arrays, fill and return it. * There can be at most one SnapshotData struct using the statically allocated arrays. We remember its address in a static variable (which goes NULL when there is no such struct). If we have to acquire a new snapshot before the old one's refcount has gone to zero, then at that time we palloc arrays just big enough for the old one, copy the static data to there, insert the palloc'd array pointers into the old one, and clear the static variable (or more likely, immediately set it to point to the new one). * All the current CopySnapshot operations are replaced by refcount increments. We need explicit refcount-decrement calls added, and ResourceOwner support to make sure they happen. (I'm imagining that snapshots will now live in a dedicated context, so they don't go away implicitly but need explicit release operations. The ResourceOwner infrastructure will help us find any missed releases.) * Decrementing the refcount on a SnapshotData pfree's it if the refcount has gone to zero, and also clears the aforementioned static pointer variable if it happens to point at that selfsame struct. Note that for a serializable transaction, or a simple read-committed transaction that doesn't use more than one snap at a time, this will always happen and so there will be zero copying cost. * In this scheme the distinction between SerializableSnapshot and LatestSnapshot vanishes, at least at the level of memory management. Probably SerializableSnapshot would translate into a refcount held by xact.c or some other upper-level module. * We keep all the live MVCC snapshots in a linked list managed by a snapshot cache module. We can traverse this list to determine the minimum xmin at any instant. Freeing a snap whose xmin equals the current MyProc-xmin causes us to traverse the list, determine the new backend-wide xmin, and update MyProc-xmin. Freeing the last snap causes us to reset MyProc-xmin to 0. (I'm not completely sure, but I think that we can just set MyProc-xmin in these cases, without bothering to lock the ProcArray --- any comments on that?) * GetSnapshotData no longer cares about serializable vs nonserializable snaps, either. Its rule is just if MyProc-xmin is 0, then set it. This can happen only when creating the first snap in a transaction that currently has no live snaps. * The above two rules clearly suffice to maintain the invariant MyProc-xmin is the minimum of the xmins of the live snapshots in the transaction, or zero when there are none. Which is what we want. On the whole though I think we should let this idea go till 8.4; we have a lot to deal with for 8.3 and a definite shortage of evidence that advancing xmin will buy much. Mu gut feeling is that the above design would save about enough in snapshot-copying costs to pay for its extra management logic, but we won't come out ahead unless advancing xmin intra-transaction really helps, and I'm not sure I believe that (except in the special case of VACUUM, and we already have a solution for that, which would be independent of this). regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Tom Lane wrote: On the whole though I think we should let this idea go till 8.4; we have a lot to deal with for 8.3 and a definite shortage of evidence that advancing xmin will buy much. Mu gut feeling is that the above design would save about enough in snapshot-copying costs to pay for its extra management logic, but we won't come out ahead unless advancing xmin intra-transaction really helps, and I'm not sure I believe that (except in the special case of VACUUM, and we already have a solution for that, which would be independent of this). The improvement is going to be a win for multi-statement transactions --- the only question is how often are they long-running. It does seem best to put this on the TODO for 8.4, and I will do that now. The only thing that makes it tempting to get into 8.3 is that we could advertise this release as a major space reuse release because of HOT, autovacuum on by default, multiple autovacuum processes, and, if we added it, improved VACUUM for multi-statement transactions. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
I have a question about what would happen for a transaction running a command like COPY FROM. Is it possible it would manage to arrange to have no live snapshots at all? So it would have no impact on concurrent VACUUMs? What about something running a large pg_restore? Tom Lane [EMAIL PROTECTED] writes: On the whole though I think we should let this idea go till 8.4; I tend to agree but for a different reason. I think it's something that will open the doors for a lot of new ideas. If we put it in CVS HEAD early in 8.4 I think (or hope at any rate) we'll think of at least a few new things we can do with the new more precise information it exposes. Just as an example, if you find you have no live snapshots can you throw out the combocid hash? Any tuple you find with a combocid that's been discarded that way must predate your current scan and therefore is deleted for you. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Gregory Stark wrote: I have a question about what would happen for a transaction running a command like COPY FROM. Is it possible it would manage to arrange to have no live snapshots at all? So it would have no impact on concurrent VACUUMs? What about something running a large pg_restore? Interesting idea. If the table had triggers, it would need a snapshot, but if not, yea, that is certainly possible. --- Tom Lane [EMAIL PROTECTED] writes: On the whole though I think we should let this idea go till 8.4; I tend to agree but for a different reason. I think it's something that will open the doors for a lot of new ideas. If we put it in CVS HEAD early in 8.4 I think (or hope at any rate) we'll think of at least a few new things we can do with the new more precise information it exposes. Just as an example, if you find you have no live snapshots can you throw out the combocid hash? Any tuple you find with a combocid that's been discarded that way must predate your current scan and therefore is deleted for you. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: ... but we won't come out ahead unless advancing xmin intra-transaction really helps, and I'm not sure I believe that (except in the special case of VACUUM, and we already have a solution for that, which would be independent of this). The improvement is going to be a win for multi-statement transactions --- the only question is how often are they long-running. Uh, no, that's not very clear. A long-running transaction will be a VACUUM bottleneck because of its own XID, never mind its xmin. To make this helpful, you have to posit a lot of overlapping long-running transactions (such that the distance back to GlobalXmin might average about twice the distance back to the oldest live XID). That's not impossible but I wonder whether it's not mostly a token of bad application design. It does seem best to put this on the TODO for 8.4, and I will do that now. Agreed. Quite aside from the time needed for a reasonable implementation, we'd really need to do more performance-testing than we have time for now. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: I have a question about what would happen for a transaction running a command like COPY FROM. Is it possible it would manage to arrange to have no live snapshots at all? So it would have no impact on concurrent VACUUMs? What about something running a large pg_restore? Interesting idea. Indeed. Currently, COPY forcibly sets a snapshot on the off chance something will use it, but I could certainly see making that happen lazily, ie not at all in the simple case. pg_restore is probably a lost cause, at least if you are running it in single-transaction mode. I guess there'd be tradeoffs as to whether to do that or not ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: ... but we won't come out ahead unless advancing xmin intra-transaction really helps, and I'm not sure I believe that (except in the special case of VACUUM, and we already have a solution for that, which would be independent of this). The improvement is going to be a win for multi-statement transactions --- the only question is how often are they long-running. Uh, no, that's not very clear. A long-running transaction will be a VACUUM bottleneck because of its own XID, never mind its xmin. To make this helpful, you have to posit a lot of overlapping long-running transactions (such that the distance back to GlobalXmin might average about twice the distance back to the oldest live XID). That's not impossible but I wonder whether it's not mostly a token of bad application design. Well, my secondary addition was to start MyProc-xmin with the current xid counter, rather than your own xid. This is because your xid is already in progress and so will not be touched, so why not start with the current xid counter. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Uh, no, that's not very clear. A long-running transaction will be a VACUUM bottleneck because of its own XID, never mind its xmin. Well, my secondary addition was to start MyProc-xmin with the current xid counter, rather than your own xid. Don't tell me you seriously believe that would work. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Gregory Stark [EMAIL PROTECTED] writes: Well I think this would be the same infrastructure we would need to do the other discussed improvement to address pg_dump's impact. That would require us to publish the youngest xmax of the live snapshots. Vacuum could deduce that that xid cannot possibly see any transactions between the youngest extant xmax and the oldest in-progress transaction. ... and do what with the knowledge? Not remove tuples, because any such tuples would be part of RECENTLY_DEAD update chains that that xact might be following now or in the future. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Well I think this would be the same infrastructure we would need to do the other discussed improvement to address pg_dump's impact. That would require us to publish the youngest xmax of the live snapshots. Vacuum could deduce that that xid cannot possibly see any transactions between the youngest extant xmax and the oldest in-progress transaction. ... and do what with the knowledge? Not remove tuples, because any such tuples would be part of RECENTLY_DEAD update chains that that xact might be following now or in the future. Well that just means it might require extra work, not that it would be impossible. Firstly, some tuples would not be part of a chain and could be cleaned up anyways. Others would be part of a HOT chain which might make it easier to clean them up. But even for tuples that are part of a chain there may be solutions. We could truncate the tuple to just the MVCC information so subsequent transactions can find the head. Or we might be able to go back and edit the forward link to skip the dead intermediate tuples (and somehow deal with the race conditions...) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I have been thinking we could improve how quickly VACUUM can expire rows if we update procArray.xmin more frequently for non-SERIALIZABLE transactions. The attached patch updates procArray.xmin in this manner. This patch is incredibly broken. Didn't you understand what I said about how we don't track which snapshots are still alive? You can't advance procArray.xmin past the xmin of the oldest live snapshot in the backend, and you can't assume that there are no live snapshots at the places where this patch assumes that. (Open cursors are one obvious counterexample, but I think there are more.) To make intra-transaction advancing of xmin possible, we'd need to explicitly track all of the backend's live snapshots, probably by introducing a snapshot cache manager that gives out tracked refcounts as we do for some other structures like catcache entries. This might have some other advantages (I think most of the current CopySnapshot operations could be replaced by refcount increments) but it's a whole lot more complicated and invasive than what you've got here. I updated the patch to save the MyProc-xid at the time the first cursor is created, and not allow the MyProc-xid to be set lower than that saved value in the current transaction. It added only a few more lines to the patch. It seems to me a lot cleaner to do the reference counting like Tom suggested. Increase the refcount on CopySnapshot, and decrease it on FreeSnapshot. Assuming that all callers of CopySnapshot free the snapshot with FreeSnapshot when they're done with it. BTW: I really like the idea of doing this. It's a relatively simple thing to do and gives some immediate benefit. And it opens the door for more tricks to vacuum more aggressively in the presence of long-running transactions. And it allows us to vacuum tuples that were inserted and deleted in the same transactions even while the transaction is still running, which helps some pathological cases where a transaction updates a counter column many times within a transaction. We could also use it to free entries in the new combocids hash table earlier (not that it's a problem as it is, though). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Heikki Linnakangas [EMAIL PROTECTED] writes: It seems to me a lot cleaner to do the reference counting like Tom suggested. Increase the refcount on CopySnapshot, and decrease it on FreeSnapshot. Assuming that all callers of CopySnapshot free the snapshot with FreeSnapshot when they're done with it. I don't believe we bother at the moment; which is one of the reasons it'd be a nontrivial patch. I do think it might be worth doing though. In the simple case where you're just issuing successive non-cursor commands within a READ COMMITTED transaction, a refcounted implementation would be able to recognize that there are *no* live snapshots between commands and therefore reset MyProc-xmin to 0 whenever the backend is idle. OTOH, do we have any evidence that this is worth bothering with at all? I fear that the cases of long-running transactions that are problems in the real world wouldn't be helped much --- for instance, pg_dump wouldn't change behavior because it uses a serializable transaction. Also, at some point a long-running transaction becomes a bottleneck simply because its XID is itself the oldest thing visible in the ProcArray and is determining everyone's xmin. How much daylight is there really between your xmin is old and your xid is old? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: It seems to me a lot cleaner to do the reference counting like Tom suggested. Increase the refcount on CopySnapshot, and decrease it on FreeSnapshot. Assuming that all callers of CopySnapshot free the snapshot with FreeSnapshot when they're done with it. I don't believe we bother at the moment; which is one of the reasons it'd be a nontrivial patch. I do think it might be worth doing though. In the simple case where you're just issuing successive non-cursor commands within a READ COMMITTED transaction, a refcounted implementation would be able to recognize that there are *no* live snapshots between commands and therefore reset MyProc-xmin to 0 whenever the backend is idle. Attached is my current version of the patch. It doesn't work now that I tried to do reference count for Snapshots, but will stop now that Tom is considering redesigning the snapshot mechanism. OTOH, do we have any evidence that this is worth bothering with at all? I fear that the cases of long-running transactions that are problems in the real world wouldn't be helped much --- for instance, pg_dump wouldn't change behavior because it uses a serializable transaction. Also, at some point a long-running transaction becomes a bottleneck simply because its XID is itself the oldest thing visible in the ProcArray and is determining everyone's xmin. How much daylight is there really between your xmin is old and your xid is old? Well, interesting you mention that, because I have a second idea on how to improve things. We start with MyProc-xmin equal to our own xid, and then look for earlier transactions. It should be possible to skip considering our own xid for MyProc-xmin. This would obviously help VACUUM during long-running transactions. While our transaction is running, our xid isn't committed, so VACUUM isn't going to touch any of our rows, and if other transactions complete before our multi-transaction _statement_ starts, we can't see deleted rows from them transaction, so why keep the deleted rows around? Consider this case: Session #: 1 2 3 BEGIN; SELECT 1; CREATE TABLE test(x int); INSERT INTO test VALUES (1); DELETE FROM test; SELECT 1; VACUUM VERBOSE test; (row can be reused) COMMIT; VACUUM VERBOSE test; (normal row reuse) As I understand it, in READ COMMITTED mode, we have to skip transactions in progress when our _statement_ starts, but anything committed before that we see and we don't see dead rows created by them. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: src/backend/catalog/index.c === RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v retrieving revision 1.280 diff -c -c -r1.280 index.c *** src/backend/catalog/index.c 3 Mar 2007 20:08:41 - 1.280 --- src/backend/catalog/index.c 24 Mar 2007 19:17:56 - *** *** 1358,1364 ExprContext *econtext; Snapshot snapshot; TransactionId OldestXmin; ! /* * sanity checks */ --- 1358,1364 ExprContext *econtext; Snapshot snapshot; TransactionId OldestXmin; ! /* * sanity checks */ *** *** 1555,1561 ExecDropSingleTupleTableSlot(slot); FreeExecutorState(estate); ! /* These may have been pointing to the now-gone estate */ indexInfo-ii_ExpressionsState = NIL; indexInfo-ii_PredicateState = NIL; --- 1555,1563 ExecDropSingleTupleTableSlot(slot); FreeExecutorState(estate); ! if (snapshot != SnapshotNow) ! FreeSnapshot(snapshot); ! /* These may have been pointing to the now-gone estate */ indexInfo-ii_ExpressionsState = NIL; indexInfo-ii_PredicateState = NIL; Index: src/backend/commands/cluster.c === RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v retrieving revision 1.157 diff -c -c -r1.157 cluster.c *** src/backend/commands/cluster.c 13 Mar 2007 00:33:39 - 1.157 --- src/backend/commands/cluster.c 24 Mar 2007 19:17:56 - *** *** 204,209 --- 204,210 /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
OTOH, do we have any evidence that this is worth bothering with at all? I fear that the cases of long-running transactions that are problems in the real world wouldn't be helped much --- for instance, pg_dump wouldn't change behavior because it uses a serializable transaction. Well I think this would be the same infrastructure we would need to do the other discussed improvement to address pg_dump's impact. That would require us to publish the youngest xmax of the live snapshots. Vacuum could deduce that that xid cannot possibly see any transactions between the youngest extant xmax and the oldest in-progress transaction. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Bruce Momjian wrote: The attached patch updates procArray.xmin in this manner. Here is an example of how the patch improves dead row reuse: I don't think this really works. Consider what happens if I change session 2 this way: Session #: 1 2 3 CREATE TABLE test(x int); INSERT INTO test VALUES (1); BEGIN; DELETE FROM test; BEGIN; DECLARE foo CURSOR FOR SELECT * FROM test; SELECT 1; COMMIT; VACUUM VERBOSE test; (row not reused) SELECT 1; FETCH * FROM foo; (At this point #2 doesn't see the test row anymore. Patch updates procArray.xmin.) VACUUM VERBOSE test; (row reused with patch) COMMIT; VACUUM VERBOSE test; (normal row reuse) -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I have been thinking we could improve how quickly VACUUM can expire rows if we update procArray.xmin more frequently for non-SERIALIZABLE transactions. The attached patch updates procArray.xmin in this manner. This patch is incredibly broken. Didn't you understand what I said about how we don't track which snapshots are still alive? You can't advance procArray.xmin past the xmin of the oldest live snapshot in the backend, and you can't assume that there are no live snapshots at the places where this patch assumes that. (Open cursors are one obvious counterexample, but I think there are more.) To make intra-transaction advancing of xmin possible, we'd need to explicitly track all of the backend's live snapshots, probably by introducing a snapshot cache manager that gives out tracked refcounts as we do for some other structures like catcache entries. This might have some other advantages (I think most of the current CopySnapshot operations could be replaced by refcount increments) but it's a whole lot more complicated and invasive than what you've got here. I updated the patch to save the MyProc-xid at the time the first cursor is created, and not allow the MyProc-xid to be set lower than that saved value in the current transaction. It added only a few more lines to the patch. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: src/backend/access/transam/xact.c === RCS file: /cvsroot/pgsql/src/backend/access/transam/xact.c,v retrieving revision 1.238 diff -c -c -r1.238 xact.c *** src/backend/access/transam/xact.c 22 Mar 2007 19:55:04 - 1.238 --- src/backend/access/transam/xact.c 24 Mar 2007 02:18:41 - *** *** 1563,1569 LWLockRelease(ProcArrayLock); } ! PG_TRACE1(transaction__commit, s-transactionId); /* --- 1563,1570 LWLockRelease(ProcArrayLock); } ! set_portal_min_xid(InvalidTransactionId); ! PG_TRACE1(transaction__commit, s-transactionId); /* *** *** 1802,1807 --- 1803,1810 LWLockRelease(ProcArrayLock); + set_portal_min_xid(InvalidTransactionId); + /* * This is all post-transaction cleanup. Note that if an error is raised * here, it's too late to abort the transaction. This should be just *** *** 1969,1974 --- 1972,1979 LWLockRelease(ProcArrayLock); } + set_portal_min_xid(InvalidTransactionId); + PG_TRACE1(transaction__abort, s-transactionId); /* Index: src/backend/catalog/index.c === RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v retrieving revision 1.280 diff -c -c -r1.280 index.c *** src/backend/catalog/index.c 3 Mar 2007 20:08:41 - 1.280 --- src/backend/catalog/index.c 24 Mar 2007 02:18:42 - *** *** 1394,1400 } else if (indexInfo-ii_Concurrent) { ! snapshot = CopySnapshot(GetTransactionSnapshot()); OldestXmin = InvalidTransactionId; /* not used */ } else --- 1394,1400 } else if (indexInfo-ii_Concurrent) { ! snapshot = CopySnapshot(GetTransactionSnapshot(false)); OldestXmin = InvalidTransactionId; /* not used */ } else Index: src/backend/commands/cluster.c === RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v retrieving revision 1.157 diff -c -c -r1.157 cluster.c *** src/backend/commands/cluster.c 13 Mar 2007 00:33:39 - 1.157 --- src/backend/commands/cluster.c 24 Mar 2007 02:18:42 - *** *** 204,210 /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); cluster_rel(rvtc, true); CommitTransactionCommand(); } --- 204,210 /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true)); cluster_rel(rvtc, true); CommitTransactionCommand(); } Index: src/backend/commands/indexcmds.c === RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v retrieving revision 1.157 diff -c -c -r1.157 indexcmds.c *** src/backend/commands/indexcmds.c 13 Mar 2007 00:33:39 - 1.157 --- src/backend/commands/indexcmds.c 24 Mar 2007 02:18:42 - *** *** 493,499 * We also set ActiveSnapshot to this snap, since functions in indexes may * need a snapshot. */ ! snapshot = CopySnapshot(GetTransactionSnapshot());