Re: [HACKERS] COPY with hints, rebirth
On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch n...@leadboat.com wrote: On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote: On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It's still broken: [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO] So this approach isn't the one... The COPY FREEZE patch provides a way for the user to say explicitly that they don't really care about these MVCC corner cases and as a result allows us to avoid touching XidInMVCCSnapshot() at all. So there is still a patch on the table. You can salvage the optimization by tightening its prerequisite: use it when the current subtransaction or a child thereof created or truncated the table. A parent subtransaction having done so is acceptable for the WAL avoidance optimization but not for this. I misread your earlier comment. Yes, that will make this work correctly. Incidentally, I contend that we should write frozen tuples to new/truncated tables unconditionally. The current behavior of making old snapshots see the table as empty violates atomicity at least as badly as letting those snapshots see the future-snapshot contents. But Marti has a sound proposal that would interact with your efforts here to avoid violating atomicity at all: http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com Thank you for bringing this to my attention. This will make this optimisation work correctly without adding anything to the main code path of XidInMVCCSnapshot() and without the annoying FREEZE syntax. So this puts the patch squarely back on the table. I'll do another version of this later today designed to work with the StrictMVCC patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
On Sat, Mar 3, 2012 at 1:14 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch n...@leadboat.com wrote: On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote: On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It's still broken: [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO] So this approach isn't the one... The COPY FREEZE patch provides a way for the user to say explicitly that they don't really care about these MVCC corner cases and as a result allows us to avoid touching XidInMVCCSnapshot() at all. So there is still a patch on the table. You can salvage the optimization by tightening its prerequisite: use it when the current subtransaction or a child thereof created or truncated the table. A parent subtransaction having done so is acceptable for the WAL avoidance optimization but not for this. I misread your earlier comment. Yes, that will make this work correctly. Incidentally, I contend that we should write frozen tuples to new/truncated tables unconditionally. The current behavior of making old snapshots see the table as empty violates atomicity at least as badly as letting those snapshots see the future-snapshot contents. But Marti has a sound proposal that would interact with your efforts here to avoid violating atomicity at all: http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com Thank you for bringing this to my attention. This will make this optimisation work correctly without adding anything to the main code path of XidInMVCCSnapshot() and without the annoying FREEZE syntax. So this puts the patch squarely back on the table. I'll do another version of this later today designed to work with the StrictMVCC patch. OK, so latest version of patch handling the subtransaction issues correctly. Thanks to Noah and Heikki for identifying them and hitting me with the clue stick. So this version of patch has negligible effect on mainline performance though leaves newly loaded data visible in ways that would break MVCC. So this patch relies on the safe_truncate.v2.patch posted on separate thread. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index c910863..b891327 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2050,12 +2050,27 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, tup-t_data-t_infomask = ~(HEAP_XACT_MASK); tup-t_data-t_infomask2 = ~(HEAP2_XACT_MASK); tup-t_data-t_infomask |= HEAP_XMAX_INVALID; - HeapTupleHeaderSetXmin(tup-t_data, xid); HeapTupleHeaderSetCmin(tup-t_data, cid); HeapTupleHeaderSetXmax(tup-t_data, 0); /* for cleanliness */ tup-t_tableOid = RelationGetRelid(relation); /* + * If we are inserting into a new relation invisible as yet to other + * backends and our session has no prior snapshots and no ready portals + * then we can also set the hint bit for the rows we are inserting. The + * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives + * the right answer if the current transaction inspects the data after + * we load it. + */ + if (options HEAP_INSERT_HINT_XMIN) + { + tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED; + HeapTupleHeaderSetXmin(tup-t_data, FrozenTransactionId); + } + else + HeapTupleHeaderSetXmin(tup-t_data, xid); + + /* * If the new tuple is too big for storage or contains already toasted * out-of-line attributes from some other relation, invoke the toaster. */ diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 110480f..a34f072 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -43,6 +43,7 @@ #include utils/builtins.h #include utils/lsyscache.h #include utils/memutils.h +#include utils/portal.h #include utils/rel.h #include utils/snapmgr.h @@ -1922,6 +1923,21 @@ CopyFrom(CopyState cstate) hi_options |= HEAP_INSERT_SKIP_FSM; if (!XLogIsNeeded()) hi_options |= HEAP_INSERT_SKIP_WAL; + + if (ThereAreNoPriorRegisteredSnapshots() + ThereAreNoReadyPortals()) + { + SubTransactionId currxid = GetCurrentSubTransactionId(); + + /* + * If the relfilenode has been created in this subtransaction + * then we can further optimise the data load by setting hint + * bits and pre-freezing tuples. + */ + if (cstate-rel-rd_createSubid == currxid || +cstate-rel-rd_newRelfilenodeSubid == currxid) +hi_options |= HEAP_INSERT_HINT_XMIN; + } } /* diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c index cfb73c1..24075db 100644 --- a/src/backend/utils/mmgr/portalmem.c +++ b/src/backend/utils/mmgr/portalmem.c @@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS) return (Datum) 0; } + +bool
Re: [HACKERS] COPY with hints, rebirth
On Fri, Mar 2, 2012 at 11:15 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon Riggs si...@2ndquadrant.com wrote: I like Marti's idea. At present, making his idea work could easily mean checksums sink, so not sure whether to attempt to make that work in detail. For my part, improving bulk load performance and TRUNCATE transactional semantics would trump checksums. If we have to defer one or the other to a later release, I would prefer that we bump checksums. While I understand the desire for checksums, and understand others have had situations where they would have helped, so far I have yet to run into a situation where I feel they would have helped me. The hint bit and freeze issues require ongoing attention and have real performance consequences on a regular basis. And closing the window for odd transactional semantics on TRUNCATE versus DELETE of all rows in a table would be one less thing to worry about, in my world. Since you supported this, I've invested the time to make it work. It doesn't look like it needs to be either-or. Please review the safe_truncate.v2.patch and copy_autofrozen.v359.patch, copied here to assist testing and inspection. At present those patches handle only the TRUNCATE/COPY optimisation but we could easily add CTAS, CREATE/COPY, CLUSTERVACUUM FULL etc.. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index c910863..b891327 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2050,12 +2050,27 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, tup-t_data-t_infomask = ~(HEAP_XACT_MASK); tup-t_data-t_infomask2 = ~(HEAP2_XACT_MASK); tup-t_data-t_infomask |= HEAP_XMAX_INVALID; - HeapTupleHeaderSetXmin(tup-t_data, xid); HeapTupleHeaderSetCmin(tup-t_data, cid); HeapTupleHeaderSetXmax(tup-t_data, 0); /* for cleanliness */ tup-t_tableOid = RelationGetRelid(relation); /* + * If we are inserting into a new relation invisible as yet to other + * backends and our session has no prior snapshots and no ready portals + * then we can also set the hint bit for the rows we are inserting. The + * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives + * the right answer if the current transaction inspects the data after + * we load it. + */ + if (options HEAP_INSERT_HINT_XMIN) + { + tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED; + HeapTupleHeaderSetXmin(tup-t_data, FrozenTransactionId); + } + else + HeapTupleHeaderSetXmin(tup-t_data, xid); + + /* * If the new tuple is too big for storage or contains already toasted * out-of-line attributes from some other relation, invoke the toaster. */ diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 110480f..a34f072 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -43,6 +43,7 @@ #include utils/builtins.h #include utils/lsyscache.h #include utils/memutils.h +#include utils/portal.h #include utils/rel.h #include utils/snapmgr.h @@ -1922,6 +1923,21 @@ CopyFrom(CopyState cstate) hi_options |= HEAP_INSERT_SKIP_FSM; if (!XLogIsNeeded()) hi_options |= HEAP_INSERT_SKIP_WAL; + + if (ThereAreNoPriorRegisteredSnapshots() + ThereAreNoReadyPortals()) + { + SubTransactionId currxid = GetCurrentSubTransactionId(); + + /* + * If the relfilenode has been created in this subtransaction + * then we can further optimise the data load by setting hint + * bits and pre-freezing tuples. + */ + if (cstate-rel-rd_createSubid == currxid || +cstate-rel-rd_newRelfilenodeSubid == currxid) +hi_options |= HEAP_INSERT_HINT_XMIN; + } } /* diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c index cfb73c1..24075db 100644 --- a/src/backend/utils/mmgr/portalmem.c +++ b/src/backend/utils/mmgr/portalmem.c @@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS) return (Datum) 0; } + +bool +ThereAreNoReadyPortals(void) +{ + HASH_SEQ_STATUS status; + PortalHashEnt *hentry; + + hash_seq_init(status, PortalHashTable); + + while ((hentry = (PortalHashEnt *) hash_seq_search(status)) != NULL) + { + Portal portal = hentry-portal; + + if (portal-status == PORTAL_READY) + return false; + } + + return true; +} diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c index 5aebbd1..5d9e3bf 100644 --- a/src/backend/utils/time/snapmgr.c +++ b/src/backend/utils/time/snapmgr.c @@ -1183,3 +1183,12 @@ DeleteAllExportedSnapshotFiles(void) FreeDir(s_dir); } + +bool +ThereAreNoPriorRegisteredSnapshots(void) +{ + if (RegisteredSnapshots = 1) + return true; + + return false; +} diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h index fa38803..0381785 100644 --- a/src/include/access/heapam.h +++ b/src/include/access/heapam.h @@ -26,6
Re: [HACKERS] COPY with hints, rebirth
Simon Riggs wrote: Kevin Grittner wrote: Simon Riggs wrote: I like Marti's idea. At present, making his idea work could easily mean checksums sink For my part, improving bulk load performance and TRUNCATE transactional semantics would trump checksums It doesn't look like it needs to be either-or. Great news! Please review the safe_truncate.v2.patch and copy_autofrozen.v359.patch, copied here to assist testing and inspection. I'll look at it today. At present those patches handle only the TRUNCATE/COPY optimisation but we could easily add CTAS, CREATE/COPY, CLUSTERVACUUM FULL etc.. CREATE/COPY would be important so that pg_dump | psql -1 would benefit. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 01.03.2012 18:40, Simon Riggs wrote: On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 24.02.2012 22:55, Simon Riggs wrote: What exactly does it do? Previously, we optimised COPY when it was loading data into a newly created table or a freshly truncated table. This patch extends that and actually sets the tuple header flag as HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of code. The patch also adds some tests for corner cases that would make that action break MVCC - though those cases are minor and typical data loads will benefit fully from this. This doesn't work with subtransactions: ... The query should return the row copied in the same subtransaction. Thanks for pointing that out. New patch with corrected logic and test case attached. It's still broken: Agreed. Thanks for your detailed input. Performance test results show that playing with XidInMVCCSnapshot() causes a small but growing performance regression that I find unacceptable, so while I might fix the above, I doubt if I can improve this as well. So this approach isn't the one... The COPY FREEZE patch provides a way for the user to say explicitly that they don't really care about these MVCC corner cases and as a result allows us to avoid touching XidInMVCCSnapshot() at all. So there is still a patch on the table. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote: On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It's still broken: [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO] So this approach isn't the one... The COPY FREEZE patch provides a way for the user to say explicitly that they don't really care about these MVCC corner cases and as a result allows us to avoid touching XidInMVCCSnapshot() at all. So there is still a patch on the table. You can salvage the optimization by tightening its prerequisite: use it when the current subtransaction or a child thereof created or truncated the table. A parent subtransaction having done so is acceptable for the WAL avoidance optimization but not for this. A COPY FREEZE ignoring that stronger restriction would be an interesting feature in its own right, but it brings up other problems. For example, suppose you write a tuple, then fail while writing its index entries. The tuple is already frozen and visible, but it lacks a full set of live index entries. The subtransaction aborts, but the top transaction commits and makes the situation permanent. Incidentally, I contend that we should write frozen tuples to new/truncated tables unconditionally. The current behavior of making old snapshots see the table as empty violates atomicity at least as badly as letting those snapshots see the future-snapshot contents. But Marti has a sound proposal that would interact with your efforts here to avoid violating atomicity at all: http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
Noah Misch n...@leadboat.com wrote: Incidentally, I contend that we should write frozen tuples to new/truncated tables unconditionally. +1 The current behavior of making old snapshots see the table as empty violates atomicity at least as badly as letting those snapshots see the future-snapshot contents. Right, there was no point where the table existed as empty at the end of a transaction, so it is quite broken as things stand now. But Marti has a sound proposal that would interact with your efforts here to avoid violating atomicity at all Well, getting it right is certainly better than moving from a slow non-conforming behavior to a fast non-conforming behavior. ;-) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch n...@leadboat.com wrote: On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote: On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It's still broken: [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO] So this approach isn't the one... The COPY FREEZE patch provides a way for the user to say explicitly that they don't really care about these MVCC corner cases and as a result allows us to avoid touching XidInMVCCSnapshot() at all. So there is still a patch on the table. You can salvage the optimization by tightening its prerequisite: use it when the current subtransaction or a child thereof created or truncated the table. A parent subtransaction having done so is acceptable for the WAL avoidance optimization but not for this. That's already what it does. The problem is what happens after the COPY. If we said you can't see the rows you just loaded it would be problem over, but in my opinion that needs an explicit request from the user to show they accept that. A COPY FREEZE ignoring that stronger restriction would be an interesting feature in its own right, but it brings up other problems. For example, suppose you write a tuple, then fail while writing its index entries. The tuple is already frozen and visible, but it lacks a full set of live index entries. The subtransaction aborts, but the top transaction commits and makes the situation permanent. The optimisation only works in the subtransaction that created the relfilenode so that isn't an issue. Thanks for your input. Incidentally, I contend that we should write frozen tuples to new/truncated tables unconditionally. The current behavior of making old snapshots see the table as empty violates atomicity at least as badly as letting those snapshots see the future-snapshot contents. But Marti has a sound proposal that would interact with your efforts here to avoid violating atomicity at all: http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com Personally, that seems a pretty reasonable thing. I like Marti's idea. At present, making his idea work could easily mean checksums sink, so not sure whether to attempt to make that work in detail. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
Simon Riggs si...@2ndquadrant.com wrote: I like Marti's idea. At present, making his idea work could easily mean checksums sink, so not sure whether to attempt to make that work in detail. For my part, improving bulk load performance and TRUNCATE transactional semantics would trump checksums. If we have to defer one or the other to a later release, I would prefer that we bump checksums. While I understand the desire for checksums, and understand others have had situations where they would have helped, so far I have yet to run into a situation where I feel they would have helped me. The hint bit and freeze issues require ongoing attention and have real performance consequences on a regular basis. And closing the window for odd transactional semantics on TRUNCATE versus DELETE of all rows in a table would be one less thing to worry about, in my world. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 24.02.2012 22:55, Simon Riggs wrote: A long time ago, in a galaxy far away, we discussed ways to speed up data loads/COPY. http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php In particular, the idea that we could mark tuples as committed while we are still loading them, to avoid negative behaviour for the first reader. Simple patch to implement this is attached, together with test case. ... What exactly does it do? Previously, we optimised COPY when it was loading data into a newly created table or a freshly truncated table. This patch extends that and actually sets the tuple header flag as HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of code. The patch also adds some tests for corner cases that would make that action break MVCC - though those cases are minor and typical data loads will benefit fully from this. This doesn't work with subtransactions: ... The query should return the row copied in the same subtransaction. Thanks for pointing that out. New patch with corrected logic and test case attached. In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC() and adding current xid to snapshots. That is an invasive change that I would wish to avoid at any time and explains the long delay in tackling this. The way I've implemented it, is just as a short test during XidInMVCCSnapshot() so that we trap the case when the xid == xmax and so would appear to be running. This is much less invasive and just as performant as Tom's original suggestion. TransactionIdIsCurrentTransactionId() can be fairly expensive if you have a lot of subtransactions open... I've put in something to avoid that cost for the common case - just a boolean. This seems like the best plan rather than the explicit FREEZE option. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index c910863..3899282 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2056,6 +2056,17 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, tup-t_tableOid = RelationGetRelid(relation); /* + * If we are inserting into a new relation invisible as yet to other + * backends and our session has no prior snapshots and no ready portals + * then we can also set the hint bit for the rows we are inserting. The + * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives + * the right answer if the current transaction inspects the data after + * we load it. + */ + if (options HEAP_INSERT_HINT_XMIN) + tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED; + + /* * If the new tuple is too big for storage or contains already toasted * out-of-line attributes from some other relation, invoke the toaster. */ diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index e22bdac..de7504c 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -146,6 +146,7 @@ typedef struct TransactionStateData int prevSecContext; /* previous SecurityRestrictionContext */ bool prevXactReadOnly; /* entry-time xact r/o state */ bool startedInRecovery; /* did we start in recovery? */ + bool maySeePreHintedTuples; /* did subtrans write pre-hinted rows? */ struct TransactionStateData *parent; /* back link to parent */ } TransactionStateData; @@ -175,6 +176,7 @@ static TransactionStateData TopTransactionStateData = { 0, /* previous SecurityRestrictionContext */ false, /* entry-time xact r/o state */ false, /* startedInRecovery */ + false, /* maySeePreHintedTuples */ NULL /* link to parent state block */ }; @@ -704,6 +706,18 @@ TransactionStartedDuringRecovery(void) return CurrentTransactionState-startedInRecovery; } +bool +TransactionMaySeePreHintedTuples(void) +{ + return CurrentTransactionState-maySeePreHintedTuples; +} + +void +TransactionMayWritePreHintedTuples(void) +{ + CurrentTransactionState-maySeePreHintedTuples = true; +} + /* * CommandCounterIncrement */ @@ -1689,6 +1703,7 @@ StartTransaction(void) s-startedInRecovery = false; XactReadOnly = DefaultXactReadOnly; } + s-maySeePreHintedTuples = false; XactDeferrable = DefaultXactDeferrable; XactIsoLevel = DefaultXactIsoLevel; forceSyncCommit = false; diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 110480f..1419e33 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -43,6 +43,7 @@ #include utils/builtins.h #include utils/lsyscache.h #include utils/memutils.h +#include utils/portal.h #include utils/rel.h #include utils/snapmgr.h @@ -1922,6 +1923,13 @@ CopyFrom(CopyState cstate) hi_options |= HEAP_INSERT_SKIP_FSM; if
Re: [HACKERS] COPY with hints, rebirth
On 01.03.2012 18:40, Simon Riggs wrote: On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 24.02.2012 22:55, Simon Riggs wrote: What exactly does it do? Previously, we optimised COPY when it was loading data into a newly created table or a freshly truncated table. This patch extends that and actually sets the tuple header flag as HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of code. The patch also adds some tests for corner cases that would make that action break MVCC - though those cases are minor and typical data loads will benefit fully from this. This doesn't work with subtransactions: ... The query should return the row copied in the same subtransaction. Thanks for pointing that out. New patch with corrected logic and test case attached. It's still broken: -- create test table and file create table a as select 1 as id; copy a to '/tmp/a'; -- start test postgres=# begin; BEGIN postgres=# truncate a; TRUNCATE TABLE postgres=# savepoint sp1; SAVEPOINT postgres=# copy a from '/tmp/a'; COPY 1 postgres=# select * from a; id 1 (1 row) postgres=# rollback to savepoint sp1; ROLLBACK postgres=# select * from a; id 1 (1 row) That last select should not have seen the tuple. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
On Sat, Feb 25, 2012 at 6:58 PM, Simon Riggs si...@2ndquadrant.com wrote: On Sat, Feb 25, 2012 at 6:24 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon Riggs si...@2ndquadrant.com wrote: This patch extends that and actually sets the tuple header flag as HEAP_XMIN_COMMITTED during the load. Fantastic! I think we could add that as an option on COPY, since breaking MVCC in that way is only a bad thing if it happens accidentally without the user's permission and knowledge - which they may wish to give in many cases, such as reloading a database from a dump. I've coded up COPY FREEZE to do just that. This version doesn't require any changes to transaction machinery. But it is very effective at avoiding 4 out of the 5 writes you mention. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index a73b022..4912449 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -34,6 +34,7 @@ COPY { replaceable class=parametertable_name/replaceable [ ( replaceable FORMAT replaceable class=parameterformat_name/replaceable OIDS [ replaceable class=parameterboolean/replaceable ] +FREEZE [ replaceable class=parameterboolean/replaceable ] DELIMITER 'replaceable class=parameterdelimiter_character/replaceable' NULL 'replaceable class=parameternull_string/replaceable' HEADER [ replaceable class=parameterboolean/replaceable ] @@ -181,6 +182,23 @@ COPY { replaceable class=parametertable_name/replaceable [ ( replaceable /varlistentry varlistentry +termliteralFREEZE/literal/term +listitem + para + Specifies copying the data with rows already frozen, just as they + would be after running the commandVACUUM FREEZE/ command. + This only occurs if the table being loaded has been created in this + subtransaction, there are no cursors open and there are no older + snapshots held by this transaction. + Note that all sessions will immediately be able to see the data + if it has been successfully loaded, which specifically operates + outside of the normal definitions of MVCC. This is a performance + option intended for use only during initial data loads. + /para +/listitem + /varlistentry + + varlistentry termliteralDELIMITER/literal/term listitem para diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index c910863..68534bf 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2050,7 +2050,13 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, tup-t_data-t_infomask = ~(HEAP_XACT_MASK); tup-t_data-t_infomask2 = ~(HEAP2_XACT_MASK); tup-t_data-t_infomask |= HEAP_XMAX_INVALID; - HeapTupleHeaderSetXmin(tup-t_data, xid); + if (options HEAP_INSERT_FREEZE) + { + HeapTupleHeaderSetXmin(tup-t_data, FrozenTransactionId); + tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED; + } + else + HeapTupleHeaderSetXmin(tup-t_data, xid); HeapTupleHeaderSetCmin(tup-t_data, cid); HeapTupleHeaderSetXmax(tup-t_data, 0); /* for cleanliness */ tup-t_tableOid = RelationGetRelid(relation); diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 110480f..ec0bed8 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -43,6 +43,7 @@ #include utils/builtins.h #include utils/lsyscache.h #include utils/memutils.h +#include utils/portal.h #include utils/rel.h #include utils/snapmgr.h @@ -108,6 +109,7 @@ typedef struct CopyStateData char *filename; /* filename, or NULL for STDIN/STDOUT */ bool binary; /* binary format? */ bool oids; /* include OIDs? */ + bool freeze; /* freeze rows on loading? */ bool csv_mode; /* Comma Separated Value format? */ bool header_line; /* CSV header line? */ char *null_print; /* NULL marker string (server encoding!) */ @@ -889,6 +891,14 @@ ProcessCopyOptions(CopyState cstate, errmsg(conflicting or redundant options))); cstate-oids = defGetBoolean(defel); } + else if (strcmp(defel-defname, freeze) == 0) + { + if (cstate-freeze) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg(conflicting or redundant options))); + cstate-freeze = defGetBoolean(defel); + } else if (strcmp(defel-defname, delimiter) == 0) { if (cstate-delim) @@ -1922,6 +1932,11 @@ CopyFrom(CopyState cstate) hi_options |= HEAP_INSERT_SKIP_FSM; if (!XLogIsNeeded()) hi_options |= HEAP_INSERT_SKIP_WAL; + + if (cstate-freeze + ThereAreNoPriorRegisteredSnapshots() + ThereAreNoReadyPortals()) + hi_options |= HEAP_INSERT_FREEZE; } /* diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index d1ce2ab..cd2b243 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -2315,6 +2315,10 @@ copy_opt_item:
Re: [HACKERS] COPY with hints, rebirth
On Wed, Feb 29, 2012 at 11:14 AM, Simon Riggs si...@2ndquadrant.com wrote: But it is very effective at avoiding 4 out of the 5 writes you mention. For the common case, would we not want to have (1) [WAL] and (2) [Writing the pre-frozen tuple]? If we only write the tuple (2), and don't capture WAL, then the COPY wouldn't be replicable, right? -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
On Wed, Feb 29, 2012 at 6:14 PM, Christopher Browne cbbro...@gmail.com wrote: On Wed, Feb 29, 2012 at 11:14 AM, Simon Riggs si...@2ndquadrant.com wrote: But it is very effective at avoiding 4 out of the 5 writes you mention. For the common case, would we not want to have (1) [WAL] and (2) [Writing the pre-frozen tuple]? If we only write the tuple (2), and don't capture WAL, then the COPY wouldn't be replicable, right? Well, my answer is a question: how would you like it to work? The way I coded it is that it will still write WAL if wal_level is set, so it would be replicable. So it only works when writing to a newly created table but is otherwise separate to whether WAL is skipped. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
On 24.02.2012 22:55, Simon Riggs wrote: A long time ago, in a galaxy far away, we discussed ways to speed up data loads/COPY. http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php In particular, the idea that we could mark tuples as committed while we are still loading them, to avoid negative behaviour for the first reader. Simple patch to implement this is attached, together with test case. ... What exactly does it do? Previously, we optimised COPY when it was loading data into a newly created table or a freshly truncated table. This patch extends that and actually sets the tuple header flag as HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of code. The patch also adds some tests for corner cases that would make that action break MVCC - though those cases are minor and typical data loads will benefit fully from this. This doesn't work with subtransactions: postgres=# create table a as select 1 as id; SELECT 1 postgres=# copy a to '/tmp/a'; COPY 1 postgres=# begin; BEGIN postgres=# truncate a; TRUNCATE TABLE postgres=# savepoint sp1; SAVEPOINT postgres=# copy a from '/tmp/a'; COPY 1 postgres=# select * from a; id (0 rows) The query should return the row copied in the same subtransaction. In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC() and adding current xid to snapshots. That is an invasive change that I would wish to avoid at any time and explains the long delay in tackling this. The way I've implemented it, is just as a short test during XidInMVCCSnapshot() so that we trap the case when the xid == xmax and so would appear to be running. This is much less invasive and just as performant as Tom's original suggestion. TransactionIdIsCurrentTransactionId() can be fairly expensive if you have a lot of subtransactions open... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
Simon Riggs si...@2ndquadrant.com wrote: This patch extends that and actually sets the tuple header flag as HEAP_XMIN_COMMITTED during the load. Fantastic! So, without bulk-load conditions, a long-lived tuple in PostgreSQL is written to disk at least five times[1]: (1) The WAL record for the inserted tuple is written. (2) The inserted tuple is written. (3) The HEAP_XMIN_COMMITTED bit is set and the tuple is re-written in place some time after the inserting transaction's COMMIT. (4) The WAL record for the freeze in write 5 is written. (5) The xmin is set to frozen and the tuple is rewritten in place some time after every other connection can see it. Prior to your patch, bulk load omitted write 1. With your patch we will also omit write 3. Since you've just been looking at this area, do you have any thoughts about writes 4 and 5 being rendered unnecessary by writing bulk-loaded tuples with a frozen xmin, and having transactions with a snapshot which doesn't include the bulk load's transaction just not seeing the table? (Or am I just dreaming about those?) -Kevin [1] If you are archiving, it could be more. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY with hints, rebirth
On Sat, Feb 25, 2012 at 6:24 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon Riggs si...@2ndquadrant.com wrote: This patch extends that and actually sets the tuple header flag as HEAP_XMIN_COMMITTED during the load. Fantastic! So, without bulk-load conditions, a long-lived tuple in PostgreSQL is written to disk at least five times[1]: (1) The WAL record for the inserted tuple is written. (2) The inserted tuple is written. (3) The HEAP_XMIN_COMMITTED bit is set and the tuple is re-written in place some time after the inserting transaction's COMMIT. (4) The WAL record for the freeze in write 5 is written. (5) The xmin is set to frozen and the tuple is rewritten in place some time after every other connection can see it. Prior to your patch, bulk load omitted write 1. With your patch we will also omit write 3. Yes, well explained. Since you've just been looking at this area, do you have any thoughts about writes 4 and 5 being rendered unnecessary by writing bulk-loaded tuples with a frozen xmin, and having transactions with a snapshot which doesn't include the bulk load's transaction just not seeing the table? (Or am I just dreaming about those?) Setting straight to frozen breaks MVCC, unless/until we use MVCC for catalog access because we can see the table immediately and then read the contents as if they had always been there. I think we could add that as an option on COPY, since breaking MVCC in that way is only a bad thing if it happens accidentally without the user's permission and knowledge - which they may wish to give in many cases, such as reloading a database from a dump. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] COPY with hints, rebirth
A long time ago, in a galaxy far away, we discussed ways to speed up data loads/COPY. http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php In particular, the idea that we could mark tuples as committed while we are still loading them, to avoid negative behaviour for the first reader. Simple patch to implement this is attached, together with test case. Current behaviour is shown here Run COPY and then... SELECT count(*) FROM table with no indexes 1st SELECT Time: 1518.571 ms --- slowed dramatically by setting hint bits 2nd SELECT Time: 914.141 ms 3rd SELECT Time: 914.921 ms With this patch I observed the following results 1st SELECT Time: 890.820 ms 2nd SELECT Time: 884.799 ms 3rd SELECT Time: 882.405 ms What exactly does it do? Previously, we optimised COPY when it was loading data into a newly created table or a freshly truncated table. This patch extends that and actually sets the tuple header flag as HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of code. The patch also adds some tests for corner cases that would make that action break MVCC - though those cases are minor and typical data loads will benefit fully from this. In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC() and adding current xid to snapshots. That is an invasive change that I would wish to avoid at any time and explains the long delay in tackling this. The way I've implemented it, is just as a short test during XidInMVCCSnapshot() so that we trap the case when the xid == xmax and so would appear to be running. This is much less invasive and just as performant as Tom's original suggestion. Why do we need this now? Setting checksums on page requires us to write WAL for hints, so the situation of the 1st SELECT after a load would get somewhat worse when page_checksums are enabled, but we already know there is a price. However, this is a situation we can solve, and add value for all cases, not just when checksums enabled. So I'm posting this as a separate patch rather than including that as a tuning feature of the checksums patch. Your input will be generously received, -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index c910863..3899282 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2056,6 +2056,17 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, tup-t_tableOid = RelationGetRelid(relation); /* + * If we are inserting into a new relation invisible as yet to other + * backends and our session has no prior snapshots and no ready portals + * then we can also set the hint bit for the rows we are inserting. The + * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives + * the right answer if the current transaction inspects the data after + * we load it. + */ + if (options HEAP_INSERT_HINT_XMIN) + tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED; + + /* * If the new tuple is too big for storage or contains already toasted * out-of-line attributes from some other relation, invoke the toaster. */ diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 110480f..6a60eb8 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -43,6 +43,7 @@ #include utils/builtins.h #include utils/lsyscache.h #include utils/memutils.h +#include utils/portal.h #include utils/rel.h #include utils/snapmgr.h @@ -1922,6 +1923,10 @@ CopyFrom(CopyState cstate) hi_options |= HEAP_INSERT_SKIP_FSM; if (!XLogIsNeeded()) hi_options |= HEAP_INSERT_SKIP_WAL; + + if (ThereAreNoPriorRegisteredSnapshots() + ThereAreNoReadyPortals()) + hi_options |= HEAP_INSERT_HINT_XMIN; } /* diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c index cfb73c1..24075db 100644 --- a/src/backend/utils/mmgr/portalmem.c +++ b/src/backend/utils/mmgr/portalmem.c @@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS) return (Datum) 0; } + +bool +ThereAreNoReadyPortals(void) +{ + HASH_SEQ_STATUS status; + PortalHashEnt *hentry; + + hash_seq_init(status, PortalHashTable); + + while ((hentry = (PortalHashEnt *) hash_seq_search(status)) != NULL) + { + Portal portal = hentry-portal; + + if (portal-status == PORTAL_READY) + return false; + } + + return true; +} diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c index 5aebbd1..5d9e3bf 100644 --- a/src/backend/utils/time/snapmgr.c +++ b/src/backend/utils/time/snapmgr.c @@ -1183,3 +1183,12 @@ DeleteAllExportedSnapshotFiles(void) FreeDir(s_dir); } + +bool +ThereAreNoPriorRegisteredSnapshots(void) +{ + if (RegisteredSnapshots = 1) + return true; + + return false; +} diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c index 31791a7..a2e5524 100644 --- a/src/backend/utils/time/tqual.c +++