Re: [HACKERS] COPY with hints, rebirth

2012-03-03 Thread Simon Riggs
On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch n...@leadboat.com wrote:
 On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote:
 On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:
  It's still broken:

 [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO]

 So this approach isn't the one...

 The COPY FREEZE patch provides a way for the user to say explicitly
 that they don't really care about these MVCC corner cases and as a
 result allows us to avoid touching XidInMVCCSnapshot() at all. So
 there is still a patch on the table.

 You can salvage the optimization by tightening its prerequisite: use it when
 the current subtransaction or a child thereof created or truncated the table.
 A parent subtransaction having done so is acceptable for the WAL avoidance
 optimization but not for this.

I misread your earlier comment. Yes, that will make this work correctly.

 Incidentally, I contend that we should write frozen tuples to new/truncated
 tables unconditionally.  The current behavior of making old snapshots see the
 table as empty violates atomicity at least as badly as letting those snapshots
 see the future-snapshot contents.  But Marti has a sound proposal that would
 interact with your efforts here to avoid violating atomicity at all:
 http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com

Thank you for bringing this to my attention.

This will make this optimisation work correctly without adding
anything to the main code path of XidInMVCCSnapshot() and without the
annoying FREEZE syntax. So this puts the patch squarely back on the
table.

I'll do another version of this later today designed to work with the
StrictMVCC patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-03-03 Thread Simon Riggs
On Sat, Mar 3, 2012 at 1:14 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch n...@leadboat.com wrote:
 On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote:
 On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:
  It's still broken:

 [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO]

 So this approach isn't the one...

 The COPY FREEZE patch provides a way for the user to say explicitly
 that they don't really care about these MVCC corner cases and as a
 result allows us to avoid touching XidInMVCCSnapshot() at all. So
 there is still a patch on the table.

 You can salvage the optimization by tightening its prerequisite: use it when
 the current subtransaction or a child thereof created or truncated the table.
 A parent subtransaction having done so is acceptable for the WAL avoidance
 optimization but not for this.

 I misread your earlier comment. Yes, that will make this work correctly.

 Incidentally, I contend that we should write frozen tuples to new/truncated
 tables unconditionally.  The current behavior of making old snapshots see the
 table as empty violates atomicity at least as badly as letting those 
 snapshots
 see the future-snapshot contents.  But Marti has a sound proposal that would
 interact with your efforts here to avoid violating atomicity at all:
 http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com

 Thank you for bringing this to my attention.

 This will make this optimisation work correctly without adding
 anything to the main code path of XidInMVCCSnapshot() and without the
 annoying FREEZE syntax. So this puts the patch squarely back on the
 table.

 I'll do another version of this later today designed to work with the
 StrictMVCC patch.

OK, so latest version of patch handling the subtransaction issues
correctly. Thanks to Noah and Heikki for identifying them and hitting
me with the clue stick.

So this version of patch has negligible effect on mainline performance
though leaves newly loaded data visible in ways that would break MVCC.
So this patch relies on the safe_truncate.v2.patch posted on separate
thread.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..b891327 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2050,12 +2050,27 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup-t_data-t_infomask = ~(HEAP_XACT_MASK);
 	tup-t_data-t_infomask2 = ~(HEAP2_XACT_MASK);
 	tup-t_data-t_infomask |= HEAP_XMAX_INVALID;
-	HeapTupleHeaderSetXmin(tup-t_data, xid);
 	HeapTupleHeaderSetCmin(tup-t_data, cid);
 	HeapTupleHeaderSetXmax(tup-t_data, 0);		/* for cleanliness */
 	tup-t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options  HEAP_INSERT_HINT_XMIN)
+	{
+		tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED;
+		HeapTupleHeaderSetXmin(tup-t_data, FrozenTransactionId);
+	}
+	else
+		HeapTupleHeaderSetXmin(tup-t_data, xid);
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..a34f072 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include utils/builtins.h
 #include utils/lsyscache.h
 #include utils/memutils.h
+#include utils/portal.h
 #include utils/rel.h
 #include utils/snapmgr.h
 
@@ -1922,6 +1923,21 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
 			hi_options |= HEAP_INSERT_SKIP_WAL;
+
+		if (ThereAreNoPriorRegisteredSnapshots() 
+			ThereAreNoReadyPortals())
+		{
+			SubTransactionId	currxid = GetCurrentSubTransactionId();
+
+			/*
+			 * If the relfilenode has been created in this subtransaction
+			 * then we can further optimise the data load by setting hint
+			 * bits and pre-freezing tuples.
+			 */
+			if (cstate-rel-rd_createSubid == currxid ||
+cstate-rel-rd_newRelfilenodeSubid == currxid)
+hi_options |= HEAP_INSERT_HINT_XMIN;
+		}
 	}
 
 	/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index cfb73c1..24075db 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS)
 
 	return (Datum) 0;
 }
+
+bool

Re: [HACKERS] COPY with hints, rebirth

2012-03-03 Thread Simon Riggs
On Fri, Mar 2, 2012 at 11:15 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Simon Riggs si...@2ndquadrant.com wrote:

 I like Marti's idea. At present, making his idea work could easily
 mean checksums sink, so not sure whether to attempt to make that
 work in detail.

 For my part, improving bulk load performance and TRUNCATE
 transactional semantics would trump checksums.  If we have to defer
 one or the other to a later release, I would prefer that we bump
 checksums.

 While I understand the desire for checksums, and understand others
 have had situations where they would have helped, so far I have yet
 to run into a situation where I feel they would have helped me.  The
 hint bit and freeze issues require ongoing attention and have real
 performance consequences on a regular basis.  And closing the window
 for odd transactional semantics on TRUNCATE versus DELETE of all
 rows in a table would be one less thing to worry about, in my world.

Since you supported this, I've invested the time to make it work. It
doesn't look like it needs to be either-or.

Please review the safe_truncate.v2.patch and
copy_autofrozen.v359.patch, copied here to assist testing and
inspection.

At present those patches handle only the TRUNCATE/COPY optimisation
but we could easily add CTAS, CREATE/COPY, CLUSTERVACUUM FULL etc..

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..b891327 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2050,12 +2050,27 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup-t_data-t_infomask = ~(HEAP_XACT_MASK);
 	tup-t_data-t_infomask2 = ~(HEAP2_XACT_MASK);
 	tup-t_data-t_infomask |= HEAP_XMAX_INVALID;
-	HeapTupleHeaderSetXmin(tup-t_data, xid);
 	HeapTupleHeaderSetCmin(tup-t_data, cid);
 	HeapTupleHeaderSetXmax(tup-t_data, 0);		/* for cleanliness */
 	tup-t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options  HEAP_INSERT_HINT_XMIN)
+	{
+		tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED;
+		HeapTupleHeaderSetXmin(tup-t_data, FrozenTransactionId);
+	}
+	else
+		HeapTupleHeaderSetXmin(tup-t_data, xid);
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..a34f072 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include utils/builtins.h
 #include utils/lsyscache.h
 #include utils/memutils.h
+#include utils/portal.h
 #include utils/rel.h
 #include utils/snapmgr.h
 
@@ -1922,6 +1923,21 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
 			hi_options |= HEAP_INSERT_SKIP_WAL;
+
+		if (ThereAreNoPriorRegisteredSnapshots() 
+			ThereAreNoReadyPortals())
+		{
+			SubTransactionId	currxid = GetCurrentSubTransactionId();
+
+			/*
+			 * If the relfilenode has been created in this subtransaction
+			 * then we can further optimise the data load by setting hint
+			 * bits and pre-freezing tuples.
+			 */
+			if (cstate-rel-rd_createSubid == currxid ||
+cstate-rel-rd_newRelfilenodeSubid == currxid)
+hi_options |= HEAP_INSERT_HINT_XMIN;
+		}
 	}
 
 	/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index cfb73c1..24075db 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS)
 
 	return (Datum) 0;
 }
+
+bool
+ThereAreNoReadyPortals(void)
+{
+	HASH_SEQ_STATUS status;
+	PortalHashEnt *hentry;
+
+	hash_seq_init(status, PortalHashTable);
+
+	while ((hentry = (PortalHashEnt *) hash_seq_search(status)) != NULL)
+	{
+		Portal		portal = hentry-portal;
+
+		if (portal-status == PORTAL_READY)
+			return false;
+	}
+
+	return true;
+}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 5aebbd1..5d9e3bf 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1183,3 +1183,12 @@ DeleteAllExportedSnapshotFiles(void)
 
 	FreeDir(s_dir);
 }
+
+bool
+ThereAreNoPriorRegisteredSnapshots(void)
+{
+	if (RegisteredSnapshots = 1)
+		return true;
+
+	return false;
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fa38803..0381785 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -26,6 

Re: [HACKERS] COPY with hints, rebirth

2012-03-03 Thread Kevin Grittner
 Simon Riggs  wrote:
 Kevin Grittner  wrote:
 Simon Riggs  wrote:

 I like Marti's idea. At present, making his idea work could
 easily mean checksums sink
 
 For my part, improving bulk load performance and TRUNCATE
 transactional semantics would trump checksums
 
 It doesn't look like it needs to be either-or.
 
Great news!
 
 Please review the safe_truncate.v2.patch and
 copy_autofrozen.v359.patch, copied here to assist testing and
 inspection.
 
I'll look at it today.
 
 At present those patches handle only the TRUNCATE/COPY optimisation
 but we could easily add CTAS, CREATE/COPY, CLUSTERVACUUM FULL etc..
 
CREATE/COPY would be important so that pg_dump | psql -1 would
benefit.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-03-02 Thread Simon Riggs
On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 01.03.2012 18:40, Simon Riggs wrote:

 On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:

 On 24.02.2012 22:55, Simon Riggs wrote:


 What exactly does it do? Previously, we optimised COPY when it was
 loading data into a newly created table or a freshly truncated table.
 This patch extends that and actually sets the tuple header flag as
 HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
 code. The patch also adds some tests for corner cases that would make
 that action break MVCC - though those cases are minor and typical data
 loads will benefit fully from this.


 This doesn't work with subtransactions:

 ...

 The query should return the row copied in the same subtransaction.


 Thanks for pointing that out.

 New patch with corrected logic and test case attached.


 It's still broken:

Agreed. Thanks for your detailed input.

Performance test results show that playing with XidInMVCCSnapshot()
causes a small but growing performance regression that I find
unacceptable, so while I might fix the above, I doubt if I can improve
this as well.

So this approach isn't the one...

The COPY FREEZE patch provides a way for the user to say explicitly
that they don't really care about these MVCC corner cases and as a
result allows us to avoid touching XidInMVCCSnapshot() at all. So
there is still a patch on the table.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-03-02 Thread Noah Misch
On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote:
 On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:
  It's still broken:

[BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO]

 So this approach isn't the one...
 
 The COPY FREEZE patch provides a way for the user to say explicitly
 that they don't really care about these MVCC corner cases and as a
 result allows us to avoid touching XidInMVCCSnapshot() at all. So
 there is still a patch on the table.

You can salvage the optimization by tightening its prerequisite: use it when
the current subtransaction or a child thereof created or truncated the table.
A parent subtransaction having done so is acceptable for the WAL avoidance
optimization but not for this.

A COPY FREEZE ignoring that stronger restriction would be an interesting
feature in its own right, but it brings up other problems.  For example,
suppose you write a tuple, then fail while writing its index entries.  The
tuple is already frozen and visible, but it lacks a full set of live index
entries.  The subtransaction aborts, but the top transaction commits and makes
the situation permanent.

Incidentally, I contend that we should write frozen tuples to new/truncated
tables unconditionally.  The current behavior of making old snapshots see the
table as empty violates atomicity at least as badly as letting those snapshots
see the future-snapshot contents.  But Marti has a sound proposal that would
interact with your efforts here to avoid violating atomicity at all:
http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-03-02 Thread Kevin Grittner
Noah Misch n...@leadboat.com wrote:
 
 Incidentally, I contend that we should write frozen tuples to
 new/truncated tables unconditionally.
 
+1
 
 The current behavior of making old snapshots see the table as
 empty violates atomicity at least as badly as letting those 
 snapshots see the future-snapshot contents.
 
Right, there was no point where the table existed as empty at the
end of a transaction, so it is quite broken as things stand now.
 
 But Marti has a sound proposal that would interact with your
 efforts here to avoid violating atomicity at all
 
Well, getting it right is certainly better than moving from a slow
non-conforming behavior to a fast non-conforming behavior.  ;-)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-03-02 Thread Simon Riggs
On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch n...@leadboat.com wrote:
 On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote:
 On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:
  It's still broken:

 [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO]

 So this approach isn't the one...

 The COPY FREEZE patch provides a way for the user to say explicitly
 that they don't really care about these MVCC corner cases and as a
 result allows us to avoid touching XidInMVCCSnapshot() at all. So
 there is still a patch on the table.

 You can salvage the optimization by tightening its prerequisite: use it when
 the current subtransaction or a child thereof created or truncated the table.
 A parent subtransaction having done so is acceptable for the WAL avoidance
 optimization but not for this.

That's already what it does. The problem is what happens after the COPY.

If we said you can't see the rows you just loaded it would be
problem over, but in my opinion that needs an explicit request from
the user to show they accept that.

 A COPY FREEZE ignoring that stronger restriction would be an interesting
 feature in its own right, but it brings up other problems.  For example,
 suppose you write a tuple, then fail while writing its index entries.  The
 tuple is already frozen and visible, but it lacks a full set of live index
 entries.  The subtransaction aborts, but the top transaction commits and makes
 the situation permanent.

The optimisation only works in the subtransaction that created the
relfilenode so that isn't an issue.

Thanks for your input.

 Incidentally, I contend that we should write frozen tuples to new/truncated
 tables unconditionally.  The current behavior of making old snapshots see the
 table as empty violates atomicity at least as badly as letting those snapshots
 see the future-snapshot contents.  But Marti has a sound proposal that would
 interact with your efforts here to avoid violating atomicity at all:
 http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com

Personally, that seems a pretty reasonable thing.

I like Marti's idea. At present, making his idea work could easily
mean checksums sink, so not sure whether to attempt to make that work
in detail.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-03-02 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 I like Marti's idea. At present, making his idea work could easily
 mean checksums sink, so not sure whether to attempt to make that
 work in detail.
 
For my part, improving bulk load performance and TRUNCATE
transactional semantics would trump checksums.  If we have to defer
one or the other to a later release, I would prefer that we bump
checksums.
 
While I understand the desire for checksums, and understand others
have had situations where they would have helped, so far I have yet
to run into a situation where I feel they would have helped me.  The
hint bit and freeze issues require ongoing attention and have real
performance consequences on a regular basis.  And closing the window
for odd transactional semantics on TRUNCATE versus DELETE of all
rows in a table would be one less thing to worry about, in my world.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-03-01 Thread Simon Riggs
On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 24.02.2012 22:55, Simon Riggs wrote:

 A long time ago, in a galaxy far away, we discussed ways to speed up
 data loads/COPY.
 http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php

 In particular, the idea that we could mark tuples as committed while
 we are still loading them, to avoid negative behaviour for the first
 reader.

 Simple patch to implement this is attached, together with test case.

  ...


 What exactly does it do? Previously, we optimised COPY when it was
 loading data into a newly created table or a freshly truncated table.
 This patch extends that and actually sets the tuple header flag as
 HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
 code. The patch also adds some tests for corner cases that would make
 that action break MVCC - though those cases are minor and typical data
 loads will benefit fully from this.


 This doesn't work with subtransactions:
...
 The query should return the row copied in the same subtransaction.

Thanks for pointing that out.

New patch with corrected logic and test case attached.

 In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
 and adding current xid to snapshots. That is an invasive change that I
 would wish to avoid at any time and explains the long delay in
 tackling this. The way I've implemented it, is just as a short test
 during XidInMVCCSnapshot() so that we trap the case when the xid ==
 xmax and so would appear to be running. This is much less invasive and
 just as performant as Tom's original suggestion.


 TransactionIdIsCurrentTransactionId() can be fairly expensive if you have a
 lot of subtransactions open...

I've put in something to avoid that cost for the common case - just a boolean.

This seems like the best plan rather than the explicit FREEZE option.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..3899282 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2056,6 +2056,17 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup-t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options  HEAP_INSERT_HINT_XMIN)
+		tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED;
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e22bdac..de7504c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -146,6 +146,7 @@ typedef struct TransactionStateData
 	int			prevSecContext; /* previous SecurityRestrictionContext */
 	bool		prevXactReadOnly;		/* entry-time xact r/o state */
 	bool		startedInRecovery;		/* did we start in recovery? */
+	bool		maySeePreHintedTuples;	/* did subtrans write pre-hinted rows? */
 	struct TransactionStateData *parent;		/* back link to parent */
 } TransactionStateData;
 
@@ -175,6 +176,7 @@ static TransactionStateData TopTransactionStateData = {
 	0,			/* previous SecurityRestrictionContext */
 	false,		/* entry-time xact r/o state */
 	false,		/* startedInRecovery */
+	false,		/* maySeePreHintedTuples */
 	NULL		/* link to parent state block */
 };
 
@@ -704,6 +706,18 @@ TransactionStartedDuringRecovery(void)
 	return CurrentTransactionState-startedInRecovery;
 }
 
+bool
+TransactionMaySeePreHintedTuples(void)
+{
+	return CurrentTransactionState-maySeePreHintedTuples;
+}
+
+void
+TransactionMayWritePreHintedTuples(void)
+{
+	CurrentTransactionState-maySeePreHintedTuples = true;
+}
+
 /*
  *	CommandCounterIncrement
  */
@@ -1689,6 +1703,7 @@ StartTransaction(void)
 		s-startedInRecovery = false;
 		XactReadOnly = DefaultXactReadOnly;
 	}
+	s-maySeePreHintedTuples = false;
 	XactDeferrable = DefaultXactDeferrable;
 	XactIsoLevel = DefaultXactIsoLevel;
 	forceSyncCommit = false;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..1419e33 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include utils/builtins.h
 #include utils/lsyscache.h
 #include utils/memutils.h
+#include utils/portal.h
 #include utils/rel.h
 #include utils/snapmgr.h
 
@@ -1922,6 +1923,13 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_SKIP_FSM;
 		if 

Re: [HACKERS] COPY with hints, rebirth

2012-03-01 Thread Heikki Linnakangas

On 01.03.2012 18:40, Simon Riggs wrote:

On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 24.02.2012 22:55, Simon Riggs wrote:


What exactly does it do? Previously, we optimised COPY when it was
loading data into a newly created table or a freshly truncated table.
This patch extends that and actually sets the tuple header flag as
HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
code. The patch also adds some tests for corner cases that would make
that action break MVCC - though those cases are minor and typical data
loads will benefit fully from this.


This doesn't work with subtransactions:

...

The query should return the row copied in the same subtransaction.


Thanks for pointing that out.

New patch with corrected logic and test case attached.


It's still broken:

-- create test table and file
create table a as select 1 as id;
copy a to '/tmp/a';

-- start test
postgres=# begin;
BEGIN
postgres=# truncate a;
TRUNCATE TABLE
postgres=# savepoint sp1;
SAVEPOINT
postgres=# copy a from '/tmp/a';
COPY 1
postgres=# select * from a;
 id

  1
(1 row)

postgres=# rollback to savepoint sp1;
ROLLBACK
postgres=# select * from a;
 id

  1
(1 row)

That last select should not have seen the tuple.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-02-29 Thread Simon Riggs
On Sat, Feb 25, 2012 at 6:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, Feb 25, 2012 at 6:24 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Simon Riggs si...@2ndquadrant.com wrote:

 This patch extends that and actually sets the tuple header flag as
 HEAP_XMIN_COMMITTED during the load.

 Fantastic!

 I think we could add that as an option on COPY, since breaking MVCC
 in that way is only a bad thing if it happens accidentally without the
 user's permission and knowledge - which they may wish to give in many
 cases, such as reloading a database from a dump.

I've coded up COPY FREEZE to do just that.

This version doesn't require any changes to transaction machinery.

But it is very effective at avoiding 4 out of the 5 writes you mention.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index a73b022..4912449 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { replaceable class=parametertable_name/replaceable [ ( replaceable
 
 FORMAT replaceable class=parameterformat_name/replaceable
 OIDS [ replaceable class=parameterboolean/replaceable ]
+FREEZE [ replaceable class=parameterboolean/replaceable ]
 DELIMITER 'replaceable class=parameterdelimiter_character/replaceable'
 NULL 'replaceable class=parameternull_string/replaceable'
 HEADER [ replaceable class=parameterboolean/replaceable ]
@@ -181,6 +182,23 @@ COPY { replaceable class=parametertable_name/replaceable [ ( replaceable
/varlistentry
 
varlistentry
+termliteralFREEZE/literal/term
+listitem
+ para
+  Specifies copying the data with rows already frozen, just as they
+  would be after running the commandVACUUM FREEZE/ command.
+  This only occurs if the table being loaded has been created in this
+  subtransaction, there are no cursors open and there are no older
+  snapshots held by this transaction.
+  Note that all sessions will immediately be able to see the data
+  if it has been successfully loaded, which specifically operates
+  outside of the normal definitions of MVCC. This is a performance
+  option intended for use only during initial data loads.
+ /para
+/listitem
+   /varlistentry
+
+   varlistentry
 termliteralDELIMITER/literal/term
 listitem
  para
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..68534bf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2050,7 +2050,13 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup-t_data-t_infomask = ~(HEAP_XACT_MASK);
 	tup-t_data-t_infomask2 = ~(HEAP2_XACT_MASK);
 	tup-t_data-t_infomask |= HEAP_XMAX_INVALID;
-	HeapTupleHeaderSetXmin(tup-t_data, xid);
+	if (options  HEAP_INSERT_FREEZE)
+	{
+		HeapTupleHeaderSetXmin(tup-t_data, FrozenTransactionId);
+		tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED;
+	}
+	else
+		HeapTupleHeaderSetXmin(tup-t_data, xid);
 	HeapTupleHeaderSetCmin(tup-t_data, cid);
 	HeapTupleHeaderSetXmax(tup-t_data, 0);		/* for cleanliness */
 	tup-t_tableOid = RelationGetRelid(relation);
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..ec0bed8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include utils/builtins.h
 #include utils/lsyscache.h
 #include utils/memutils.h
+#include utils/portal.h
 #include utils/rel.h
 #include utils/snapmgr.h
 
@@ -108,6 +109,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		binary;			/* binary format? */
 	bool		oids;			/* include OIDs? */
+	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
 	bool		header_line;	/* CSV header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
@@ -889,6 +891,14 @@ ProcessCopyOptions(CopyState cstate,
 		 errmsg(conflicting or redundant options)));
 			cstate-oids = defGetBoolean(defel);
 		}
+		else if (strcmp(defel-defname, freeze) == 0)
+		{
+			if (cstate-freeze)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg(conflicting or redundant options)));
+			cstate-freeze = defGetBoolean(defel);
+		}
 		else if (strcmp(defel-defname, delimiter) == 0)
 		{
 			if (cstate-delim)
@@ -1922,6 +1932,11 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
 			hi_options |= HEAP_INSERT_SKIP_WAL;
+
+		if (cstate-freeze 
+			ThereAreNoPriorRegisteredSnapshots() 
+			ThereAreNoReadyPortals())
+			hi_options |= HEAP_INSERT_FREEZE;
 	}
 
 	/*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d1ce2ab..cd2b243 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2315,6 +2315,10 @@ copy_opt_item:
 		

Re: [HACKERS] COPY with hints, rebirth

2012-02-29 Thread Christopher Browne
On Wed, Feb 29, 2012 at 11:14 AM, Simon Riggs si...@2ndquadrant.com wrote:
 But it is very effective at avoiding 4 out of the 5 writes you mention.

For the common case, would we not want to have (1) [WAL] and (2)
[Writing the pre-frozen tuple]?

If we only write the tuple (2), and don't capture WAL, then the COPY
wouldn't be replicable, right?
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-02-29 Thread Simon Riggs
On Wed, Feb 29, 2012 at 6:14 PM, Christopher Browne cbbro...@gmail.com wrote:
 On Wed, Feb 29, 2012 at 11:14 AM, Simon Riggs si...@2ndquadrant.com wrote:
 But it is very effective at avoiding 4 out of the 5 writes you mention.

 For the common case, would we not want to have (1) [WAL] and (2)
 [Writing the pre-frozen tuple]?

 If we only write the tuple (2), and don't capture WAL, then the COPY
 wouldn't be replicable, right?

Well, my answer is a question: how would you like it to work?

The way I coded it is that it will still write WAL if wal_level is
set, so it would be replicable. So it only works when writing to a
newly created table but is otherwise separate to whether WAL is
skipped.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-02-26 Thread Heikki Linnakangas

On 24.02.2012 22:55, Simon Riggs wrote:

A long time ago, in a galaxy far away, we discussed ways to speed up
data loads/COPY.
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php

In particular, the idea that we could mark tuples as committed while
we are still loading them, to avoid negative behaviour for the first
reader.

Simple patch to implement this is attached, together with test case.

 ...

What exactly does it do? Previously, we optimised COPY when it was
loading data into a newly created table or a freshly truncated table.
This patch extends that and actually sets the tuple header flag as
HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
code. The patch also adds some tests for corner cases that would make
that action break MVCC - though those cases are minor and typical data
loads will benefit fully from this.


This doesn't work with subtransactions:

postgres=# create table a as select 1 as id;
SELECT 1
postgres=# copy a to '/tmp/a';
COPY 1
postgres=# begin;
BEGIN
postgres=# truncate a;
TRUNCATE TABLE
postgres=# savepoint sp1;
SAVEPOINT
postgres=# copy a from '/tmp/a';
COPY 1
postgres=# select * from a;
 id

(0 rows)

The query should return the row copied in the same subtransaction.



In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
and adding current xid to snapshots. That is an invasive change that I
would wish to avoid at any time and explains the long delay in
tackling this. The way I've implemented it, is just as a short test
during XidInMVCCSnapshot() so that we trap the case when the xid ==
xmax and so would appear to be running. This is much less invasive and
just as performant as Tom's original suggestion.


TransactionIdIsCurrentTransactionId() can be fairly expensive if you 
have a lot of subtransactions open...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-02-25 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 This patch extends that and actually sets the tuple header flag as
 HEAP_XMIN_COMMITTED during the load.
 
Fantastic!
 
So, without bulk-load conditions, a long-lived tuple in PostgreSQL
is written to disk at least five times[1]:
 
(1) The WAL record for the inserted tuple is written.
(2) The inserted tuple is written.
(3) The HEAP_XMIN_COMMITTED bit is set and the tuple is re-written
in place some time after the inserting transaction's COMMIT.
(4) The WAL record for the freeze in write 5 is written.
(5) The xmin is set to frozen and the tuple is rewritten in place
some time after every other connection can see it.
 
Prior to your patch, bulk load omitted write 1.  With your patch we
will also omit write 3.
 
Since you've just been looking at this area, do you have any
thoughts about writes 4 and 5 being rendered unnecessary by writing
bulk-loaded tuples with a frozen xmin, and having transactions with
a snapshot which doesn't include the bulk load's transaction just
not seeing the table?  (Or am I just dreaming about those?)
 
-Kevin
 
[1] If you are archiving, it could be more.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-02-25 Thread Simon Riggs
On Sat, Feb 25, 2012 at 6:24 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Simon Riggs si...@2ndquadrant.com wrote:

 This patch extends that and actually sets the tuple header flag as
 HEAP_XMIN_COMMITTED during the load.

 Fantastic!

 So, without bulk-load conditions, a long-lived tuple in PostgreSQL
 is written to disk at least five times[1]:

 (1) The WAL record for the inserted tuple is written.
 (2) The inserted tuple is written.
 (3) The HEAP_XMIN_COMMITTED bit is set and the tuple is re-written
    in place some time after the inserting transaction's COMMIT.
 (4) The WAL record for the freeze in write 5 is written.
 (5) The xmin is set to frozen and the tuple is rewritten in place
    some time after every other connection can see it.

 Prior to your patch, bulk load omitted write 1.  With your patch we
 will also omit write 3.

Yes, well explained.

 Since you've just been looking at this area, do you have any
 thoughts about writes 4 and 5 being rendered unnecessary by writing
 bulk-loaded tuples with a frozen xmin, and having transactions with
 a snapshot which doesn't include the bulk load's transaction just
 not seeing the table?  (Or am I just dreaming about those?)

Setting straight to frozen breaks MVCC, unless/until we use MVCC for
catalog access because we can see the table immediately and then read
the contents as if they had always been there.

I think we could add that as an option on COPY, since breaking MVCC
in that way is only a bad thing if it happens accidentally without the
user's permission and knowledge - which they may wish to give in many
cases, such as reloading a database from a dump.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] COPY with hints, rebirth

2012-02-24 Thread Simon Riggs
A long time ago, in a galaxy far away, we discussed ways to speed up
data loads/COPY.
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php

In particular, the idea that we could mark tuples as committed while
we are still loading them, to avoid negative behaviour for the first
reader.

Simple patch to implement this is attached, together with test case.

Current behaviour is shown here
Run COPY and then... SELECT count(*) FROM table with no indexes
1st SELECT Time: 1518.571 ms --- slowed dramatically by setting hint bits
2nd SELECT Time: 914.141 ms
3rd SELECT Time: 914.921 ms

With this patch I observed the following results
1st SELECT Time: 890.820 ms
2nd SELECT Time: 884.799 ms
3rd SELECT Time: 882.405 ms

What exactly does it do? Previously, we optimised COPY when it was
loading data into a newly created table or a freshly truncated table.
This patch extends that and actually sets the tuple header flag as
HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
code. The patch also adds some tests for corner cases that would make
that action break MVCC - though those cases are minor and typical data
loads will benefit fully from this.

In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
and adding current xid to snapshots. That is an invasive change that I
would wish to avoid at any time and explains the long delay in
tackling this. The way I've implemented it, is just as a short test
during XidInMVCCSnapshot() so that we trap the case when the xid ==
xmax and so would appear to be running. This is much less invasive and
just as performant as Tom's original suggestion.

Why do we need this now? Setting checksums on page requires us to
write WAL for hints, so the situation of the 1st SELECT after a load
would get somewhat worse when page_checksums are enabled, but we
already know there is a price. However, this is a situation we can
solve, and add value for all cases, not just when checksums enabled.
So I'm posting this as a separate patch rather than including that as
a tuning feature of the checksums patch.

Your input will be generously received,

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..3899282 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2056,6 +2056,17 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup-t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options  HEAP_INSERT_HINT_XMIN)
+		tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED;
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..6a60eb8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include utils/builtins.h
 #include utils/lsyscache.h
 #include utils/memutils.h
+#include utils/portal.h
 #include utils/rel.h
 #include utils/snapmgr.h
 
@@ -1922,6 +1923,10 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
 			hi_options |= HEAP_INSERT_SKIP_WAL;
+
+		if (ThereAreNoPriorRegisteredSnapshots() 
+			ThereAreNoReadyPortals())
+			hi_options |= HEAP_INSERT_HINT_XMIN;
 	}
 
 	/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index cfb73c1..24075db 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS)
 
 	return (Datum) 0;
 }
+
+bool
+ThereAreNoReadyPortals(void)
+{
+	HASH_SEQ_STATUS status;
+	PortalHashEnt *hentry;
+
+	hash_seq_init(status, PortalHashTable);
+
+	while ((hentry = (PortalHashEnt *) hash_seq_search(status)) != NULL)
+	{
+		Portal		portal = hentry-portal;
+
+		if (portal-status == PORTAL_READY)
+			return false;
+	}
+
+	return true;
+}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 5aebbd1..5d9e3bf 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1183,3 +1183,12 @@ DeleteAllExportedSnapshotFiles(void)
 
 	FreeDir(s_dir);
 }
+
+bool
+ThereAreNoPriorRegisteredSnapshots(void)
+{
+	if (RegisteredSnapshots = 1)
+		return true;
+
+	return false;
+}
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 31791a7..a2e5524 100644
--- a/src/backend/utils/time/tqual.c
+++