On 2015-02-23 15:48:25 +0000, Thom Brown wrote:
> On 23 February 2015 at 15:42, Andres Freund <and...@2ndquadrant.com> wrote:
> 
> > On 2015-02-23 16:38:44 +0100, Andres Freund wrote:
> > > I unfortunately don't remember enough of the thread to reference it
> > > here.
> >
> > Found the right keywords. The threads below
> >
> > http://archives.postgresql.org/message-id/369698E947874884A77849D8FE3680C2%40maumau
> > and
> >
> > http://www.postgresql.org/message-id/5CF4ABBA67674088B3941894E22A0D25@maumau
> >
> 
> Yes, this seems to be virtually the same issue reported.  The trace looks
> the same except for RecordTransactionCommit.

So, I proposed in
http://www.postgresql.org/message-id/20140707155113.gb1...@alap3.anarazel.de
that we make sequences assign a xid and only wait for syncrep when a xid
is assigned. The biggest blocker was that somebody would have to do some
code reviewing to find other locations that might need similar
treatment.

I did a, quick, grep for XLogInsert() and I think we're otherwise
fine. There's some debatable cases:

* XLOG_STANDBY_LOCK  doesn't force a xid to be assigned. I think it's
  harmless though, as we really only need to wait for that to be
  replicated if the transaction did something relevant (i.e. catalog
  changes). And those will force xid assignment.
* 2pc records don't assign a xid. But twophase.c does it's own waiting,
  so that's fine.
* Plain vacuums will not trigger waits. But I think that's good. There's
  really no need to wait if all that's been done is some cleanup without
  visible consequences.
* Fujii brought up that we might want to wait for XLOG_SWITCH - I don't
  really see why.
* XLOG_RESTORE_POINT is a similar candidate - I don't see really valid
  arguments for making 2pc wait.


The attached, untested, patch changes things so that we
a) only wait for syncrep if we both wrote WAL and had a xid assigned
b) use an async commit if we just had a xid assigned, without having
   written WAL, even if synchronous_commit = off
c) acquire a xid when WAL logging sequence changes (arguable at least
   one of the xid assignments is redundant, but it doesn't cost
   anything, so ...)

I think it makes sense to change a) and b) that way because there's no
need to wait for WAL flushes/syncrep waits when all that happened is
manipulations of temporary/unlogged tables or HOT pruning. It's slightly
wierd that the on-disk flush and the syncrep wait essentially used two
different mechanisms for deciding when to flush.

Comments? This is obviously just a POC, but I think something like this
does make a great deal of sense.

Thom, does that help?

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From 16828b249b4d2ebfc04678f2570d6c439981e472 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 23 Feb 2015 17:38:40 +0100
Subject: [PATCH] Reconsider when to flush WAL and wait for syncrep while
 committing.

---
 src/backend/access/transam/xact.c | 30 ++++++++++++++++++------------
 src/backend/commands/sequence.c   | 23 +++++++++++++++++++++++
 2 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 97000ef..2818a03 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1031,10 +1031,9 @@ RecordTransactionCommit(void)
 
 		/*
 		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should flush those entries the same as a commit record.  (An
-		 * example of a possible record that wouldn't cause an XID to be
-		 * assigned is a sequence advance record due to nextval() --- we want
-		 * to flush that to disk before reporting commit.)
+		 * should trigger flushing those entries the same as a commit record
+		 * would.  This will primarily happen for HOT pruning and the like; we
+		 * want these to be flushed to disk in due time.
 		 */
 		if (!wrote_xlog)
 			goto cleanup;
@@ -1153,11 +1152,13 @@ RecordTransactionCommit(void)
 	/*
 	 * Check if we want to commit asynchronously.  We can allow the XLOG flush
 	 * to happen asynchronously if synchronous_commit=off, or if the current
-	 * transaction has not performed any WAL-logged operation.  The latter
-	 * case can arise if the current transaction wrote only to temporary
-	 * and/or unlogged tables.  In case of a crash, the loss of such a
-	 * transaction will be irrelevant since temp tables will be lost anyway,
-	 * and unlogged tables will be truncated.  (Given the foregoing, you might
+	 * transaction has not performed any WAL-logged operation or didn't assign
+	 * a xid.  The transaction can end up not writing any WAL, even if it has
+	 * a xid, if it only wrote to temporary and/or unlogged tables.  It can
+	 * end up having written WAL without and xid if it did HOT pruning.  In
+	 * case of a crash, the loss of such a transaction will be irrelevant;
+	 * temp tables will be lost anyway, unlogged tables will be truncated and
+	 * HOT pruning will be done again later. (Given the foregoing, you might
 	 * think that it would be unnecessary to emit the XLOG record at all in
 	 * this case, but we don't currently try to do that.  It would certainly
 	 * cause problems at least in Hot Standby mode, where the
@@ -1173,7 +1174,8 @@ RecordTransactionCommit(void)
 	 * if all to-be-deleted tables are temporary though, since they are lost
 	 * anyway if we crash.)
 	 */
-	if ((wrote_xlog && synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
+	if ((wrote_xlog && markXidCommitted &&
+		 synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
 		forceSyncCommit || nrels > 0)
 	{
 		XLogFlush(XactLastRecEnd);
@@ -1222,12 +1224,16 @@ RecordTransactionCommit(void)
 	latestXid = TransactionIdLatest(xid, nchildren, children);
 
 	/*
-	 * Wait for synchronous replication, if required.
+	 * Wait for synchronous replication, if required. With similar reasons to
+	 * the decision above about using using a async commit we only want to
+	 * wait if this backend assigned a xid and wrote WAL.  No need to wait if
+	 * a xid was assigned due to temporary/unlogged tables or due to HOT
+	 * pruning.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	if (wrote_xlog)
+	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd);
 
 	/* Reset XactLastRecEnd until the next transaction writes something */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 622ccf7..0070c4f 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -17,6 +17,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
@@ -358,6 +359,10 @@ fill_seq_with_data(Relation rel, HeapTuple tuple)
 	tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
 	ItemPointerSet(&tuple->t_data->t_ctid, 0, FirstOffsetNumber);
 
+	/* check the comment above nextval_internal()'s equivalent call. */
+	if (RelationNeedsWAL(rel))
+		GetTopTransactionId();
+
 	START_CRIT_SECTION();
 
 	MarkBufferDirty(buf);
@@ -438,6 +443,10 @@ AlterSequence(AlterSeqStmt *stmt)
 	/* Note that we do not change the currval() state */
 	elm->cached = elm->last;
 
+	/* check the comment above nextval_internal()'s equivalent call. */
+	if (RelationNeedsWAL(seqrel))
+		GetTopTransactionId();
+
 	/* Now okay to update the on-disk tuple */
 	START_CRIT_SECTION();
 
@@ -679,6 +688,16 @@ nextval_internal(Oid relid)
 
 	last_used_seq = elm;
 
+	/*
+	 * If something needs to be WAL logged, acquire an xid, so this
+	 * transaction's commit will trigger a WAL flush and wait for
+	 * syncrep. It's sufficient to ensure the toplevel transaction has a xid,
+	 * no need to assign xids subxacts, that'll already trigger a appropriate
+	 * wait.  (Have to do that here, so we're outside the critical section)
+	 */
+	if (logit && RelationNeedsWAL(seqrel))
+		GetTopTransactionId();
+
 	/* ready to change the on-disk (or really, in-buffer) tuple */
 	START_CRIT_SECTION();
 
@@ -867,6 +886,10 @@ do_setval(Oid relid, int64 next, bool iscalled)
 	/* In any case, forget any future cached numbers */
 	elm->cached = elm->last;
 
+	/* check the comment above nextval_internal()'s equivalent call. */
+	if (RelationNeedsWAL(seqrel))
+		GetTopTransactionId();
+
 	/* ready to change the on-disk (or really, in-buffer) tuple */
 	START_CRIT_SECTION();
 
-- 
2.3.0.149.gf3f4077.dirty

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to