On Tuesday, June 26, 2012 04:06:08 PM Andres Freund wrote:
> On Tuesday, June 26, 2012 04:01:26 PM Robert Haas wrote:
> > On Fri, Jun 22, 2012 at 12:35 PM, Andres Freund <and...@2ndquadrant.com>
> 
> wrote:
> > >> Can you elaborate on that a bit?  What scenarios did you play around
> > >> with, and what does "win" mean in this context?
> > > 
> > > I had two machines connected locally and setup HS and my prototype
> > > between them (not at once obviously).
> > > The patch reduced all the average latency between both nodes (measured
> > > by 'ticker' rows arriving in a table on the standby), the jitter in
> > > latency and the amount of load I had to put on the master before the
> > > standby couldn't keep up anymore.
> > > 
> > > I played with different loads:
> > > * multple concurrent ~50MB COPY's
> > > * multple concurrent ~50MB COPY's, pgbench
> > > * pgbench
> > > 
> > > All three had a ticker running concurrently with synchronous_commit=off
> > > (because it shouldn't cause any difference in the replication pattern
> > > itself).
> > > 
> > > The difference in averagelag and cutoff were smallest with just pgbench
> > > running alone and biggest with COPY running alone. Highjitter was most
> > > visible with just pgbench running alone but thats likely just because
> > > the average lag was smaller.
> > 
> > OK, that sounds pretty promising.  I'd like to run a few performance
> > tests on this just to convince myself that it doesn't lead to a
> > significant regression in other scenarios.  Assuming that doesn't turn
> > up anything major, I'll go ahead and commit this.
> 
> Independent testing would be great, its definitely possible that I oversaw
> something although I obviously don't think so ;).
> 
> > Can you provide a rebased version?  It seems that one of the hunks in
> > xlog.c no longer applies.
> 
> Will do so. Not sure if I can finish it today though, I am in the midst of
> redoing the ilist and xlogreader patches. I guess tomorrow will suffice
> otherwise...
Ok, attached are two patches:
The first is the rebased version of the original patch with 
WalSndWakeupProcess renamed to WalSndWakeupProcessRequests (seems clearer).

The second changes WalSndWakeupRequest and WalSndWakeupProcessRequests into 
macros as you requested before. I am not sure if its a good idea or not.

Anything else?

Greetings,

Andres
-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 164276afc60fa2451f28de996fcc54567e6168e2 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Tue, 29 May 2012 20:00:16 +0200
Subject: [PATCH 1/2] Overhaul walsender wakeup handling

The previous coding could miss xlog writeouts at several places. E.g. when wal
was written out by the background writer or even after a commit if
synchronous_commit=off.
This could lead to delays in sending data to the standby of up to 7 seconds.

To fix this move the responsibility of notification to the layer where the
neccessary information is actually present. We take some care not to do the
notification while we hold conteded locks like WALInsertLock or WalWriteLock
locks.

Document the preexisting fact that we rely on SetLatch to be safe from within
signal handlers and critical sections.

This removes the temporary bandaid from 2c8a4e9be2730342cbca85150a2a9d876aa77ff6
---
 src/backend/access/transam/twophase.c |   21 -----------------
 src/backend/access/transam/xact.c     |    7 ------
 src/backend/access/transam/xlog.c     |   25 ++++++++++++++------
 src/backend/port/unix_latch.c         |    3 +++
 src/backend/port/win32_latch.c        |    4 ++++
 src/backend/replication/walsender.c   |   41 ++++++++++++++++++++++++++++++++-
 src/include/replication/walsender.h   |    2 ++
 7 files changed, 67 insertions(+), 36 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e8fb78b..7f198c2 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1042,13 +1042,6 @@ EndPrepare(GlobalTransaction gxact)
 
 	/* If we crash now, we have prepared: WAL replay will fix things */
 
-	/*
-	 * Wake up all walsenders to send WAL up to the PREPARE record immediately
-	 * if replication is enabled
-	 */
-	if (max_wal_senders > 0)
-		WalSndWakeup();
-
 	/* write correct CRC and close file */
 	if ((write(fd, &statefile_crc, sizeof(pg_crc32))) != sizeof(pg_crc32))
 	{
@@ -2045,13 +2038,6 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	/* Flush XLOG to disk */
 	XLogFlush(recptr);
 
-	/*
-	 * Wake up all walsenders to send WAL up to the COMMIT PREPARED record
-	 * immediately if replication is enabled
-	 */
-	if (max_wal_senders > 0)
-		WalSndWakeup();
-
 	/* Mark the transaction committed in pg_clog */
 	TransactionIdCommitTree(xid, nchildren, children);
 
@@ -2133,13 +2119,6 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	XLogFlush(recptr);
 
 	/*
-	 * Wake up all walsenders to send WAL up to the ABORT PREPARED record
-	 * immediately if replication is enabled
-	 */
-	if (max_wal_senders > 0)
-		WalSndWakeup();
-
-	/*
 	 * Mark the transaction aborted in clog.  This is not absolutely necessary
 	 * but we may as well do it while we are here.
 	 */
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 4755ee6..86b1afa 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1142,13 +1142,6 @@ RecordTransactionCommit(void)
 		XLogFlush(XactLastRecEnd);
 
 		/*
-		 * Wake up all walsenders to send WAL up to the COMMIT record
-		 * immediately if replication is enabled
-		 */
-		if (max_wal_senders > 0)
-			WalSndWakeup();
-
-		/*
 		 * Now we may update the CLOG, if we wrote a COMMIT record above
 		 */
 		if (markXidCommitted)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 642c129..4b7daee 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1025,6 +1025,8 @@ begin:;
 
 		END_CRIT_SECTION();
 
+		/* wakeup the WalSnd now that we released the WALWriteLock */
+		WalSndWakeupProcessRequests();
 		return RecPtr;
 	}
 
@@ -1208,6 +1210,9 @@ begin:;
 
 	END_CRIT_SECTION();
 
+	/* wakeup the WalSnd now that we outside contented locks */
+	WalSndWakeupProcessRequests();
+
 	return RecPtr;
 }
 
@@ -1792,6 +1797,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
 			if (finishing_seg || (xlog_switch && last_iteration))
 			{
 				issue_xlog_fsync(openLogFile, openLogSegNo);
+
+				/* signal that we need to wakeup WalSnd later */
+				WalSndWakeupRequest();
+
 				LogwrtResult.Flush = LogwrtResult.Write;		/* end of page */
 
 				if (XLogArchivingActive())
@@ -1854,7 +1863,11 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
 				openLogFile = XLogFileOpen(openLogSegNo);
 				openLogOff = 0;
 			}
+
 			issue_xlog_fsync(openLogFile, openLogSegNo);
+
+			/* signal that we need to wakeup WalSnd later */
+			WalSndWakeupRequest();
 		}
 		LogwrtResult.Flush = LogwrtResult.Write;
 	}
@@ -2120,6 +2133,9 @@ XLogFlush(XLogRecPtr record)
 
 	END_CRIT_SECTION();
 
+	/* wakeup the WalSnd now that we released the WALWriteLock */
+	WalSndWakeupProcessRequests();
+
 	/*
 	 * If we still haven't flushed to the request point then we have a
 	 * problem; most likely, the requested flush point is past end of XLOG.
@@ -2245,13 +2261,8 @@ XLogBackgroundFlush(void)
 
 	END_CRIT_SECTION();
 
-	/*
-	 * If we wrote something then we have something to send to standbys also,
-	 * otherwise the replication delay become around 7s with just async
-	 * commit.
-	 */
-	if (wrote_something)
-		WalSndWakeup();
+	/* wakeup the WalSnd now that we released the WALWriteLock */
+	WalSndWakeupProcessRequests();
 
 	return wrote_something;
 }
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 65b2fc5..335e9f6 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -418,6 +418,9 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
  * NB: when calling this in a signal handler, be sure to save and restore
  * errno around it.  (That's standard practice in most signal handlers, of
  * course, but we used to omit it in handlers that only set a flag.)
+ *
+ * NB: this function is called from critical sections and signal handlers so
+ * throwing an error is not a good idea.
  */
 void
 SetLatch(volatile Latch *latch)
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index eb46dca..1f1ed33 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -247,6 +247,10 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 	return result;
 }
 
+/*
+ * The comments above the unix implementation (unix_latch.c) of this function
+ * apply here as well.
+ */
 void
 SetLatch(volatile Latch *latch)
 {
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 616d4e7..fc3ffe1 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -106,6 +106,11 @@ static StringInfoData reply_message;
  */
 static TimestampTz last_reply_timestamp;
 
+/*
+ * State for WalSndWakeupRequest
+ */
+static bool wroteNewXlogData = false;
+
 /* Flags set by signal handlers for later service in main loop */
 static volatile sig_atomic_t got_SIGHUP = false;
 volatile sig_atomic_t walsender_shutdown_requested = false;
@@ -1395,7 +1400,12 @@ WalSndShmemInit(void)
 	}
 }
 
-/* Wake up all walsenders */
+/*
+ * Wake up all walsenders
+ *
+ * This will be called inside critical sections, so throwing an error is not
+ * adviseable.
+ */
 void
 WalSndWakeup(void)
 {
@@ -1405,6 +1415,35 @@ WalSndWakeup(void)
 		SetLatch(&WalSndCtl->walsnds[i].latch);
 }
 
+/*
+ * Remember that we want to wakeup walsenders later
+ *
+ * This is separated from doing the actual wakeup because the writeout is done
+ * while holding contended locks.
+ */
+void
+WalSndWakeupRequest(void)
+{
+	wroteNewXlogData = true;
+}
+
+/*
+ * wakeup walsenders if there is work to be done
+ */
+void
+WalSndWakeupProcessRequests(void)
+{
+	if(wroteNewXlogData){
+		wroteNewXlogData = false;
+		/*
+		 * Wake up all walsenders to send WAL up to the point where its flushed
+		 * safely to disk.
+		 */
+		if (max_wal_senders > 0)
+			WalSndWakeup();
+	}
+}
+
 /* Set state for current walsender (only called in walsender) */
 void
 WalSndSetState(WalSndState state)
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 6553601..483e77e 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -31,6 +31,8 @@ extern void WalSndSignals(void);
 extern Size WalSndShmemSize(void);
 extern void WalSndShmemInit(void);
 extern void WalSndWakeup(void);
+extern void WalSndWakeupRequest(void);
+extern void WalSndWakeupProcessRequests(void);
 extern void WalSndRqstFileReload(void);
 
 extern Datum pg_stat_get_wal_senders(PG_FUNCTION_ARGS);
-- 
1.7.10.rc3.3.g19a6c.dirty

From aafd463ae415d3ce06526d72dbd85e8a571c634a Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Wed, 27 Jun 2012 12:23:01 +0200
Subject: [PATCH 2/2] Change WalSndWakeupProcessRequests and
 WalSndWakeupRequest into macros

---
 src/backend/replication/walsender.c |   38 ++++-------------------------------
 src/include/replication/walsender.h |   27 +++++++++++++++++++++++--
 2 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index fc3ffe1..a3ec1bc 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -81,6 +81,10 @@ bool		am_cascading_walsender = false;		/* Am I cascading WAL to
 int			max_wal_senders = 0;	/* the maximum number of concurrent walsenders */
 int			replication_timeout = 60 * 1000;	/* maximum time to send one
 												 * WAL data message */
+/*
+ * State for WalSndWakeupRequest
+ */
+bool wroteNewXlogData = false;
 
 /*
  * These variables are used similarly to openLogFile/Id/Seg/Off,
@@ -106,11 +110,6 @@ static StringInfoData reply_message;
  */
 static TimestampTz last_reply_timestamp;
 
-/*
- * State for WalSndWakeupRequest
- */
-static bool wroteNewXlogData = false;
-
 /* Flags set by signal handlers for later service in main loop */
 static volatile sig_atomic_t got_SIGHUP = false;
 volatile sig_atomic_t walsender_shutdown_requested = false;
@@ -1415,35 +1414,6 @@ WalSndWakeup(void)
 		SetLatch(&WalSndCtl->walsnds[i].latch);
 }
 
-/*
- * Remember that we want to wakeup walsenders later
- *
- * This is separated from doing the actual wakeup because the writeout is done
- * while holding contended locks.
- */
-void
-WalSndWakeupRequest(void)
-{
-	wroteNewXlogData = true;
-}
-
-/*
- * wakeup walsenders if there is work to be done
- */
-void
-WalSndWakeupProcessRequests(void)
-{
-	if(wroteNewXlogData){
-		wroteNewXlogData = false;
-		/*
-		 * Wake up all walsenders to send WAL up to the point where its flushed
-		 * safely to disk.
-		 */
-		if (max_wal_senders > 0)
-			WalSndWakeup();
-	}
-}
-
 /* Set state for current walsender (only called in walsender) */
 void
 WalSndSetState(WalSndState state)
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 483e77e..07ffa40 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -21,6 +21,7 @@ extern bool am_walsender;
 extern bool am_cascading_walsender;
 extern volatile sig_atomic_t walsender_shutdown_requested;
 extern volatile sig_atomic_t walsender_ready_to_stop;
+extern bool wroteNewXlogData;
 
 /* user-settable parameters */
 extern int	max_wal_senders;
@@ -31,10 +32,32 @@ extern void WalSndSignals(void);
 extern Size WalSndShmemSize(void);
 extern void WalSndShmemInit(void);
 extern void WalSndWakeup(void);
-extern void WalSndWakeupRequest(void);
-extern void WalSndWakeupProcessRequests(void);
 extern void WalSndRqstFileReload(void);
 
 extern Datum pg_stat_get_wal_senders(PG_FUNCTION_ARGS);
 
+/*
+ * Remember that we want to wakeup walsenders later
+ *
+ * This is separated from doing the actual wakeup because the writeout is done
+ * while holding contended locks.
+ */
+#define WalSndWakeupRequest() do { wroteNewXlogData = true; } while(0)
+
+/*
+ * wakeup walsenders if there is work to be done
+ */
+#define WalSndWakeupProcessRequests()										\
+do {																		\
+	if(wroteNewXlogData){													\
+		wroteNewXlogData = false;											\
+		/*																	\
+		 * Wake up all walsenders to send WAL up to the point where its		\
+		 * flushed safely to disk.											\
+		 */																	\
+		if (max_wal_senders > 0)											\
+			WalSndWakeup();													\
+	}																		\
+} while (0)
+
 #endif   /* _WALSENDER_H */
-- 
1.7.10.rc3.3.g19a6c.dirty

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to