Re: [HACKERS] Walsender timeouts and large transactions

Petr Jelinek Wed, 01 Nov 2017 07:50:10 -0700

Hi,

sorry for the delay but I didn't have much time in past weeks to follow
this thread.

On 02/10/17 05:44, Kyotaro HORIGUCHI wrote:
> Hello Sokolov.
> 
> At Fri, 29 Sep 2017 15:19:23 +0300, Sokolov Yura 
> <[email protected]> wrote in 
> <[email protected]>
>> I don't want to make test to lasts so long and generate so many data.
>> That is why I used such small timeouts for tests.
> 
> I understand your point, but still *I* think such a short timeout
> is out of expectation by design. (But it can be set.)
> 
> Does anyone have opinions on this?

Yes, it's not practically useful to have such small wal_sender_timeout
given that the main purpose of that is to detect network issues.

> 
>> Test is failing if there is "short quit" after
>> `!pq_is_send_pending()`,
>> so I doubt your patch will pass the test.
> 
> It is because I think that the test "should" fail since the
> timeout is out of expected range. I (and perhaps also Petr) is
> thinking that the problem is just that a large transaction causes
> a timeout with an ordinary timeout. My test case is based on the
> assumption.
> 
> Your test is for a timeout during replication-startup with
> extremely short timeout. This may be a different problem to
> discuss, but perhaps better to be solved together.
> 
> I'd like to have opnions from others on this point.

I think it's different problem and because of what I wrote above it does
not seem to me like something we should spend out time on trying to fix.
> 
> Uggh! I misunderstood there. It wais for writing socket so the
> sleep is wrong and WaitLatchOrSocket is right.
> 
> After all, I put +1 for Petr's latest patch. Sorry for my
> carelessness.
> 

Great, I attached version 3 which is just rebased on current master and
also does not move the GetCurrentTimestamp() call because the concern
about skewing the timestamp during config reload (and also network flush
as I realized later) is valid.

It's rather hard to test all the scenarios that this patch fixes in
automated fashion without generating a lot of wal or adding sleeps to
the processing. That's why I didn't produce usable TAP test.

Since it seems like some of my reasoning is unclear I will try to
describe it once more.

The main problem we have is that unless we call the
ProcessRepliesIfAny() before the wal_sender_timeout expires we'll die
because of timeout eventually. The current coding will skip that call if
there is a long transaction being processed (if network is fast enough).
This is what the first part (first 2 hunks) of the patch solves. There
is also issue that while this is happening the walsender ignores signals
so it's impossible to stop it which is why I added the
CHECK_FOR_INTERRUPTS to the fast path.

The second problem is that if we solved just the first one, then if
downstream (and again network) is fast enough, the ProcessRepliesIfAny()
will not do anything useful because downstream is not going to send any
response while the network buffer contains any data. This is caused by
the fact that we normally code the receiver side to receive until there
is something and only send reply when there is a "pause" in the stream.
To get around this problem we also need to make sure that we send
WalSndKeepaliveIfNecessary() periodically and that will not happen on
fast network unless we do the second part of the patch (3rd hunk), ie,
move the pq_is_send_pending() after the keepalive handling.

This code is specific to logical decoding walsender interface so it only
happens for logical decoding/replication (which means it should be
backported all the way to 9.4). The physical one

These two issues happen quite normally in the wild as all we need is big
data load in single transaction, or update of large part of an table or
something similar for this to happen with default settings.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

From df2d5b09a74cb31537e2bb74895a8e31febce5f8 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <[email protected]>
Date: Tue, 31 Oct 2017 14:00:37 +0100
Subject: [PATCH] Fix walsender timeouts when decoding large transaction

The logical slots have fast code path for sending data in order to not
impose too high per message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that transaction with large number of changes may take very long
to be processed and sent to the client. This causes walsender to ignore
interrupts for potentially long time and more importantly it will cause
walsender being killed due to timeout at the end of such transaction.

This commit changes the fast path to also check for interrupts and only
allows calling the fast path when last keeplaive check happened less
than half of walsender timeout ago, otherwise the slower code path will
be taken.
---
 src/backend/replication/walsender.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index fa1db748b5..79c5153ac7 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1151,6 +1151,8 @@ static void
 WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 				bool last_write)
 {
+	TimestampTz	now = GetCurrentTimestamp();
+
 	/* output previously gathered data in a CopyData packet */
 	pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
 
@@ -1160,23 +1162,28 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 	 * several releases by streaming physical replication.
 	 */
 	resetStringInfo(&tmpbuf);
-	pq_sendint64(&tmpbuf, GetCurrentTimestamp());
+	pq_sendint64(&tmpbuf, now);
 	memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
 		   tmpbuf.data, sizeof(int64));
 
-	/* fast path */
-	/* Try to flush pending output to the client */
-	if (pq_flush_if_writable() != 0)
-		WalSndShutdown();
+	/* Try taking fast path unless we get too close to walsender timeout. */
+	if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+										  wal_sender_timeout / 2))
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	if (!pq_is_send_pending())
-		return;
+		/* Try to flush pending output to the client */
+		if (pq_flush_if_writable() != 0)
+			WalSndShutdown();
+
+		if (!pq_is_send_pending())
+			return;
+	}
 
 	for (;;)
 	{
 		int			wakeEvents;
 		long		sleeptime;
-		TimestampTz now;
 
 		/*
 		 * Emergency bailout if postmaster has died.  This is to avoid the
@@ -1205,10 +1212,6 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 		if (pq_flush_if_writable() != 0)
 			WalSndShutdown();
 
-		/* If we finished clearing the buffered data, we're done here. */
-		if (!pq_is_send_pending())
-			break;
-
 		now = GetCurrentTimestamp();
 
 		/* die if timeout was reached */
@@ -1217,6 +1220,10 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 		/* Send keepalive if the time has come */
 		WalSndKeepaliveIfNecessary(now);
 
+		/* If we finished clearing the buffered data, we're done here. */
+		if (!pq_is_send_pending())
+			break;
+
 		sleeptime = WalSndComputeSleeptime(now);
 
 		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
-- 
2.11.0

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Walsender timeouts and large transactions

Reply via email to