Re: [HACKERS] Walsender timeouts and large transactions

2017-11-03 Thread Petr Jelinek
Hi,

thanks for checking.

On 02/11/17 10:00, Robert Haas wrote:
> On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek
>  wrote:
>> sorry for the delay but I didn't have much time in past weeks to follow
>> this thread.
> 
> +TimestampTz now = GetCurrentTimestamp();
> +
>  /* output previously gathered data in a CopyData packet */
>  pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
> 
>  /*
>   * Fill the send timestamp last, so that it is taken as late as possible.
>   * This is somewhat ugly, but the protocol is set as it's already used 
> for
>   * several releases by streaming physical replication.
>   */
>  resetStringInfo();
> -pq_sendint64(, GetCurrentTimestamp());
> +pq_sendint64(, now);
>  memcpy(>out->data[1 + sizeof(int64) + sizeof(int64)],
> tmpbuf.data, sizeof(int64));
> 
> This change falsifies the comments.  Maybe initialize now just after
> resetSetringInfo() is done.

Eh, right, I can do that.

> 
> -/* fast path */
> -/* Try to flush pending output to the client */
> -if (pq_flush_if_writable() != 0)
> -WalSndShutdown();
> +/* Try taking fast path unless we get too close to walsender timeout. */
> +if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> +  wal_sender_timeout / 2))
> +{
> +CHECK_FOR_INTERRUPTS();
> 
> -if (!pq_is_send_pending())
> -return;
> +/* Try to flush pending output to the client */
> +if (pq_flush_if_writable() != 0)
> +WalSndShutdown();
> +
> +if (!pq_is_send_pending())
> +return;
> +}
> 
> I think it's only the if (!pq_is_send_pending()) return; that needs to
> be conditional here, isn't it?  The pq_flush_if_writable() can be done
> unconditionally.
> 

Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes.
It just seems like it's needless call as we'll call both in for loop
anyway if we take the "slow" path. I admit it's not exactly big win
though. If you think it would improve readability I can move it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-11-02 Thread Robert Haas
On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek
 wrote:
> sorry for the delay but I didn't have much time in past weeks to follow
> this thread.

+TimestampTz now = GetCurrentTimestamp();
+
 /* output previously gathered data in a CopyData packet */
 pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);

 /*
  * Fill the send timestamp last, so that it is taken as late as possible.
  * This is somewhat ugly, but the protocol is set as it's already used for
  * several releases by streaming physical replication.
  */
 resetStringInfo();
-pq_sendint64(, GetCurrentTimestamp());
+pq_sendint64(, now);
 memcpy(>out->data[1 + sizeof(int64) + sizeof(int64)],
tmpbuf.data, sizeof(int64));

This change falsifies the comments.  Maybe initialize now just after
resetSetringInfo() is done.

-/* fast path */
-/* Try to flush pending output to the client */
-if (pq_flush_if_writable() != 0)
-WalSndShutdown();
+/* Try taking fast path unless we get too close to walsender timeout. */
+if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+  wal_sender_timeout / 2))
+{
+CHECK_FOR_INTERRUPTS();

-if (!pq_is_send_pending())
-return;
+/* Try to flush pending output to the client */
+if (pq_flush_if_writable() != 0)
+WalSndShutdown();
+
+if (!pq_is_send_pending())
+return;
+}

I think it's only the if (!pq_is_send_pending()) return; that needs to
be conditional here, isn't it?  The pq_flush_if_writable() can be done
unconditionally.

Other than that this looks like a reasonable change to me, but I'm not
an expert on this code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-11-01 Thread Petr Jelinek
Hi,

sorry for the delay but I didn't have much time in past weeks to follow
this thread.

On 02/10/17 05:44, Kyotaro HORIGUCHI wrote:
> Hello Sokolov.
> 
> At Fri, 29 Sep 2017 15:19:23 +0300, Sokolov Yura 
>  wrote in 
> 
>> I don't want to make test to lasts so long and generate so many data.
>> That is why I used such small timeouts for tests.
> 
> I understand your point, but still *I* think such a short timeout
> is out of expectation by design. (But it can be set.)
> 
> Does anyone have opinions on this?

Yes, it's not practically useful to have such small wal_sender_timeout
given that the main purpose of that is to detect network issues.

> 
>> Test is failing if there is "short quit" after
>> `!pq_is_send_pending()`,
>> so I doubt your patch will pass the test.
> 
> It is because I think that the test "should" fail since the
> timeout is out of expected range. I (and perhaps also Petr) is
> thinking that the problem is just that a large transaction causes
> a timeout with an ordinary timeout. My test case is based on the
> assumption.
> 
> Your test is for a timeout during replication-startup with
> extremely short timeout. This may be a different problem to
> discuss, but perhaps better to be solved together.
> 
> I'd like to have opnions from others on this point.

I think it's different problem and because of what I wrote above it does
not seem to me like something we should spend out time on trying to fix.
> 
> Uggh! I misunderstood there. It wais for writing socket so the
> sleep is wrong and WaitLatchOrSocket is right.
> 
> After all, I put +1 for Petr's latest patch. Sorry for my
> carelessness.
> 

Great, I attached version 3 which is just rebased on current master and
also does not move the GetCurrentTimestamp() call because the concern
about skewing the timestamp during config reload (and also network flush
as I realized later) is valid.

It's rather hard to test all the scenarios that this patch fixes in
automated fashion without generating a lot of wal or adding sleeps to
the processing. That's why I didn't produce usable TAP test.

Since it seems like some of my reasoning is unclear I will try to
describe it once more.

The main problem we have is that unless we call the
ProcessRepliesIfAny() before the wal_sender_timeout expires we'll die
because of timeout eventually. The current coding will skip that call if
there is a long transaction being processed (if network is fast enough).
This is what the first part (first 2 hunks) of the patch solves. There
is also issue that while this is happening the walsender ignores signals
so it's impossible to stop it which is why I added the
CHECK_FOR_INTERRUPTS to the fast path.

The second problem is that if we solved just the first one, then if
downstream (and again network) is fast enough, the ProcessRepliesIfAny()
will not do anything useful because downstream is not going to send any
response while the network buffer contains any data. This is caused by
the fact that we normally code the receiver side to receive until there
is something and only send reply when there is a "pause" in the stream.
To get around this problem we also need to make sure that we send
WalSndKeepaliveIfNecessary() periodically and that will not happen on
fast network unless we do the second part of the patch (3rd hunk), ie,
move the pq_is_send_pending() after the keepalive handling.

This code is specific to logical decoding walsender interface so it only
happens for logical decoding/replication (which means it should be
backported all the way to 9.4). The physical one

These two issues happen quite normally in the wild as all we need is big
data load in single transaction, or update of large part of an table or
something similar for this to happen with default settings.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From df2d5b09a74cb31537e2bb74895a8e31febce5f8 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Tue, 31 Oct 2017 14:00:37 +0100
Subject: [PATCH] Fix walsender timeouts when decoding large transaction

The logical slots have fast code path for sending data in order to not
impose too high per message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that transaction with large number of changes may take very long
to be processed and sent to the client. This causes walsender to ignore
interrupts for potentially long time and more importantly it will cause
walsender being killed due to timeout at the end of such transaction.

This commit changes the fast path to also check for interrupts and only
allows calling the fast path when last keeplaive check happened less
than half of walsender timeout ago, otherwise the slower code path will
be taken.
---
 src/backend/replication/walsender.c | 31 

Re: [HACKERS] Walsender timeouts and large transactions

2017-10-01 Thread Kyotaro HORIGUCHI
Hello Sokolov.

At Fri, 29 Sep 2017 15:19:23 +0300, Sokolov Yura  
wrote in 
> I don't want to make test to lasts so long and generate so many data.
> That is why I used such small timeouts for tests.

I understand your point, but still *I* think such a short timeout
is out of expectation by design. (But it can be set.)

Does anyone have opinions on this?

> Test is failing if there is "short quit" after
> `!pq_is_send_pending()`,
> so I doubt your patch will pass the test.

It is because I think that the test "should" fail since the
timeout is out of expected range. I (and perhaps also Petr) is
thinking that the problem is just that a large transaction causes
a timeout with an ordinary timeout. My test case is based on the
assumption.

Your test is for a timeout during replication-startup with
extremely short timeout. This may be a different problem to
discuss, but perhaps better to be solved together.

I'd like to have opnions from others on this point.

> And you've change calculated sleep time with sane waiting on all
> insteresting events (using WaitLatchOrSocket) to semi-busy loop.
> It at least could affect throughput.

Uggh! I misunderstood there. It wais for writing socket so the
sleep is wrong and WaitLatchOrSocket is right.

After all, I put +1 for Petr's latest patch. Sorry for my
carelessness.

> And why did you remove `SetLatch(MyLatch)` in the end of function?
> Probably this change is correct, but not obvious.

Latch is needless there if it waited a fixed duration, but if it
waits writefd events there, also latch should be waited.


> > Any thoughts?
> 
> It certainly could be my test and my patch is wrong. But my point
> is that test should be written first. Especially for such difficult
> case. Without test it is impossible to say does our patches fix
> something. And it is impossible to say if patch does something
> wrong. And impossible to say if patch fixes this problem but
> introduce new problem.
> 
> Please, write test for your remarks. If you think, my patch breaks
> something, write test for the case my patch did broke. If you think
> my test is wrong, write your test that is more correct.
> 
> Without tests it will be just bird's hubbub.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-09-29 Thread Sokolov Yura

Good day, Kyoutaro

On 2017-09-29 11:26, Kyotaro HORIGUCHI wrote:

Hello,

At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura
 wrote in
<90bb67da7131e6186b50897c4b0f0...@postgrespro.ru>

On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote:
> Hello,
> At Wed, 06 Sep 2017 13:46:16 +, Yura Sokolov
>  wrote in
> <20170906134616.18925.88390.p...@coridan.postgresql.org>
> As the result, I think that the functions should be modified as
> the following.
> - Forcing slow-path if time elapses a half of a ping period is
>   right. (GetCurrentTimestamp is anyway requried.)
> - The slow-path should not sleep waiting Latch. It should just
>   pg_usleep() for maybe 1-10ms.
> - We should go to the fast path just after keepalive or response
>   message has been sent. In other words, the "if (now <" block
>   should be in the "for (;;)" loop. This avoids needless runs on
>   the slow-path.
> It would be refactorable as the following.
>   prepare for the send buffer;
>   for (;;)
>   {
> now = GetCurrentTimeStamp();
> if (now < )...
> {
>   fast-path
> }
> else
> {
>   slow-path
> }
> return if finished
> sleep for 1ms?
>   }
> What do you think about this?
> regards,
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

Good day, Petr, Kyotaro

I've created failing test for issue (0001-Add-failing-test...) .
It tests insertion of 2 rows with 10ms wal_sender_timeout
(it fails in WalSndWriteData on master) and then deletion of
those rows with 1ms wal_sender_timeout (it fails in WalSndLoop).

Both Peter's patch and my simplified suggestion didn't pass the
test. I didn't checked Kyotaro's suggestion, though, cause I
didn't understand it well.


Mmm. The test seems broken. wal_sender_timeout = 10ms with
wal_receiver_status_interval=10s immediately causes a
timeout. Avoiding the timeout is just breaking the sane code.


I think you're not right. `wal_receiver_status_interval` is just
for status update, not for replies. Before I made my patch version,
I've added logging to every `now` and `last_reply_timestamp` during
test run. `last_reply_timestamp` definitely updates more often than
once in 10s. So, `wal_receiver_status_interval = 10s` has nothing
in common with receiver's replies, as I see.

(btw, logging slows down sender enough to "fix" test :-) )

And my patch doesn't avoid timeout check, so it doesn't break
anything. It just delays timeout on 1ms. Given, it is unpractical
to set wal_sender_timeout less than 50ms in production, this 1ms
increase in timeout check is negligible.

And I've checked just now that my patch passes test even with
wal_receiver_status_interval = 10s.



wal_sender_timeout = 3s and wal_receiver_status_interval=1s
effectively causes the problem with about 100 lines of (int)
insertion on UNIX socket connection, on my poor box.


I don't want to make test to lasts so long and generate so many data.
That is why I used such small timeouts for tests.


The original complain here came from the fact that
WalSndWriteData skips processing of replies for a long time on a
fast network. However Petr's patch fixed the problem, I pointed
that just letting the function take the slow path leads to
another problem, that is, waiting for new WAL records can result
in a unwanted pause in the slow path.

Combining the solutions for the two problem is my proposal sited
above. The sentence seems in quite bad style but the attached
file is the concorete patch of that.


Test is failing if there is "short quit" after `!pq_is_send_pending()`,
so I doubt your patch will pass the test.
And you've change calculated sleep time with sane waiting on all
insteresting events (using WaitLatchOrSocket) to semi-busy loop.
It at least could affect throughput.

And why did you remove `SetLatch(MyLatch)` in the end of function?
Probably this change is correct, but not obvious.


Any thoughts?


It certainly could be my test and my patch is wrong. But my point
is that test should be written first. Especially for such difficult
case. Without test it is impossible to say does our patches fix
something. And it is impossible to say if patch does something
wrong. And impossible to say if patch fixes this problem but
introduce new problem.

Please, write test for your remarks. If you think, my patch breaks
something, write test for the case my patch did broke. If you think
my test is wrong, write your test that is more correct.

Without tests it will be just bird's hubbub.


regards,


regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-09-29 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura  
wrote in <90bb67da7131e6186b50897c4b0f0...@postgrespro.ru>
> On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote:
> > Hello,
> > At Wed, 06 Sep 2017 13:46:16 +, Yura Sokolov
> >  wrote in
> > <20170906134616.18925.88390.p...@coridan.postgresql.org>
> > As the result, I think that the functions should be modified as
> > the following.
> > - Forcing slow-path if time elapses a half of a ping period is
> >   right. (GetCurrentTimestamp is anyway requried.)
> > - The slow-path should not sleep waiting Latch. It should just
> >   pg_usleep() for maybe 1-10ms.
> > - We should go to the fast path just after keepalive or response
> >   message has been sent. In other words, the "if (now <" block
> >   should be in the "for (;;)" loop. This avoids needless runs on
> >   the slow-path.
> > It would be refactorable as the following.
> >   prepare for the send buffer;
> >   for (;;)
> >   {
> > now = GetCurrentTimeStamp();
> > if (now < )...
> > {
> >   fast-path
> > }
> > else
> > {
> >   slow-path
> > }
> > return if finished
> > sleep for 1ms?
> >   }
> > What do you think about this?
> > regards,
> > --
> > Kyotaro Horiguchi
> > NTT Open Source Software Center
> 
> Good day, Petr, Kyotaro
> 
> I've created failing test for issue (0001-Add-failing-test...) .
> It tests insertion of 2 rows with 10ms wal_sender_timeout
> (it fails in WalSndWriteData on master) and then deletion of
> those rows with 1ms wal_sender_timeout (it fails in WalSndLoop).
> 
> Both Peter's patch and my simplified suggestion didn't pass the
> test. I didn't checked Kyotaro's suggestion, though, cause I
> didn't understand it well.

Mmm. The test seems broken. wal_sender_timeout = 10ms with
wal_receiver_status_interval=10s immediately causes a
timeout. Avoiding the timeout is just breaking the sane code.

wal_sender_timeout = 3s and wal_receiver_status_interval=1s
effectively causes the problem with about 100 lines of (int)
insertion on UNIX socket connection, on my poor box.

The original complain here came from the fact that
WalSndWriteData skips processing of replies for a long time on a
fast network. However Petr's patch fixed the problem, I pointed
that just letting the function take the slow path leads to
another problem, that is, waiting for new WAL records can result
in a unwanted pause in the slow path.

Combining the solutions for the two problem is my proposal sited
above. The sentence seems in quite bad style but the attached
file is the concorete patch of that.

Any thoughts?

regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***
*** 1151,1156  static void
--- 1151,1158 
  WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
  bool last_write)
  {
+ 	TimestampTz now = GetCurrentTimestamp();
+ 
  	/* output previously gathered data in a CopyData packet */
  	pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
  
***
*** 1160,1235  WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
  	 * several releases by streaming physical replication.
  	 */
  	resetStringInfo();
! 	pq_sendint64(, GetCurrentTimestamp());
  	memcpy(>out->data[1 + sizeof(int64) + sizeof(int64)],
  		   tmpbuf.data, sizeof(int64));
  
- 	/* fast path */
- 	/* Try to flush pending output to the client */
- 	if (pq_flush_if_writable() != 0)
- 		WalSndShutdown();
- 
- 	if (!pq_is_send_pending())
- 		return;
- 
  	for (;;)
  	{
  		int			wakeEvents;
  		long		sleeptime;
  		TimestampTz now;
  
  		/*
! 		 * Emergency bailout if postmaster has died.  This is to avoid the
! 		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
! 			exit(1);
! 
! 		/* Clear any already-pending wakeups */
! 		ResetLatch(MyLatch);
! 
! 		CHECK_FOR_INTERRUPTS();
! 
! 		/* Process any requests or signals received recently */
! 		if (ConfigReloadPending)
! 		{
! 			ConfigReloadPending = false;
! 			ProcessConfigFile(PGC_SIGHUP);
! 			SyncRepInitConfig();
! 		}
! 
! 		/* Check for input from the client */
! 		ProcessRepliesIfAny();
! 
! 		/* Try to flush pending output to the client */
! 		if (pq_flush_if_writable() != 0)
! 			WalSndShutdown();
! 
! 		/* If we finished clearing the buffered data, we're done here. */
! 		if (!pq_is_send_pending())
! 			break;
! 
  		now = GetCurrentTimestamp();
- 
- 		/* die if timeout was reached */
- 		WalSndCheckTimeOut(now);
- 
- 		/* Send keepalive if the time has come */
- 		WalSndKeepaliveIfNecessary(now);
- 
- 		sleeptime = WalSndComputeSleeptime(now);
- 
- 		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
- 			WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
- 
- 		/* Sleep until something happens or we time out */
- 		

Re: [HACKERS] Walsender timeouts and large transactions

2017-09-27 Thread Sokolov Yura

On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote:

Hello,

At Wed, 06 Sep 2017 13:46:16 +, Yura Sokolov
 wrote in
<20170906134616.18925.88390.p...@coridan.postgresql.org>

I've changed to "need review" to gain more attention from other.


I understand that the problem here is too fast network prohibits
walsender from sending replies.

In physical replication, WAL records are sent as soon as written
and the timeout is handled in the topmost loop in WalSndLoop. In
logical replication, data is sent at once at commit time in most
cases. So it can take a long time in ReorderBufferCommit without
returning to WalSndLoop (or even XLogSendLogical).

One problem here is that WalSndWriteData waits for the arrival of
the next *WAL record* in the slow-ptah because it is called by
cues of ReorderBuffer* functions (mainly *Commit) irrelevantly to
WAL insertion. This is I think the root cause of this problem.

On the other hand, it ought to take a sleep when network is
stalled, in other words, data to send remains after a flush. We
don't have a means to signal when the socket queue gets a new
room for another bytes. However, I suppose that such slow network
allows us to sleep several or several tens of milliseconds. Or,
if we could know how many bytes ps_flush_if_writable() pushed,
it's enough to wait only when the function returns pushing
nothing.

As the result, I think that the functions should be modified as
the following.

- Forcing slow-path if time elapses a half of a ping period is
  right. (GetCurrentTimestamp is anyway requried.)

- The slow-path should not sleep waiting Latch. It should just
  pg_usleep() for maybe 1-10ms.

- We should go to the fast path just after keepalive or response
  message has been sent. In other words, the "if (now <" block
  should be in the "for (;;)" loop. This avoids needless runs on
  the slow-path.


It would be refactorable as the following.

  prepare for the send buffer;

  for (;;)
  {
now = GetCurrentTimeStamp();
if (now < )...
{
  fast-path
}
else
{
  slow-path
}
return if finished
sleep for 1ms?
  }


What do you think about this?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


Good day, Petr, Kyotaro

I've created failing test for issue (0001-Add-failing-test...) .
It tests insertion of 2 rows with 10ms wal_sender_timeout
(it fails in WalSndWriteData on master) and then deletion of
those rows with 1ms wal_sender_timeout (it fails in WalSndLoop).

Both Peter's patch and my simplified suggestion didn't pass the
test. I didn't checked Kyotaro's suggestion, though, cause I
didn't understand it well.

I've made patch that passes the test (0002-Fix-walsender...) .
(I've used Petr's commit message. Don't you mind, Petr?)

In WalSndWriteData it adds CHECK_FOR_INTERRUPTS to fastpath and
falls through to slow path after half of wal_sender_timeout as
were discussed.
In a slow path, it just skips fast exit on `!pq_is_send_pending()`
and check for timeout for the first loop iteration. And it sets
sleeptime to 1ms even if timeout were reached. It gives a chance
to receiver's response to arrive.

In WalSndLoop it also skips check for timeout first iteration after
send_data were called, and also sleeps at least 1 ms.

I'm not sure about correctness of my patch. Given test exists,
you may suggest better solutions, or improve this solution.

I'll set commitfest topic status to 'Need review' assuming
my patch could be reviewed.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres CompanyFrom c4f7ffb300200d51647a283872a2326350c3f5b9 Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Tue, 26 Sep 2017 19:28:57 +0300
Subject: [PATCH 1/2] Add failing test: wal_sender_timeout+logical decoding of
 big transaction.

Test case for timeout during sending huge transaction through logical
replication.
https://www.postgresql.org/message-id/flat/e082a56a-fd95-a250-3bae-0fff93832...@2ndquadrant.com#e082a56a-fd95-a250-3bae-0fff93832...@2ndquadrant.com
https://commitfest.postgresql.org/14/1151/
---
 src/test/subscription/t/008_sync_timeout.pl | 93 +
 1 file changed, 93 insertions(+)
 create mode 100644 src/test/subscription/t/008_sync_timeout.pl

diff --git a/src/test/subscription/t/008_sync_timeout.pl b/src/test/subscription/t/008_sync_timeout.pl
new file mode 100644
index 00..ef5363d7e4
--- /dev/null
+++ b/src/test/subscription/t/008_sync_timeout.pl
@@ -0,0 +1,93 @@
+# Tests for logical replication table syncing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 3;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf',
+	"wal_sender_timeout = 10ms");
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = 

Re: [HACKERS] Walsender timeouts and large transactions

2017-09-12 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 06 Sep 2017 13:46:16 +, Yura Sokolov  
wrote in <20170906134616.18925.88390.p...@coridan.postgresql.org>
> I've changed to "need review" to gain more attention from other.

I understand that the problem here is too fast network prohibits
walsender from sending replies.

In physical replication, WAL records are sent as soon as written
and the timeout is handled in the topmost loop in WalSndLoop. In
logical replication, data is sent at once at commit time in most
cases. So it can take a long time in ReorderBufferCommit without
returning to WalSndLoop (or even XLogSendLogical).

One problem here is that WalSndWriteData waits for the arrival of
the next *WAL record* in the slow-ptah because it is called by
cues of ReorderBuffer* functions (mainly *Commit) irrelevantly to
WAL insertion. This is I think the root cause of this problem.

On the other hand, it ought to take a sleep when network is
stalled, in other words, data to send remains after a flush. We
don't have a means to signal when the socket queue gets a new
room for another bytes. However, I suppose that such slow network
allows us to sleep several or several tens of milliseconds. Or,
if we could know how many bytes ps_flush_if_writable() pushed,
it's enough to wait only when the function returns pushing
nothing.

As the result, I think that the functions should be modified as
the following.

- Forcing slow-path if time elapses a half of a ping period is
  right. (GetCurrentTimestamp is anyway requried.)

- The slow-path should not sleep waiting Latch. It should just
  pg_usleep() for maybe 1-10ms.

- We should go to the fast path just after keepalive or response
  message has been sent. In other words, the "if (now <" block
  should be in the "for (;;)" loop. This avoids needless runs on
  the slow-path.


It would be refactorable as the following.

  prepare for the send buffer;

  for (;;)
  {
now = GetCurrentTimeStamp();
if (now < )...
{
  fast-path
}
else
{
  slow-path
}
return if finished
sleep for 1ms?
  }


What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-09-06 Thread Yura Sokolov
I've changed to "need review" to gain more attention from other.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-09-04 Thread Sokolov Yura
On 2017-05-25 17:52 Petr Jelinek wrote:
> Hi,
> 
> We have had issue with walsender timeout when used with logical
> decoding and the transaction is taking long time to be decoded
> (because it contains many changes)
> 
> I was looking today at the walsender code and realized that it's
> because if the network and downstream are fast enough, we'll always
> take fast path in WalSndWriteData which does not do reply or
> keepalive processing and is only reached once the transaction has
> finished by other code. So paradoxically we die of timeout because
> everything was fast enough to never fall back to slow code path.
> 
> I propose we only use fast path if the last processed reply is not
> older than half of walsender timeout, if it is then we'll force the
> slow code path to process the replies again. This is similar logic
> that we use to determine if to send keepalive message. I also added
> CHECK_INTERRUPRS call to fast code path because otherwise walsender
> might ignore them for too long on large transactions.
> 
> Thoughts?
> 

On 2017-08-10 14:20 Sokolov Yura wrote:
> On 2017-08-09 16:23, Petr Jelinek wrote:
> > On 02/08/17 19:35, Yura Sokolov wrote:  
> >> The following review has been posted through the commitfest 
> >> application:
> >> make installcheck-world:  tested, passed
> >> Implements feature:   not tested
> >> Spec compliant:   not tested
> >> Documentation:not tested
> >> 
> >> There is no check for (last_reply_timestamp <= 0 ||
> >> wal_sender_timeout <= 0) as in other places
> >> (in WalSndKeepaliveIfNecessary for example).
> >> 
> >> I don't think moving update of 'now' down to end of loop body is 
> >> correct:
> >> there are calls to ProcessConfigFile with SyncRepInitConfig, 
> >> ProcessRepliesIfAny that can
> >> last non-negligible time. It could lead to over sleeping due to
> >> larger computed sleeptime.
> >> Though I could be mistaken.
> >> 
> >> I'm not sure about moving `if (!pg_is_send_pending())` in a body
> >> loop after WalSndKeepaliveIfNecessary.
> >> Is it necessary? But it looks harmless at least.
> >>   
> > 
> > We also need to do actual timeout handing so that the timeout is not
> > deferred to the end of the transaction (Which is why I moved `if
> > (!pg_is_send_pending())` under WalSndCheckTimeOut() and
> > WalSndKeepaliveIfNecessary() calls).
> >   
> 
> If standby really stalled, then it will not read from socket, and then
> `pg_is_sendpending` eventually will return false, and timeout will be 
> checked.
> If standby doesn't stall, then `last_reply_timestamp` will be updated
> in `ProcessRepliesIfAny`, and so timeout will not be triggered.
> Do I understand correctly, or I missed something?
> 
> >> Could patch be reduced to check after first `if 
> >> (!pg_is_sendpending())` ? like:
> >> 
> >>if (!pq_is_send_pending())
> >> -  return;
> >> +  {
> >> +  if (last_reply_timestamp <= 0 ||
> >> wal_sender_timeout <= 0)
> >> +  {
> >> +  CHECK_FOR_INTERRUPTS();
> >> +  return;
> >> +  }
> >> +  if (now <=
> >> TimestampTzPlusMilliseconds(last_reply_timestamp,
> >> wal_sender_timeout / 2))
> >> +  return;
> >> +  }
> >> 
> >> If not, what problem prevents?  
> > 
> > We should do CHECK_FOR_INTERRUPTS() independently of
> > pq_is_send_pending so that it's possible to stop walsender while
> > it's processing large transaction.  
> 
> In this case CHECK_FOR_INTERRUPTS will be called after 
> wal_sender_timeout/2.
> (This diff is for first appearance of `pq_is_send_pending` in a 
> function).
> 
> If it should be called more often, then patch could be simplier:
> 
>   if (!pq_is_send_pending())
> - return;
> + {
> + CHECK_FOR_INTERRUPTS();
> + if (last_reply_timestamp <= 0 || wal_sender_timeout
> <= 0 ||
> + now <=
> TimestampTzPlusMilliseconds(last_reply_timestamp,
> wal_sender_timeout / 2))
> + return;
> + }
> 
> (Still, I could be mistaken, it is just suggestion).
> 
> Is it hard to add test for case this patch fixes?
> 
> With regards,

Tom, Robert,

I believe this bug have to be fixed in Pg10, and I don't wonna
be that guy who prevents it from being fixed.
What should/could I do?
( https://commitfest.postgresql.org/14/1151/ )

--
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-08-10 Thread Sokolov Yura

On 2017-08-09 16:23, Petr Jelinek wrote:

On 02/08/17 19:35, Yura Sokolov wrote:
The following review has been posted through the commitfest 
application:

make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout 
<= 0) as in other places

(in WalSndKeepaliveIfNecessary for example).

I don't think moving update of 'now' down to end of loop body is 
correct:
there are calls to ProcessConfigFile with SyncRepInitConfig, 
ProcessRepliesIfAny that can
last non-negligible time. It could lead to over sleeping due to larger 
computed sleeptime.

Though I could be mistaken.

I'm not sure about moving `if (!pg_is_send_pending())` in a body loop 
after WalSndKeepaliveIfNecessary.

Is it necessary? But it looks harmless at least.



We also need to do actual timeout handing so that the timeout is not
deferred to the end of the transaction (Which is why I moved `if
(!pg_is_send_pending())` under WalSndCheckTimeOut() and
WalSndKeepaliveIfNecessary() calls).



If standby really stalled, then it will not read from socket, and then
`pg_is_sendpending` eventually will return false, and timeout will be 
checked.

If standby doesn't stall, then `last_reply_timestamp` will be updated in
`ProcessRepliesIfAny`, and so timeout will not be triggered.
Do I understand correctly, or I missed something?

Could patch be reduced to check after first `if 
(!pg_is_sendpending())` ? like:


if (!pq_is_send_pending())
-   return;
+   {
+   if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
+   {
+   CHECK_FOR_INTERRUPTS();
+   return;
+   }
+		if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, 
wal_sender_timeout / 2))

+   return;
+   }

If not, what problem prevents?


We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending
so that it's possible to stop walsender while it's processing large
transaction.


In this case CHECK_FOR_INTERRUPTS will be called after 
wal_sender_timeout/2.
(This diff is for first appearance of `pq_is_send_pending` in a 
function).


If it should be called more often, then patch could be simplier:

if (!pq_is_send_pending())
-   return;
+   {
+   CHECK_FOR_INTERRUPTS();
+   if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0 ||
+now <= TimestampTzPlusMilliseconds(last_reply_timestamp, 
wal_sender_timeout / 2))

+   return;
+   }

(Still, I could be mistaken, it is just suggestion).

Is it hard to add test for case this patch fixes?

With regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-08-09 Thread Craig Ringer
On 9 August 2017 at 21:23, Petr Jelinek 
wrote:

> On 02/08/17 19:35, Yura Sokolov wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:   not tested
> > Spec compliant:   not tested
> > Documentation:not tested
> >
> > There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout
> <= 0) as in other places
> > (in WalSndKeepaliveIfNecessary for example).
> >
> > I don't think moving update of 'now' down to end of loop body is correct:
> > there are calls to ProcessConfigFile with SyncRepInitConfig,
> ProcessRepliesIfAny that can
> > last non-negligible time. It could lead to over sleeping due to larger
> computed sleeptime.
> > Though I could be mistaken.
> >
> > I'm not sure about moving `if (!pg_is_send_pending())` in a body loop
> after WalSndKeepaliveIfNecessary.
> > Is it necessary? But it looks harmless at least.
> >
>
> We also need to do actual timeout handing so that the timeout is not
> deferred to the end of the transaction (Which is why I moved `if
> (!pg_is_send_pending())` under WalSndCheckTimeOut() and
> WalSndKeepaliveIfNecessary() calls).
>
> > Could patch be reduced to check after first `if (!pg_is_sendpending())`
> ? like:
> >
> >   if (!pq_is_send_pending())
> > - return;
> > + {
> > + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
> > + {
> > + CHECK_FOR_INTERRUPTS();
> > + return;
> > + }
> > + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp,
> wal_sender_timeout / 2))
> > + return;
> > + }
> >
> > If not, what problem prevents?
>
> We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending
> so that it's possible to stop walsender while it's processing large
> transaction.
>
>
Is there any chance of getting this bugfix into Pg 10?

We've just cut back branches, so it'd be a sensible time.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Walsender timeouts and large transactions

2017-08-09 Thread Petr Jelinek
On 02/08/17 19:35, Yura Sokolov wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
> 
> There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) 
> as in other places
> (in WalSndKeepaliveIfNecessary for example).
> 
> I don't think moving update of 'now' down to end of loop body is correct:
> there are calls to ProcessConfigFile with SyncRepInitConfig, 
> ProcessRepliesIfAny that can
> last non-negligible time. It could lead to over sleeping due to larger 
> computed sleeptime.
> Though I could be mistaken.
> 
> I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after 
> WalSndKeepaliveIfNecessary.
> Is it necessary? But it looks harmless at least.
> 

We also need to do actual timeout handing so that the timeout is not
deferred to the end of the transaction (Which is why I moved `if
(!pg_is_send_pending())` under WalSndCheckTimeOut() and
WalSndKeepaliveIfNecessary() calls).

> Could patch be reduced to check after first `if (!pg_is_sendpending())` ? 
> like:
> 
>   if (!pq_is_send_pending())
> - return;
> + {
> + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
> + {
> + CHECK_FOR_INTERRUPTS();
> + return;
> + }
> + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, 
> wal_sender_timeout / 2))
> + return;
> + }
> 
> If not, what problem prevents?

We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending
so that it's possible to stop walsender while it's processing large
transaction.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-08-02 Thread Yura Sokolov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) as 
in other places
(in WalSndKeepaliveIfNecessary for example).

I don't think moving update of 'now' down to end of loop body is correct:
there are calls to ProcessConfigFile with SyncRepInitConfig, 
ProcessRepliesIfAny that can
last non-negligible time. It could lead to over sleeping due to larger computed 
sleeptime.
Though I could be mistaken.

I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after 
WalSndKeepaliveIfNecessary.
Is it necessary? But it looks harmless at least.

Could patch be reduced to check after first `if (!pg_is_sendpending())` ? like:

if (!pq_is_send_pending())
-   return;
+   {
+   if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
+   {
+   CHECK_FOR_INTERRUPTS();
+   return;
+   }
+   if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, 
wal_sender_timeout / 2))
+   return;
+   }

If not, what problem prevents?

The new status of this patch is: Waiting on Author

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-06-25 Thread Petr Jelinek
On 30/05/17 15:44, Petr Jelinek wrote:
> On 30/05/17 11:02, Kyotaro HORIGUCHI wrote:
>>
>> +TimestampTz now = GetCurrentTimestamp();
>>
>> I think it is not recommended to read the current time too
>> frequently, especially within a loop that hates slowness. (I
>> suppose that a loop that can fill up a send queue falls into that
> 
> Yeah that was my main worry for the patch as well, although given that
> the loop does tuple processing it might not be very noticeable.
> 

I realized we actually call GetCurrentTimestamp() there anyway (for the
pq_sendint64). So I just modified the patch to use the now variable
there instead which means there are no additional GetCurrentTimestamp()
calls compared to state before patch now.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From c6e9cdb81311371662c580db42a165ab0575d951 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Thu, 25 May 2017 17:35:24 +0200
Subject: [PATCH] Fix walsender timeouts when decoding large transaction

The logical slots have fast code path for sending data in order to not
impose too high per message overhead. The fast path skips checks for
interrupts and timeouts. However the fast path failed to consider the
fact that transaction with large number of changes may take very long to
be processed and sent to the client. This causes walsender to ignore
interrupts for potentially long time but more importantly it will cause
walsender being killed due to timeout at the end of such transaction.

This commit changes the fast path to also check for interrupts and only
allows calling the fast path when last keeplaive check happened less
than half of walsender timeout ago, otherwise the slower code path will
be taken.
---
 src/backend/replication/walsender.c | 36 ++--
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index f845180..278f882 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1160,6 +1160,8 @@ static void
 WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 bool last_write)
 {
+	TimestampTz	now = GetCurrentTimestamp();
+
 	/* output previously gathered data in a CopyData packet */
 	pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
 
@@ -1169,23 +1171,28 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 	 * several releases by streaming physical replication.
 	 */
 	resetStringInfo();
-	pq_sendint64(, GetCurrentTimestamp());
+	pq_sendint64(, now);
 	memcpy(>out->data[1 + sizeof(int64) + sizeof(int64)],
 		   tmpbuf.data, sizeof(int64));
 
-	/* fast path */
-	/* Try to flush pending output to the client */
-	if (pq_flush_if_writable() != 0)
-		WalSndShutdown();
+	/* Try taking fast path unless we get too close to walsender timeout. */
+	if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+		  wal_sender_timeout / 2))
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	if (!pq_is_send_pending())
-		return;
+		/* Try to flush pending output to the client */
+		if (pq_flush_if_writable() != 0)
+			WalSndShutdown();
+
+		if (!pq_is_send_pending())
+			return;
+	}
 
 	for (;;)
 	{
 		int			wakeEvents;
 		long		sleeptime;
-		TimestampTz now;
 
 		/*
 		 * Emergency bailout if postmaster has died.  This is to avoid the
@@ -1214,18 +1221,16 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 		if (pq_flush_if_writable() != 0)
 			WalSndShutdown();
 
-		/* If we finished clearing the buffered data, we're done here. */
-		if (!pq_is_send_pending())
-			break;
-
-		now = GetCurrentTimestamp();
-
 		/* die if timeout was reached */
 		WalSndCheckTimeOut(now);
 
 		/* Send keepalive if the time has come */
 		WalSndKeepaliveIfNecessary(now);
 
+		/* If we finished clearing the buffered data, we're done here. */
+		if (!pq_is_send_pending())
+			break;
+
 		sleeptime = WalSndComputeSleeptime(now);
 
 		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
@@ -1235,6 +1240,9 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 		WaitLatchOrSocket(MyLatch, wakeEvents,
 		  MyProcPort->sock, sleeptime,
 		  WAIT_EVENT_WAL_SENDER_WRITE_DATA);
+
+		/* Update our idea of current time for the next cycle. */
+		now = GetCurrentTimestamp();
 	}
 
 	/* reactivate latch so WalSndLoop knows to continue */
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-05-30 Thread Petr Jelinek
On 30/05/17 11:02, Kyotaro HORIGUCHI wrote:
> At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek 
>  wrote in 
> 
>> Hi,
>>
>> We have had issue with walsender timeout when used with logical decoding
>> and the transaction is taking long time to be decoded (because it
>> contains many changes)
>>
>> I was looking today at the walsender code and realized that it's because
>> if the network and downstream are fast enough, we'll always take fast
>> path in WalSndWriteData which does not do reply or keepalive processing
>> and is only reached once the transaction has finished by other code. So
>> paradoxically we die of timeout because everything was fast enough to
>> never fall back to slow code path.
>>
>> I propose we only use fast path if the last processed reply is not older
>> than half of walsender timeout, if it is then we'll force the slow code
>> path to process the replies again. This is similar logic that we use to
>> determine if to send keepalive message. I also added CHECK_INTERRUPRS
>> call to fast code path because otherwise walsender might ignore them for
>> too long on large transactions.
>>
>> Thoughts?
> 
> + TimestampTz now = GetCurrentTimestamp();
> 
> I think it is not recommended to read the current time too
> frequently, especially within a loop that hates slowness. (I
> suppose that a loop that can fill up a send queue falls into that

Yeah that was my main worry for the patch as well, although given that
the loop does tuple processing it might not be very noticeable.

> category.)  If you don't mind a certain amount of additional
> complexity for eliminating the possible slowdown by the check,
> timeout would be usable. Attached patch does almost the same
> thing with your patch but without busy time check.
> 
> What do you think about this?
> 

I think we could do it that way.

> # I saw that SIGQUIT doens't work for active publisher, which I
> # think mention in another thread.

Ah missed that email I guess, but that's the missing CHECK_INTERRUPTS();
in the fast-path which btw your updated patch is missing as well.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-05-30 Thread Kyotaro HORIGUCHI
At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek  
wrote in 
> Hi,
> 
> We have had issue with walsender timeout when used with logical decoding
> and the transaction is taking long time to be decoded (because it
> contains many changes)
> 
> I was looking today at the walsender code and realized that it's because
> if the network and downstream are fast enough, we'll always take fast
> path in WalSndWriteData which does not do reply or keepalive processing
> and is only reached once the transaction has finished by other code. So
> paradoxically we die of timeout because everything was fast enough to
> never fall back to slow code path.
> 
> I propose we only use fast path if the last processed reply is not older
> than half of walsender timeout, if it is then we'll force the slow code
> path to process the replies again. This is similar logic that we use to
> determine if to send keepalive message. I also added CHECK_INTERRUPRS
> call to fast code path because otherwise walsender might ignore them for
> too long on large transactions.
> 
> Thoughts?

+   TimestampTz now = GetCurrentTimestamp();

I think it is not recommended to read the current time too
frequently, especially within a loop that hates slowness. (I
suppose that a loop that can fill up a send queue falls into that
category.)  If you don't mind a certain amount of additional
complexity for eliminating the possible slowdown by the check,
timeout would be usable. Attached patch does almost the same
thing with your patch but without busy time check.

What do you think about this?

# I saw that SIGQUIT doens't work for active publisher, which I
# think mention in another thread.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 49cce38..ec33357 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -194,6 +194,15 @@ static volatile sig_atomic_t replication_active = false;
 static LogicalDecodingContext *logical_decoding_ctx = NULL;
 static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
 
+/*
+ * For logical replication, WalSndWriteData needs to process replies from the
+ * client to check if keepalive to be sent.  WAL send loop may skip check
+ * replies only while check_replies_needed = false.
+ */
+#define REPLY_SEND_TIMEOUT USER_TIMEOUT
+static bool	keepalive_timeout_initialized = false;
+static bool	check_replies_needed = false;
+
 /* A sample associating a WAL location with the time it was written. */
 typedef struct
 {
@@ -1175,12 +1184,12 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 	memcpy(>out->data[1 + sizeof(int64) + sizeof(int64)],
 		   tmpbuf.data, sizeof(int64));
 
-	/* fast path */
 	/* Try to flush pending output to the client */
 	if (pq_flush_if_writable() != 0)
 		WalSndShutdown();
 
-	if (!pq_is_send_pending())
+ 	/* fast path: return immediately if possible */
+	if (!check_replies_needed && !pq_is_send_pending())
 		return;
 
 	for (;;)
@@ -1216,10 +1225,6 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 		if (pq_flush_if_writable() != 0)
 			WalSndShutdown();
 
-		/* If we finished clearing the buffered data, we're done here. */
-		if (!pq_is_send_pending())
-			break;
-
 		now = GetCurrentTimestamp();
 
 		/* die if timeout was reached */
@@ -1228,6 +1233,10 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 		/* Send keepalive if the time has come */
 		WalSndKeepaliveIfNecessary(now);
 
+		/* If we finished clearing the buffered data, we're done here. */
+		if (!pq_is_send_pending())
+			break;
+
 		sleeptime = WalSndComputeSleeptime(now);
 
 		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
@@ -1562,6 +1571,13 @@ exec_replication_command(const char *cmd_string)
 	return true;
 }
 
+static void
+LogicalDecodeReplyTimeoutHandler(void)
+{
+	check_replies_needed = true;
+}
+
+
 /*
  * Process any incoming messages while streaming. Also checks if the remote
  * end has closed the connection.
@@ -1662,6 +1678,22 @@ ProcessRepliesIfAny(void)
 	{
 		last_reply_timestamp = GetCurrentTimestamp();
 		waiting_for_ping_response = false;
+
+		if (wal_sender_timeout > 0)
+		{
+			if (!keepalive_timeout_initialized)
+			{
+RegisterTimeout(REPLY_SEND_TIMEOUT,
+LogicalDecodeReplyTimeoutHandler);
+keepalive_timeout_initialized = true;
+			}
+
+			check_replies_needed = false;
+			enable_timeout_at(REPLY_SEND_TIMEOUT,
+		  TimestampTzPlusMilliseconds(last_reply_timestamp,
+	  wal_sender_timeout / 2));
+		}
+		
 	}
 }
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers