Re: [HACKERS] Walsender timeouts and large transactions
Hi, thanks for checking. On 02/11/17 10:00, Robert Haas wrote: > On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek > wrote: >> sorry for the delay but I didn't have much time in past weeks to follow >> this thread. > > +TimestampTz now = GetCurrentTimestamp(); > + > /* output previously gathered data in a CopyData packet */ > pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); > > /* > * Fill the send timestamp last, so that it is taken as late as possible. > * This is somewhat ugly, but the protocol is set as it's already used > for > * several releases by streaming physical replication. > */ > resetStringInfo(&tmpbuf); > -pq_sendint64(&tmpbuf, GetCurrentTimestamp()); > +pq_sendint64(&tmpbuf, now); > memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], > tmpbuf.data, sizeof(int64)); > > This change falsifies the comments. Maybe initialize now just after > resetSetringInfo() is done. Eh, right, I can do that. > > -/* fast path */ > -/* Try to flush pending output to the client */ > -if (pq_flush_if_writable() != 0) > -WalSndShutdown(); > +/* Try taking fast path unless we get too close to walsender timeout. */ > +if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, > + wal_sender_timeout / 2)) > +{ > +CHECK_FOR_INTERRUPTS(); > > -if (!pq_is_send_pending()) > -return; > +/* Try to flush pending output to the client */ > +if (pq_flush_if_writable() != 0) > +WalSndShutdown(); > + > +if (!pq_is_send_pending()) > +return; > +} > > I think it's only the if (!pq_is_send_pending()) return; that needs to > be conditional here, isn't it? The pq_flush_if_writable() can be done > unconditionally. > Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes. It just seems like it's needless call as we'll call both in for loop anyway if we take the "slow" path. I admit it's not exactly big win though. If you think it would improve readability I can move it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek wrote: > sorry for the delay but I didn't have much time in past weeks to follow > this thread. +TimestampTz now = GetCurrentTimestamp(); + /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); /* * Fill the send timestamp last, so that it is taken as late as possible. * This is somewhat ugly, but the protocol is set as it's already used for * several releases by streaming physical replication. */ resetStringInfo(&tmpbuf); -pq_sendint64(&tmpbuf, GetCurrentTimestamp()); +pq_sendint64(&tmpbuf, now); memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); This change falsifies the comments. Maybe initialize now just after resetSetringInfo() is done. -/* fast path */ -/* Try to flush pending output to the client */ -if (pq_flush_if_writable() != 0) -WalSndShutdown(); +/* Try taking fast path unless we get too close to walsender timeout. */ +if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, + wal_sender_timeout / 2)) +{ +CHECK_FOR_INTERRUPTS(); -if (!pq_is_send_pending()) -return; +/* Try to flush pending output to the client */ +if (pq_flush_if_writable() != 0) +WalSndShutdown(); + +if (!pq_is_send_pending()) +return; +} I think it's only the if (!pq_is_send_pending()) return; that needs to be conditional here, isn't it? The pq_flush_if_writable() can be done unconditionally. Other than that this looks like a reasonable change to me, but I'm not an expert on this code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
Hi, sorry for the delay but I didn't have much time in past weeks to follow this thread. On 02/10/17 05:44, Kyotaro HORIGUCHI wrote: > Hello Sokolov. > > At Fri, 29 Sep 2017 15:19:23 +0300, Sokolov Yura > wrote in > >> I don't want to make test to lasts so long and generate so many data. >> That is why I used such small timeouts for tests. > > I understand your point, but still *I* think such a short timeout > is out of expectation by design. (But it can be set.) > > Does anyone have opinions on this? Yes, it's not practically useful to have such small wal_sender_timeout given that the main purpose of that is to detect network issues. > >> Test is failing if there is "short quit" after >> `!pq_is_send_pending()`, >> so I doubt your patch will pass the test. > > It is because I think that the test "should" fail since the > timeout is out of expected range. I (and perhaps also Petr) is > thinking that the problem is just that a large transaction causes > a timeout with an ordinary timeout. My test case is based on the > assumption. > > Your test is for a timeout during replication-startup with > extremely short timeout. This may be a different problem to > discuss, but perhaps better to be solved together. > > I'd like to have opnions from others on this point. I think it's different problem and because of what I wrote above it does not seem to me like something we should spend out time on trying to fix. > > Uggh! I misunderstood there. It wais for writing socket so the > sleep is wrong and WaitLatchOrSocket is right. > > After all, I put +1 for Petr's latest patch. Sorry for my > carelessness. > Great, I attached version 3 which is just rebased on current master and also does not move the GetCurrentTimestamp() call because the concern about skewing the timestamp during config reload (and also network flush as I realized later) is valid. It's rather hard to test all the scenarios that this patch fixes in automated fashion without generating a lot of wal or adding sleeps to the processing. That's why I didn't produce usable TAP test. Since it seems like some of my reasoning is unclear I will try to describe it once more. The main problem we have is that unless we call the ProcessRepliesIfAny() before the wal_sender_timeout expires we'll die because of timeout eventually. The current coding will skip that call if there is a long transaction being processed (if network is fast enough). This is what the first part (first 2 hunks) of the patch solves. There is also issue that while this is happening the walsender ignores signals so it's impossible to stop it which is why I added the CHECK_FOR_INTERRUPTS to the fast path. The second problem is that if we solved just the first one, then if downstream (and again network) is fast enough, the ProcessRepliesIfAny() will not do anything useful because downstream is not going to send any response while the network buffer contains any data. This is caused by the fact that we normally code the receiver side to receive until there is something and only send reply when there is a "pause" in the stream. To get around this problem we also need to make sure that we send WalSndKeepaliveIfNecessary() periodically and that will not happen on fast network unless we do the second part of the patch (3rd hunk), ie, move the pq_is_send_pending() after the keepalive handling. This code is specific to logical decoding walsender interface so it only happens for logical decoding/replication (which means it should be backported all the way to 9.4). The physical one These two issues happen quite normally in the wild as all we need is big data load in single transaction, or update of large part of an table or something similar for this to happen with default settings. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From df2d5b09a74cb31537e2bb74895a8e31febce5f8 Mon Sep 17 00:00:00 2001 From: Petr Jelinek Date: Tue, 31 Oct 2017 14:00:37 +0100 Subject: [PATCH] Fix walsender timeouts when decoding large transaction The logical slots have fast code path for sending data in order to not impose too high per message overhead. The fast path skips checks for interrupts and timeouts. However, the existing coding failed to consider the fact that transaction with large number of changes may take very long to be processed and sent to the client. This causes walsender to ignore interrupts for potentially long time and more importantly it will cause walsender being killed due to timeout at the end of such transaction. This commit changes the fast path to also check for interrupts and only allows calling the fast path when last keeplaive check happened less than half of walsender timeout ago, otherwise the slower code path will be taken. --- src/backend/replication/walsender.c | 31 +++ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/src/backend/replication/wal
Re: [HACKERS] Walsender timeouts and large transactions
Hello Sokolov. At Fri, 29 Sep 2017 15:19:23 +0300, Sokolov Yura wrote in > I don't want to make test to lasts so long and generate so many data. > That is why I used such small timeouts for tests. I understand your point, but still *I* think such a short timeout is out of expectation by design. (But it can be set.) Does anyone have opinions on this? > Test is failing if there is "short quit" after > `!pq_is_send_pending()`, > so I doubt your patch will pass the test. It is because I think that the test "should" fail since the timeout is out of expected range. I (and perhaps also Petr) is thinking that the problem is just that a large transaction causes a timeout with an ordinary timeout. My test case is based on the assumption. Your test is for a timeout during replication-startup with extremely short timeout. This may be a different problem to discuss, but perhaps better to be solved together. I'd like to have opnions from others on this point. > And you've change calculated sleep time with sane waiting on all > insteresting events (using WaitLatchOrSocket) to semi-busy loop. > It at least could affect throughput. Uggh! I misunderstood there. It wais for writing socket so the sleep is wrong and WaitLatchOrSocket is right. After all, I put +1 for Petr's latest patch. Sorry for my carelessness. > And why did you remove `SetLatch(MyLatch)` in the end of function? > Probably this change is correct, but not obvious. Latch is needless there if it waited a fixed duration, but if it waits writefd events there, also latch should be waited. > > Any thoughts? > > It certainly could be my test and my patch is wrong. But my point > is that test should be written first. Especially for such difficult > case. Without test it is impossible to say does our patches fix > something. And it is impossible to say if patch does something > wrong. And impossible to say if patch fixes this problem but > introduce new problem. > > Please, write test for your remarks. If you think, my patch breaks > something, write test for the case my patch did broke. If you think > my test is wrong, write your test that is more correct. > > Without tests it will be just bird's hubbub. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
Good day, Kyoutaro On 2017-09-29 11:26, Kyotaro HORIGUCHI wrote: Hello, At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura wrote in <90bb67da7131e6186b50897c4b0f0...@postgrespro.ru> On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote: > Hello, > At Wed, 06 Sep 2017 13:46:16 +, Yura Sokolov > wrote in > <20170906134616.18925.88390.p...@coridan.postgresql.org> > As the result, I think that the functions should be modified as > the following. > - Forcing slow-path if time elapses a half of a ping period is > right. (GetCurrentTimestamp is anyway requried.) > - The slow-path should not sleep waiting Latch. It should just > pg_usleep() for maybe 1-10ms. > - We should go to the fast path just after keepalive or response > message has been sent. In other words, the "if (now <" block > should be in the "for (;;)" loop. This avoids needless runs on > the slow-path. > It would be refactorable as the following. > prepare for the send buffer; > for (;;) > { > now = GetCurrentTimeStamp(); > if (now < )... > { > fast-path > } > else > { > slow-path > } > return if finished > sleep for 1ms? > } > What do you think about this? > regards, > -- > Kyotaro Horiguchi > NTT Open Source Software Center Good day, Petr, Kyotaro I've created failing test for issue (0001-Add-failing-test...) . It tests insertion of 2 rows with 10ms wal_sender_timeout (it fails in WalSndWriteData on master) and then deletion of those rows with 1ms wal_sender_timeout (it fails in WalSndLoop). Both Peter's patch and my simplified suggestion didn't pass the test. I didn't checked Kyotaro's suggestion, though, cause I didn't understand it well. Mmm. The test seems broken. wal_sender_timeout = 10ms with wal_receiver_status_interval=10s immediately causes a timeout. Avoiding the timeout is just breaking the sane code. I think you're not right. `wal_receiver_status_interval` is just for status update, not for replies. Before I made my patch version, I've added logging to every `now` and `last_reply_timestamp` during test run. `last_reply_timestamp` definitely updates more often than once in 10s. So, `wal_receiver_status_interval = 10s` has nothing in common with receiver's replies, as I see. (btw, logging slows down sender enough to "fix" test :-) ) And my patch doesn't avoid timeout check, so it doesn't break anything. It just delays timeout on 1ms. Given, it is unpractical to set wal_sender_timeout less than 50ms in production, this 1ms increase in timeout check is negligible. And I've checked just now that my patch passes test even with wal_receiver_status_interval = 10s. wal_sender_timeout = 3s and wal_receiver_status_interval=1s effectively causes the problem with about 100 lines of (int) insertion on UNIX socket connection, on my poor box. I don't want to make test to lasts so long and generate so many data. That is why I used such small timeouts for tests. The original complain here came from the fact that WalSndWriteData skips processing of replies for a long time on a fast network. However Petr's patch fixed the problem, I pointed that just letting the function take the slow path leads to another problem, that is, waiting for new WAL records can result in a unwanted pause in the slow path. Combining the solutions for the two problem is my proposal sited above. The sentence seems in quite bad style but the attached file is the concorete patch of that. Test is failing if there is "short quit" after `!pq_is_send_pending()`, so I doubt your patch will pass the test. And you've change calculated sleep time with sane waiting on all insteresting events (using WaitLatchOrSocket) to semi-busy loop. It at least could affect throughput. And why did you remove `SetLatch(MyLatch)` in the end of function? Probably this change is correct, but not obvious. Any thoughts? It certainly could be my test and my patch is wrong. But my point is that test should be written first. Especially for such difficult case. Without test it is impossible to say does our patches fix something. And it is impossible to say if patch does something wrong. And impossible to say if patch fixes this problem but introduce new problem. Please, write test for your remarks. If you think, my patch breaks something, write test for the case my patch did broke. If you think my test is wrong, write your test that is more correct. Without tests it will be just bird's hubbub. regards, regards, -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
Hello, At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura wrote in <90bb67da7131e6186b50897c4b0f0...@postgrespro.ru> > On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote: > > Hello, > > At Wed, 06 Sep 2017 13:46:16 +, Yura Sokolov > > wrote in > > <20170906134616.18925.88390.p...@coridan.postgresql.org> > > As the result, I think that the functions should be modified as > > the following. > > - Forcing slow-path if time elapses a half of a ping period is > > right. (GetCurrentTimestamp is anyway requried.) > > - The slow-path should not sleep waiting Latch. It should just > > pg_usleep() for maybe 1-10ms. > > - We should go to the fast path just after keepalive or response > > message has been sent. In other words, the "if (now <" block > > should be in the "for (;;)" loop. This avoids needless runs on > > the slow-path. > > It would be refactorable as the following. > > prepare for the send buffer; > > for (;;) > > { > > now = GetCurrentTimeStamp(); > > if (now < )... > > { > > fast-path > > } > > else > > { > > slow-path > > } > > return if finished > > sleep for 1ms? > > } > > What do you think about this? > > regards, > > -- > > Kyotaro Horiguchi > > NTT Open Source Software Center > > Good day, Petr, Kyotaro > > I've created failing test for issue (0001-Add-failing-test...) . > It tests insertion of 2 rows with 10ms wal_sender_timeout > (it fails in WalSndWriteData on master) and then deletion of > those rows with 1ms wal_sender_timeout (it fails in WalSndLoop). > > Both Peter's patch and my simplified suggestion didn't pass the > test. I didn't checked Kyotaro's suggestion, though, cause I > didn't understand it well. Mmm. The test seems broken. wal_sender_timeout = 10ms with wal_receiver_status_interval=10s immediately causes a timeout. Avoiding the timeout is just breaking the sane code. wal_sender_timeout = 3s and wal_receiver_status_interval=1s effectively causes the problem with about 100 lines of (int) insertion on UNIX socket connection, on my poor box. The original complain here came from the fact that WalSndWriteData skips processing of replies for a long time on a fast network. However Petr's patch fixed the problem, I pointed that just letting the function take the slow path leads to another problem, that is, waiting for new WAL records can result in a unwanted pause in the slow path. Combining the solutions for the two problem is my proposal sited above. The sentence seems in quite bad style but the attached file is the concorete patch of that. Any thoughts? regards, -- Kyotaro Horiguchi NTT Open Source Software Center *** a/src/backend/replication/walsender.c --- b/src/backend/replication/walsender.c *** *** 1151,1156 static void --- 1151,1158 WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write) { + TimestampTz now = GetCurrentTimestamp(); + /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); *** *** 1160,1235 WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, * several releases by streaming physical replication. */ resetStringInfo(&tmpbuf); ! pq_sendint64(&tmpbuf, GetCurrentTimestamp()); memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); - /* fast path */ - /* Try to flush pending output to the client */ - if (pq_flush_if_writable() != 0) - WalSndShutdown(); - - if (!pq_is_send_pending()) - return; - for (;;) { int wakeEvents; long sleeptime; TimestampTz now; /* ! * Emergency bailout if postmaster has died. This is to avoid the ! * necessity for manual cleanup of all postmaster children. */ ! if (!PostmasterIsAlive()) ! exit(1); ! ! /* Clear any already-pending wakeups */ ! ResetLatch(MyLatch); ! ! CHECK_FOR_INTERRUPTS(); ! ! /* Process any requests or signals received recently */ ! if (ConfigReloadPending) ! { ! ConfigReloadPending = false; ! ProcessConfigFile(PGC_SIGHUP); ! SyncRepInitConfig(); ! } ! ! /* Check for input from the client */ ! ProcessRepliesIfAny(); ! ! /* Try to flush pending output to the client */ ! if (pq_flush_if_writable() != 0) ! WalSndShutdown(); ! ! /* If we finished clearing the buffered data, we're done here. */ ! if (!pq_is_send_pending()) ! break; ! now = GetCurrentTimestamp(); - - /* die if timeout was reached */ - WalSndCheckTimeOut(now); - - /* Send keepalive if the time has come */ - WalSndKeepaliveIfNecessary(now); - - sleeptime = WalSndComputeSleeptime(now); - - wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | - WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT; - - /* Sleep until something happens or we time out */ - WaitLatchOrSocket(MyLatch, wakeEvents, - MyProcPort->s
Re: [HACKERS] Walsender timeouts and large transactions
On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote: Hello, At Wed, 06 Sep 2017 13:46:16 +, Yura Sokolov wrote in <20170906134616.18925.88390.p...@coridan.postgresql.org> I've changed to "need review" to gain more attention from other. I understand that the problem here is too fast network prohibits walsender from sending replies. In physical replication, WAL records are sent as soon as written and the timeout is handled in the topmost loop in WalSndLoop. In logical replication, data is sent at once at commit time in most cases. So it can take a long time in ReorderBufferCommit without returning to WalSndLoop (or even XLogSendLogical). One problem here is that WalSndWriteData waits for the arrival of the next *WAL record* in the slow-ptah because it is called by cues of ReorderBuffer* functions (mainly *Commit) irrelevantly to WAL insertion. This is I think the root cause of this problem. On the other hand, it ought to take a sleep when network is stalled, in other words, data to send remains after a flush. We don't have a means to signal when the socket queue gets a new room for another bytes. However, I suppose that such slow network allows us to sleep several or several tens of milliseconds. Or, if we could know how many bytes ps_flush_if_writable() pushed, it's enough to wait only when the function returns pushing nothing. As the result, I think that the functions should be modified as the following. - Forcing slow-path if time elapses a half of a ping period is right. (GetCurrentTimestamp is anyway requried.) - The slow-path should not sleep waiting Latch. It should just pg_usleep() for maybe 1-10ms. - We should go to the fast path just after keepalive or response message has been sent. In other words, the "if (now <" block should be in the "for (;;)" loop. This avoids needless runs on the slow-path. It would be refactorable as the following. prepare for the send buffer; for (;;) { now = GetCurrentTimeStamp(); if (now < )... { fast-path } else { slow-path } return if finished sleep for 1ms? } What do you think about this? regards, -- Kyotaro Horiguchi NTT Open Source Software Center Good day, Petr, Kyotaro I've created failing test for issue (0001-Add-failing-test...) . It tests insertion of 2 rows with 10ms wal_sender_timeout (it fails in WalSndWriteData on master) and then deletion of those rows with 1ms wal_sender_timeout (it fails in WalSndLoop). Both Peter's patch and my simplified suggestion didn't pass the test. I didn't checked Kyotaro's suggestion, though, cause I didn't understand it well. I've made patch that passes the test (0002-Fix-walsender...) . (I've used Petr's commit message. Don't you mind, Petr?) In WalSndWriteData it adds CHECK_FOR_INTERRUPTS to fastpath and falls through to slow path after half of wal_sender_timeout as were discussed. In a slow path, it just skips fast exit on `!pq_is_send_pending()` and check for timeout for the first loop iteration. And it sets sleeptime to 1ms even if timeout were reached. It gives a chance to receiver's response to arrive. In WalSndLoop it also skips check for timeout first iteration after send_data were called, and also sleeps at least 1 ms. I'm not sure about correctness of my patch. Given test exists, you may suggest better solutions, or improve this solution. I'll set commitfest topic status to 'Need review' assuming my patch could be reviewed. -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres CompanyFrom c4f7ffb300200d51647a283872a2326350c3f5b9 Mon Sep 17 00:00:00 2001 From: Sokolov Yura Date: Tue, 26 Sep 2017 19:28:57 +0300 Subject: [PATCH 1/2] Add failing test: wal_sender_timeout+logical decoding of big transaction. Test case for timeout during sending huge transaction through logical replication. https://www.postgresql.org/message-id/flat/e082a56a-fd95-a250-3bae-0fff93832...@2ndquadrant.com#e082a56a-fd95-a250-3bae-0fff93832...@2ndquadrant.com https://commitfest.postgresql.org/14/1151/ --- src/test/subscription/t/008_sync_timeout.pl | 93 + 1 file changed, 93 insertions(+) create mode 100644 src/test/subscription/t/008_sync_timeout.pl diff --git a/src/test/subscription/t/008_sync_timeout.pl b/src/test/subscription/t/008_sync_timeout.pl new file mode 100644 index 00..ef5363d7e4 --- /dev/null +++ b/src/test/subscription/t/008_sync_timeout.pl @@ -0,0 +1,93 @@ +# Tests for logical replication table syncing +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 3; + +# Initialize publisher node +my $node_publisher = get_new_node('publisher'); +$node_publisher->init(allows_streaming => 'logical'); +$node_publisher->append_conf('postgresql.conf', + "wal_sender_timeout = 10ms"); +$node_publisher->start; + +# Create subscriber node +my $node_subscriber = get_new_node('subscriber'); +$node_subscriber->init(allows_streaming =>
Re: [HACKERS] Walsender timeouts and large transactions
Hello, At Wed, 06 Sep 2017 13:46:16 +, Yura Sokolov wrote in <20170906134616.18925.88390.p...@coridan.postgresql.org> > I've changed to "need review" to gain more attention from other. I understand that the problem here is too fast network prohibits walsender from sending replies. In physical replication, WAL records are sent as soon as written and the timeout is handled in the topmost loop in WalSndLoop. In logical replication, data is sent at once at commit time in most cases. So it can take a long time in ReorderBufferCommit without returning to WalSndLoop (or even XLogSendLogical). One problem here is that WalSndWriteData waits for the arrival of the next *WAL record* in the slow-ptah because it is called by cues of ReorderBuffer* functions (mainly *Commit) irrelevantly to WAL insertion. This is I think the root cause of this problem. On the other hand, it ought to take a sleep when network is stalled, in other words, data to send remains after a flush. We don't have a means to signal when the socket queue gets a new room for another bytes. However, I suppose that such slow network allows us to sleep several or several tens of milliseconds. Or, if we could know how many bytes ps_flush_if_writable() pushed, it's enough to wait only when the function returns pushing nothing. As the result, I think that the functions should be modified as the following. - Forcing slow-path if time elapses a half of a ping period is right. (GetCurrentTimestamp is anyway requried.) - The slow-path should not sleep waiting Latch. It should just pg_usleep() for maybe 1-10ms. - We should go to the fast path just after keepalive or response message has been sent. In other words, the "if (now <" block should be in the "for (;;)" loop. This avoids needless runs on the slow-path. It would be refactorable as the following. prepare for the send buffer; for (;;) { now = GetCurrentTimeStamp(); if (now < )... { fast-path } else { slow-path } return if finished sleep for 1ms? } What do you think about this? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
I've changed to "need review" to gain more attention from other. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
On 2017-05-25 17:52 Petr Jelinek wrote: > Hi, > > We have had issue with walsender timeout when used with logical > decoding and the transaction is taking long time to be decoded > (because it contains many changes) > > I was looking today at the walsender code and realized that it's > because if the network and downstream are fast enough, we'll always > take fast path in WalSndWriteData which does not do reply or > keepalive processing and is only reached once the transaction has > finished by other code. So paradoxically we die of timeout because > everything was fast enough to never fall back to slow code path. > > I propose we only use fast path if the last processed reply is not > older than half of walsender timeout, if it is then we'll force the > slow code path to process the replies again. This is similar logic > that we use to determine if to send keepalive message. I also added > CHECK_INTERRUPRS call to fast code path because otherwise walsender > might ignore them for too long on large transactions. > > Thoughts? > On 2017-08-10 14:20 Sokolov Yura wrote: > On 2017-08-09 16:23, Petr Jelinek wrote: > > On 02/08/17 19:35, Yura Sokolov wrote: > >> The following review has been posted through the commitfest > >> application: > >> make installcheck-world: tested, passed > >> Implements feature: not tested > >> Spec compliant: not tested > >> Documentation:not tested > >> > >> There is no check for (last_reply_timestamp <= 0 || > >> wal_sender_timeout <= 0) as in other places > >> (in WalSndKeepaliveIfNecessary for example). > >> > >> I don't think moving update of 'now' down to end of loop body is > >> correct: > >> there are calls to ProcessConfigFile with SyncRepInitConfig, > >> ProcessRepliesIfAny that can > >> last non-negligible time. It could lead to over sleeping due to > >> larger computed sleeptime. > >> Though I could be mistaken. > >> > >> I'm not sure about moving `if (!pg_is_send_pending())` in a body > >> loop after WalSndKeepaliveIfNecessary. > >> Is it necessary? But it looks harmless at least. > >> > > > > We also need to do actual timeout handing so that the timeout is not > > deferred to the end of the transaction (Which is why I moved `if > > (!pg_is_send_pending())` under WalSndCheckTimeOut() and > > WalSndKeepaliveIfNecessary() calls). > > > > If standby really stalled, then it will not read from socket, and then > `pg_is_sendpending` eventually will return false, and timeout will be > checked. > If standby doesn't stall, then `last_reply_timestamp` will be updated > in `ProcessRepliesIfAny`, and so timeout will not be triggered. > Do I understand correctly, or I missed something? > > >> Could patch be reduced to check after first `if > >> (!pg_is_sendpending())` ? like: > >> > >>if (!pq_is_send_pending()) > >> - return; > >> + { > >> + if (last_reply_timestamp <= 0 || > >> wal_sender_timeout <= 0) > >> + { > >> + CHECK_FOR_INTERRUPTS(); > >> + return; > >> + } > >> + if (now <= > >> TimestampTzPlusMilliseconds(last_reply_timestamp, > >> wal_sender_timeout / 2)) > >> + return; > >> + } > >> > >> If not, what problem prevents? > > > > We should do CHECK_FOR_INTERRUPTS() independently of > > pq_is_send_pending so that it's possible to stop walsender while > > it's processing large transaction. > > In this case CHECK_FOR_INTERRUPTS will be called after > wal_sender_timeout/2. > (This diff is for first appearance of `pq_is_send_pending` in a > function). > > If it should be called more often, then patch could be simplier: > > if (!pq_is_send_pending()) > - return; > + { > + CHECK_FOR_INTERRUPTS(); > + if (last_reply_timestamp <= 0 || wal_sender_timeout > <= 0 || > + now <= > TimestampTzPlusMilliseconds(last_reply_timestamp, > wal_sender_timeout / 2)) > + return; > + } > > (Still, I could be mistaken, it is just suggestion). > > Is it hard to add test for case this patch fixes? > > With regards, Tom, Robert, I believe this bug have to be fixed in Pg10, and I don't wonna be that guy who prevents it from being fixed. What should/could I do? ( https://commitfest.postgresql.org/14/1151/ ) -- With regards, Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
On 2017-08-09 16:23, Petr Jelinek wrote: On 02/08/17 19:35, Yura Sokolov wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: not tested Spec compliant: not tested Documentation:not tested There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) as in other places (in WalSndKeepaliveIfNecessary for example). I don't think moving update of 'now' down to end of loop body is correct: there are calls to ProcessConfigFile with SyncRepInitConfig, ProcessRepliesIfAny that can last non-negligible time. It could lead to over sleeping due to larger computed sleeptime. Though I could be mistaken. I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after WalSndKeepaliveIfNecessary. Is it necessary? But it looks harmless at least. We also need to do actual timeout handing so that the timeout is not deferred to the end of the transaction (Which is why I moved `if (!pg_is_send_pending())` under WalSndCheckTimeOut() and WalSndKeepaliveIfNecessary() calls). If standby really stalled, then it will not read from socket, and then `pg_is_sendpending` eventually will return false, and timeout will be checked. If standby doesn't stall, then `last_reply_timestamp` will be updated in `ProcessRepliesIfAny`, and so timeout will not be triggered. Do I understand correctly, or I missed something? Could patch be reduced to check after first `if (!pg_is_sendpending())` ? like: if (!pq_is_send_pending()) - return; + { + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) + { + CHECK_FOR_INTERRUPTS(); + return; + } + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout / 2)) + return; + } If not, what problem prevents? We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending so that it's possible to stop walsender while it's processing large transaction. In this case CHECK_FOR_INTERRUPTS will be called after wal_sender_timeout/2. (This diff is for first appearance of `pq_is_send_pending` in a function). If it should be called more often, then patch could be simplier: if (!pq_is_send_pending()) - return; + { + CHECK_FOR_INTERRUPTS(); + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0 || +now <= TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout / 2)) + return; + } (Still, I could be mistaken, it is just suggestion). Is it hard to add test for case this patch fixes? With regards, -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
On 9 August 2017 at 21:23, Petr Jelinek wrote: > On 02/08/17 19:35, Yura Sokolov wrote: > > The following review has been posted through the commitfest application: > > make installcheck-world: tested, passed > > Implements feature: not tested > > Spec compliant: not tested > > Documentation:not tested > > > > There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout > <= 0) as in other places > > (in WalSndKeepaliveIfNecessary for example). > > > > I don't think moving update of 'now' down to end of loop body is correct: > > there are calls to ProcessConfigFile with SyncRepInitConfig, > ProcessRepliesIfAny that can > > last non-negligible time. It could lead to over sleeping due to larger > computed sleeptime. > > Though I could be mistaken. > > > > I'm not sure about moving `if (!pg_is_send_pending())` in a body loop > after WalSndKeepaliveIfNecessary. > > Is it necessary? But it looks harmless at least. > > > > We also need to do actual timeout handing so that the timeout is not > deferred to the end of the transaction (Which is why I moved `if > (!pg_is_send_pending())` under WalSndCheckTimeOut() and > WalSndKeepaliveIfNecessary() calls). > > > Could patch be reduced to check after first `if (!pg_is_sendpending())` > ? like: > > > > if (!pq_is_send_pending()) > > - return; > > + { > > + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) > > + { > > + CHECK_FOR_INTERRUPTS(); > > + return; > > + } > > + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, > wal_sender_timeout / 2)) > > + return; > > + } > > > > If not, what problem prevents? > > We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending > so that it's possible to stop walsender while it's processing large > transaction. > > Is there any chance of getting this bugfix into Pg 10? We've just cut back branches, so it'd be a sensible time. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Walsender timeouts and large transactions
On 02/08/17 19:35, Yura Sokolov wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: not tested > Spec compliant: not tested > Documentation:not tested > > There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) > as in other places > (in WalSndKeepaliveIfNecessary for example). > > I don't think moving update of 'now' down to end of loop body is correct: > there are calls to ProcessConfigFile with SyncRepInitConfig, > ProcessRepliesIfAny that can > last non-negligible time. It could lead to over sleeping due to larger > computed sleeptime. > Though I could be mistaken. > > I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after > WalSndKeepaliveIfNecessary. > Is it necessary? But it looks harmless at least. > We also need to do actual timeout handing so that the timeout is not deferred to the end of the transaction (Which is why I moved `if (!pg_is_send_pending())` under WalSndCheckTimeOut() and WalSndKeepaliveIfNecessary() calls). > Could patch be reduced to check after first `if (!pg_is_sendpending())` ? > like: > > if (!pq_is_send_pending()) > - return; > + { > + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) > + { > + CHECK_FOR_INTERRUPTS(); > + return; > + } > + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, > wal_sender_timeout / 2)) > + return; > + } > > If not, what problem prevents? We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending so that it's possible to stop walsender while it's processing large transaction. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: not tested Spec compliant: not tested Documentation:not tested There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) as in other places (in WalSndKeepaliveIfNecessary for example). I don't think moving update of 'now' down to end of loop body is correct: there are calls to ProcessConfigFile with SyncRepInitConfig, ProcessRepliesIfAny that can last non-negligible time. It could lead to over sleeping due to larger computed sleeptime. Though I could be mistaken. I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after WalSndKeepaliveIfNecessary. Is it necessary? But it looks harmless at least. Could patch be reduced to check after first `if (!pg_is_sendpending())` ? like: if (!pq_is_send_pending()) - return; + { + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) + { + CHECK_FOR_INTERRUPTS(); + return; + } + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout / 2)) + return; + } If not, what problem prevents? The new status of this patch is: Waiting on Author -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
On 30/05/17 15:44, Petr Jelinek wrote: > On 30/05/17 11:02, Kyotaro HORIGUCHI wrote: >> >> +TimestampTz now = GetCurrentTimestamp(); >> >> I think it is not recommended to read the current time too >> frequently, especially within a loop that hates slowness. (I >> suppose that a loop that can fill up a send queue falls into that > > Yeah that was my main worry for the patch as well, although given that > the loop does tuple processing it might not be very noticeable. > I realized we actually call GetCurrentTimestamp() there anyway (for the pq_sendint64). So I just modified the patch to use the now variable there instead which means there are no additional GetCurrentTimestamp() calls compared to state before patch now. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From c6e9cdb81311371662c580db42a165ab0575d951 Mon Sep 17 00:00:00 2001 From: Petr Jelinek Date: Thu, 25 May 2017 17:35:24 +0200 Subject: [PATCH] Fix walsender timeouts when decoding large transaction The logical slots have fast code path for sending data in order to not impose too high per message overhead. The fast path skips checks for interrupts and timeouts. However the fast path failed to consider the fact that transaction with large number of changes may take very long to be processed and sent to the client. This causes walsender to ignore interrupts for potentially long time but more importantly it will cause walsender being killed due to timeout at the end of such transaction. This commit changes the fast path to also check for interrupts and only allows calling the fast path when last keeplaive check happened less than half of walsender timeout ago, otherwise the slower code path will be taken. --- src/backend/replication/walsender.c | 36 ++-- 1 file changed, 22 insertions(+), 14 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index f845180..278f882 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1160,6 +1160,8 @@ static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write) { + TimestampTz now = GetCurrentTimestamp(); + /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); @@ -1169,23 +1171,28 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, * several releases by streaming physical replication. */ resetStringInfo(&tmpbuf); - pq_sendint64(&tmpbuf, GetCurrentTimestamp()); + pq_sendint64(&tmpbuf, now); memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); - /* fast path */ - /* Try to flush pending output to the client */ - if (pq_flush_if_writable() != 0) - WalSndShutdown(); + /* Try taking fast path unless we get too close to walsender timeout. */ + if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, + wal_sender_timeout / 2)) + { + CHECK_FOR_INTERRUPTS(); - if (!pq_is_send_pending()) - return; + /* Try to flush pending output to the client */ + if (pq_flush_if_writable() != 0) + WalSndShutdown(); + + if (!pq_is_send_pending()) + return; + } for (;;) { int wakeEvents; long sleeptime; - TimestampTz now; /* * Emergency bailout if postmaster has died. This is to avoid the @@ -1214,18 +1221,16 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, if (pq_flush_if_writable() != 0) WalSndShutdown(); - /* If we finished clearing the buffered data, we're done here. */ - if (!pq_is_send_pending()) - break; - - now = GetCurrentTimestamp(); - /* die if timeout was reached */ WalSndCheckTimeOut(now); /* Send keepalive if the time has come */ WalSndKeepaliveIfNecessary(now); + /* If we finished clearing the buffered data, we're done here. */ + if (!pq_is_send_pending()) + break; + sleeptime = WalSndComputeSleeptime(now); wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | @@ -1235,6 +1240,9 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, WaitLatchOrSocket(MyLatch, wakeEvents, MyProcPort->sock, sleeptime, WAIT_EVENT_WAL_SENDER_WRITE_DATA); + + /* Update our idea of current time for the next cycle. */ + now = GetCurrentTimestamp(); } /* reactivate latch so WalSndLoop knows to continue */ -- 2.7.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
On 30/05/17 11:02, Kyotaro HORIGUCHI wrote: > At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek > wrote in > >> Hi, >> >> We have had issue with walsender timeout when used with logical decoding >> and the transaction is taking long time to be decoded (because it >> contains many changes) >> >> I was looking today at the walsender code and realized that it's because >> if the network and downstream are fast enough, we'll always take fast >> path in WalSndWriteData which does not do reply or keepalive processing >> and is only reached once the transaction has finished by other code. So >> paradoxically we die of timeout because everything was fast enough to >> never fall back to slow code path. >> >> I propose we only use fast path if the last processed reply is not older >> than half of walsender timeout, if it is then we'll force the slow code >> path to process the replies again. This is similar logic that we use to >> determine if to send keepalive message. I also added CHECK_INTERRUPRS >> call to fast code path because otherwise walsender might ignore them for >> too long on large transactions. >> >> Thoughts? > > + TimestampTz now = GetCurrentTimestamp(); > > I think it is not recommended to read the current time too > frequently, especially within a loop that hates slowness. (I > suppose that a loop that can fill up a send queue falls into that Yeah that was my main worry for the patch as well, although given that the loop does tuple processing it might not be very noticeable. > category.) If you don't mind a certain amount of additional > complexity for eliminating the possible slowdown by the check, > timeout would be usable. Attached patch does almost the same > thing with your patch but without busy time check. > > What do you think about this? > I think we could do it that way. > # I saw that SIGQUIT doens't work for active publisher, which I > # think mention in another thread. Ah missed that email I guess, but that's the missing CHECK_INTERRUPTS(); in the fast-path which btw your updated patch is missing as well. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Walsender timeouts and large transactions
At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek wrote in > Hi, > > We have had issue with walsender timeout when used with logical decoding > and the transaction is taking long time to be decoded (because it > contains many changes) > > I was looking today at the walsender code and realized that it's because > if the network and downstream are fast enough, we'll always take fast > path in WalSndWriteData which does not do reply or keepalive processing > and is only reached once the transaction has finished by other code. So > paradoxically we die of timeout because everything was fast enough to > never fall back to slow code path. > > I propose we only use fast path if the last processed reply is not older > than half of walsender timeout, if it is then we'll force the slow code > path to process the replies again. This is similar logic that we use to > determine if to send keepalive message. I also added CHECK_INTERRUPRS > call to fast code path because otherwise walsender might ignore them for > too long on large transactions. > > Thoughts? + TimestampTz now = GetCurrentTimestamp(); I think it is not recommended to read the current time too frequently, especially within a loop that hates slowness. (I suppose that a loop that can fill up a send queue falls into that category.) If you don't mind a certain amount of additional complexity for eliminating the possible slowdown by the check, timeout would be usable. Attached patch does almost the same thing with your patch but without busy time check. What do you think about this? # I saw that SIGQUIT doens't work for active publisher, which I # think mention in another thread. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 49cce38..ec33357 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -194,6 +194,15 @@ static volatile sig_atomic_t replication_active = false; static LogicalDecodingContext *logical_decoding_ctx = NULL; static XLogRecPtr logical_startptr = InvalidXLogRecPtr; +/* + * For logical replication, WalSndWriteData needs to process replies from the + * client to check if keepalive to be sent. WAL send loop may skip check + * replies only while check_replies_needed = false. + */ +#define REPLY_SEND_TIMEOUT USER_TIMEOUT +static bool keepalive_timeout_initialized = false; +static bool check_replies_needed = false; + /* A sample associating a WAL location with the time it was written. */ typedef struct { @@ -1175,12 +1184,12 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); - /* fast path */ /* Try to flush pending output to the client */ if (pq_flush_if_writable() != 0) WalSndShutdown(); - if (!pq_is_send_pending()) + /* fast path: return immediately if possible */ + if (!check_replies_needed && !pq_is_send_pending()) return; for (;;) @@ -1216,10 +1225,6 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, if (pq_flush_if_writable() != 0) WalSndShutdown(); - /* If we finished clearing the buffered data, we're done here. */ - if (!pq_is_send_pending()) - break; - now = GetCurrentTimestamp(); /* die if timeout was reached */ @@ -1228,6 +1233,10 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, /* Send keepalive if the time has come */ WalSndKeepaliveIfNecessary(now); + /* If we finished clearing the buffered data, we're done here. */ + if (!pq_is_send_pending()) + break; + sleeptime = WalSndComputeSleeptime(now); wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | @@ -1562,6 +1571,13 @@ exec_replication_command(const char *cmd_string) return true; } +static void +LogicalDecodeReplyTimeoutHandler(void) +{ + check_replies_needed = true; +} + + /* * Process any incoming messages while streaming. Also checks if the remote * end has closed the connection. @@ -1662,6 +1678,22 @@ ProcessRepliesIfAny(void) { last_reply_timestamp = GetCurrentTimestamp(); waiting_for_ping_response = false; + + if (wal_sender_timeout > 0) + { + if (!keepalive_timeout_initialized) + { +RegisterTimeout(REPLY_SEND_TIMEOUT, +LogicalDecodeReplyTimeoutHandler); +keepalive_timeout_initialized = true; + } + + check_replies_needed = false; + enable_timeout_at(REPLY_SEND_TIMEOUT, + TimestampTzPlusMilliseconds(last_reply_timestamp, + wal_sender_timeout / 2)); + } + } } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Walsender timeouts and large transactions
Hi, We have had issue with walsender timeout when used with logical decoding and the transaction is taking long time to be decoded (because it contains many changes) I was looking today at the walsender code and realized that it's because if the network and downstream are fast enough, we'll always take fast path in WalSndWriteData which does not do reply or keepalive processing and is only reached once the transaction has finished by other code. So paradoxically we die of timeout because everything was fast enough to never fall back to slow code path. I propose we only use fast path if the last processed reply is not older than half of walsender timeout, if it is then we'll force the slow code path to process the replies again. This is similar logic that we use to determine if to send keepalive message. I also added CHECK_INTERRUPRS call to fast code path because otherwise walsender might ignore them for too long on large transactions. Thoughts? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services 0001-Fix-walsender-timeouts-when-decoding-large-transacti.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers