Re: [HACKERS] Walsender timeouts and large transactions
On 15 December 2017 at 00:36, Andrew Dunstanwrote: > > > On 12/14/2017 01:46 AM, Craig Ringer wrote: > > On 7 December 2017 at 01:22, Petr Jelinek > > > > > wrote: > > > > On 05/12/17 21:07, Robert Haas wrote: > > > > > > Generally we write if (a && b) { ... } not if (a) { if (b) .. } > > > > > > > It's rather ugly with && because one of the conditions is two > > line, but > > okay here you go. I am keeping the brackets even if normally don't > for > > one-liners because it's completely unreadable without them IMHO. > > > > > > > > Yeah, that's why I passed on that FWIW. Sometimes breaking up a > > condition is nice. Personally I intensely dislike the convention of > > > > > > if (big_condition > > && big_condition) > >one_linerdo_something; > > > > > > as awfully unreadable, but I guess code convention means you live with > > things you don't like. > > > > > > Anyway, I've just hit this bug in the wild for the umpteenth time this > > year, and I'd like to know what I can do to help progress it to > > commit+backport. > > > > > > > Ask and ye shall receive. I've just committed it. > > Brilliant, thanks. Backpatched too, great. Now I'm going to shamelessly point you at the other nasty recurring logical decoding bug while you're fresh from thinking about replication. I can hope, right ;) https://commitfest.postgresql.org/16/1397/ causes errors or bad data to be decoded, so it's a serious bug. I'll see if I can rustle up some review attention first though. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Walsender timeouts and large transactions
On 12/14/2017 01:46 AM, Craig Ringer wrote: > On 7 December 2017 at 01:22, Petr Jelinek >> > wrote: > > On 05/12/17 21:07, Robert Haas wrote: > > > > Generally we write if (a && b) { ... } not if (a) { if (b) .. } > > > > It's rather ugly with && because one of the conditions is two > line, but > okay here you go. I am keeping the brackets even if normally don't for > one-liners because it's completely unreadable without them IMHO. > > > > Yeah, that's why I passed on that FWIW. Sometimes breaking up a > condition is nice. Personally I intensely dislike the convention of > > > if (big_condition > && big_condition) > one_linerdo_something; > > > as awfully unreadable, but I guess code convention means you live with > things you don't like. > > > Anyway, I've just hit this bug in the wild for the umpteenth time this > year, and I'd like to know what I can do to help progress it to > commit+backport. > > Ask and ye shall receive. I've just committed it. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Walsender timeouts and large transactions
On 7 December 2017 at 01:22, Petr Jelinekwrote: > On 05/12/17 21:07, Robert Haas wrote: > > > > Generally we write if (a && b) { ... } not if (a) { if (b) .. } > > > > It's rather ugly with && because one of the conditions is two line, but > okay here you go. I am keeping the brackets even if normally don't for > one-liners because it's completely unreadable without them IMHO. > > Yeah, that's why I passed on that FWIW. Sometimes breaking up a condition is nice. Personally I intensely dislike the convention of if (big_condition && big_condition) one_linerdo_something; as awfully unreadable, but I guess code convention means you live with things you don't like. Anyway, I've just hit this bug in the wild for the umpteenth time this year, and I'd like to know what I can do to help progress it to commit+backport. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Walsender timeouts and large transactions
On 05/12/17 21:07, Robert Haas wrote: > On Mon, Dec 4, 2017 at 10:59 PM, Craig Ringerwrote: >> To me it looks like it's time to get this committed, marking as such. > > This version has noticeably more code rearrangement than before, and > I'm not sure that is actually buying us anything. Why not keep the > changes minimal? > Yeah we moved things around in the end, the main reason would be that it actually works closer to how it was originally designed to work. Meaning that the slow path is not so slow when !pq_is_send_pending() which seems to have been the reasoning for original coding. It's not completely necessary to do it for fixing the bug, but why make things slower than they need to be. > Also, TBH, this doesn't seem to have been carefully reviewed for style: > > -if (!pq_is_send_pending()) > -return; > +/* Try taking fast path unless we get too close to walsender timeout. */ > +if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, > + wal_sender_timeout / 2)) > +{ > +if (!pq_is_send_pending()) > +return; > +} > > Generally we write if (a && b) { ... } not if (a) { if (b) .. } > It's rather ugly with && because one of the conditions is two line, but okay here you go. I am keeping the brackets even if normally don't for one-liners because it's completely unreadable without them IMHO. > -} > +}; > Oops. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 61a23cad58a0016876e3d6c829f6353ee491b4c7 Mon Sep 17 00:00:00 2001 From: Petr Jelinek Date: Tue, 12 Sep 2017 17:31:28 +0900 Subject: [PATCH] Fix walsender timeouts when decoding large transaction The logical slots have fast code path for sending data in order to not impose too high per message overhead. The fast path skips checks for interrupts and timeouts. However, the existing coding failed to consider the fact that transaction with large number of changes may take very long to be processed and sent to the client. This causes walsender to ignore interrupts for potentially long time and more importantly it will cause walsender being killed due to timeout at the end of such transaction. This commit changes the fast path to also check for interrupts and only allows calling the fast path when last keeplaive check happened less than half of walsender timeout ago, otherwise the slower code path will be taken. --- src/backend/replication/walsender.c | 66 + 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index fa1db748b5..e015870a4e 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1151,6 +1151,8 @@ static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write) { + TimestampTz now; + /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); @@ -1160,23 +1162,54 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, * several releases by streaming physical replication. */ resetStringInfo(); - pq_sendint64(, GetCurrentTimestamp()); + now = GetCurrentTimestamp(); + pq_sendint64(, now); memcpy(>out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); - /* fast path */ + CHECK_FOR_INTERRUPTS(); + /* Try to flush pending output to the client */ if (pq_flush_if_writable() != 0) WalSndShutdown(); - if (!pq_is_send_pending()) + /* Try taking fast path unless we get too close to walsender timeout. */ + if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, + wal_sender_timeout / 2) && + pq_is_send_pending()) + { return; + } + /* If we have pending write here, go to slow path */ for (;;) { int wakeEvents; long sleeptime; - TimestampTz now; + + /* Check for input from the client */ + ProcessRepliesIfAny(); + + now = GetCurrentTimestamp(); + + /* die if timeout was reached */ + WalSndCheckTimeOut(now); + + /* Send keepalive if the time has come */ + WalSndKeepaliveIfNecessary(now); + + if (!pq_is_send_pending()) + break; + + sleeptime = WalSndComputeSleeptime(now); + + wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | + WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT; + + /* Sleep until something happens or we time out */ + WaitLatchOrSocket(MyLatch, wakeEvents, + MyProcPort->sock, sleeptime, + WAIT_EVENT_WAL_SENDER_WRITE_DATA); /* * Emergency bailout if postmaster has died. This is to avoid the @@ -1198,34 +1231,9 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, SyncRepInitConfig(); } - /* Check for input from the client */ -
Re: [HACKERS] Walsender timeouts and large transactions
On 29 November 2017 at 23:59, Petr Jelinekwrote: > Hi, > > On 17/11/17 08:35, Kyotaro HORIGUCHI wrote: > > > > Moving around the code allow us to place ps_is_send_pending() in > > the while condition, which seems to be more proper place to do > > that. I haven't added test for this particular case. > > > > I tested this that > > > > - cleanly applies on the current master HEAD and passes make > > check and subscription test. > > > > - walsender properly chooses the slow-path even if > > pq_is_send_pending() is always false. (happens on a fast enough > > network) > > > > - walsender waits properly waits on socket and process-reply time > > in WaitLatchOrSocket. > > > > - walsender exits by timeout on network stall. > > > > So, I think the patch is functionally perfect. > > > > I'm a reviewer of this patch but I think I'm not allowed to mark > > this "Ready for Commiter" since the last change is made by me. > > > > Thanks for working on this, but there are couple of problems with your > modifications which mean that it does not actually fix the original > issue anymore (transaction taking long time to decode while sending > changes over network works fine will result in walsender timout). > > The firs one is that you put pq_is_send_pending() in the while so the > while is again never executed if there is no network send pending which > makes the if above meaningless. Also you missed ProcessRepliesIfAny() > when moving code around. That's needed for timeout calculations to work > correctly. > > So one more revision attached with those things fixed. This version > fixes the original issue as well. > > I'm happy with what I see here. Commit message needs tweaking, but any committer would do that anyway. To me it looks like it's time to get this committed, marking as such. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Walsender timeouts and large transactions
Hi, On 17/11/17 08:35, Kyotaro HORIGUCHI wrote: > > Moving around the code allow us to place ps_is_send_pending() in > the while condition, which seems to be more proper place to do > that. I haven't added test for this particular case. > > I tested this that > > - cleanly applies on the current master HEAD and passes make > check and subscription test. > > - walsender properly chooses the slow-path even if > pq_is_send_pending() is always false. (happens on a fast enough > network) > > - walsender waits properly waits on socket and process-reply time > in WaitLatchOrSocket. > > - walsender exits by timeout on network stall. > > So, I think the patch is functionally perfect. > > I'm a reviewer of this patch but I think I'm not allowed to mark > this "Ready for Commiter" since the last change is made by me. > Thanks for working on this, but there are couple of problems with your modifications which mean that it does not actually fix the original issue anymore (transaction taking long time to decode while sending changes over network works fine will result in walsender timout). The firs one is that you put pq_is_send_pending() in the while so the while is again never executed if there is no network send pending which makes the if above meaningless. Also you missed ProcessRepliesIfAny() when moving code around. That's needed for timeout calculations to work correctly. So one more revision attached with those things fixed. This version fixes the original issue as well. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 46bbc20a3b4d6e2eed97ddc1b05d828399241c63 Mon Sep 17 00:00:00 2001 From: Petr JelinekDate: Tue, 12 Sep 2017 17:31:28 +0900 Subject: [PATCH] Fix walsender timeouts when decoding large transaction The logical slots have fast code path for sending data in order to not impose too high per message overhead. The fast path skips checks for interrupts and timeouts. However, the existing coding failed to consider the fact that transaction with large number of changes may take very long to be processed and sent to the client. This causes walsender to ignore interrupts for potentially long time and more importantly it will cause walsender being killed due to timeout at the end of such transaction. This commit changes the fast path to also check for interrupts and only allows calling the fast path when last keeplaive check happened less than half of walsender timeout ago, otherwise the slower code path will be taken. --- src/backend/replication/walsender.c | 70 + 1 file changed, 39 insertions(+), 31 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index fa1db748b5..0c600ed2e3 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1151,6 +1151,8 @@ static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write) { + TimestampTz now; + /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); @@ -1160,23 +1162,54 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, * several releases by streaming physical replication. */ resetStringInfo(); - pq_sendint64(, GetCurrentTimestamp()); + now = GetCurrentTimestamp(); + pq_sendint64(, now); memcpy(>out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); - /* fast path */ + CHECK_FOR_INTERRUPTS(); + /* Try to flush pending output to the client */ if (pq_flush_if_writable() != 0) WalSndShutdown(); - if (!pq_is_send_pending()) - return; + /* Try taking fast path unless we get too close to walsender timeout. */ + if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, + wal_sender_timeout / 2)) + { + if (!pq_is_send_pending()) + return; + } + /* If we have pending write here, go to slow path */ for (;;) { int wakeEvents; long sleeptime; - TimestampTz now; + + /* Check for input from the client */ + ProcessRepliesIfAny(); + + now = GetCurrentTimestamp(); + + /* die if timeout was reached */ + WalSndCheckTimeOut(now); + + /* Send keepalive if the time has come */ + WalSndKeepaliveIfNecessary(now); + + if (!pq_is_send_pending()) + break; + + sleeptime = WalSndComputeSleeptime(now); + + wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | + WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT; + + /* Sleep until something happens or we time out */ + WaitLatchOrSocket(MyLatch, wakeEvents, + MyProcPort->sock, sleeptime, + WAIT_EVENT_WAL_SENDER_WRITE_DATA); /* * Emergency bailout if postmaster has died. This is to avoid the @@ -1198,35 +1231,10 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, SyncRepInitConfig(); } - /*
Re: [HACKERS] Walsender timeouts and large transactions
On Fri, Nov 17, 2017 at 4:35 PM, Kyotaro HORIGUCHIwrote: > I'm a reviewer of this patch but I think I'm not allowed to mark > this "Ready for Commiter" since the last change is made by me. Yes, it is a better idea to wait for reviews here. -- Michael
Re: [HACKERS] Walsender timeouts and large transactions
Ouch.. I'd doubly mistaked. > I found that the patch is the latest one and will look this > soon. Sorry for the ignorance. Thats...wrong. Sorry. There's no new patch since the Reboer's comment. I think this is just a bug fix and needs no more argument on its functionality. (and might ought to be backpatched?) At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinekwrote in > > This change falsifies the comments. Maybe initialize now just after > > resetSetringInfo() is done. > > Eh, right, I can do that. It is reasonable. (Or rewrite the comment?) At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinek wrote in > > > > -/* fast path */ > > -/* Try to flush pending output to the client */ > > -if (pq_flush_if_writable() != 0) > > -WalSndShutdown(); > > +/* Try taking fast path unless we get too close to walsender timeout. > > */ > > +if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, > > + wal_sender_timeout / 2)) > > +{ > > +CHECK_FOR_INTERRUPTS(); > > > > -if (!pq_is_send_pending()) > > -return; > > +/* Try to flush pending output to the client */ > > +if (pq_flush_if_writable() != 0) > > +WalSndShutdown(); > > + > > +if (!pq_is_send_pending()) > > +return; > > +} > > > > I think it's only the if (!pq_is_send_pending()) return; that needs to > > be conditional here, isn't it? The pq_flush_if_writable() can be done > > unconditionally. > > > > Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes. > It just seems like it's needless call as we'll call both in for loop > anyway if we take the "slow" path. I admit it's not exactly big win > though. If you think it would improve readability I can move it. Moving around the code allow us to place ps_is_send_pending() in the while condition, which seems to be more proper place to do that. I haven't added test for this particular case. I tested this that - cleanly applies on the current master HEAD and passes make check and subscription test. - walsender properly chooses the slow-path even if pq_is_send_pending() is always false. (happens on a fast enough network) - walsender waits properly waits on socket and process-reply time in WaitLatchOrSocket. - walsender exits by timeout on network stall. So, I think the patch is functionally perfect. I'm a reviewer of this patch but I think I'm not allowed to mark this "Ready for Commiter" since the last change is made by me. regards, -- Kyotaro Horiguchi NTT Open Source Software Center >From 508c43ef43eebee7de89094658b74f9cef8e4342 Mon Sep 17 00:00:00 2001 From: Petr Jelinek Date: Tue, 12 Sep 2017 17:31:28 +0900 Subject: [PATCH] Fix walsender timeouts when decoding large transaction The logical slots have fast code path for sending data in order to not impose too high per message overhead. The fast path skips checks for interrupts and timeouts. However, the existing coding failed to consider the fact that transaction with large number of changes may take very long to be processed and sent to the client. This causes walsender to ignore interrupts for potentially long time and more importantly it will cause walsender being killed due to timeout at the end of such transaction. This commit changes the fast path to also check for interrupts and only allows calling the fast path when last keeplaive check happened less than half of walsender timeout ago, otherwise the slower code path will be taken. --- src/backend/replication/walsender.c | 58 - 1 file changed, 32 insertions(+), 26 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index fa1db74..5a4c31f 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1151,6 +1151,8 @@ static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write) { + TimestampTz now; + /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); @@ -1160,23 +1162,46 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, * several releases by streaming physical replication. */ resetStringInfo(); - pq_sendint64(, GetCurrentTimestamp()); + now = GetCurrentTimestamp(); + pq_sendint64(, now); memcpy(>out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); - /* fast path */ + CHECK_FOR_INTERRUPTS(); + /* Try to flush pending output to the client */ if (pq_flush_if_writable() != 0) WalSndShutdown(); - if (!pq_is_send_pending()) - return; + /* Try taking fast path unless we get too close to walsender timeout. */ +
Re: [HACKERS] Walsender timeouts and large transactions
Hello, At Fri, 17 Nov 2017 13:24:08 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHIwrote in <20171117.132408.85564852.horiguchi.kyot...@lab.ntt.co.jp> > > Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes. > > It just seems like it's needless call as we'll call both in for loop > > anyway if we take the "slow" path. I admit it's not exactly big win > > though. If you think it would improve readability I can move it. > > I think this is the last message in this thread so I changed the > status of the CF entry to "Waiting for Author". Hmm. Somehow the last patch and Robert's comment * which is the base of the patch * has been reached me in reverse order. I found that the patch is the latest one and will look this soon. Sorry for the ignorance. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Walsender timeouts and large transactions
Hello, At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinekwrote in > Hi, > > thanks for checking. > > On 02/11/17 10:00, Robert Haas wrote: > > On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek > > wrote: > >> sorry for the delay but I didn't have much time in past weeks to follow > >> this thread. > > > > +TimestampTz now = GetCurrentTimestamp(); > > + > > /* output previously gathered data in a CopyData packet */ > > pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); > > > > /* > > * Fill the send timestamp last, so that it is taken as late as > > possible. > > * This is somewhat ugly, but the protocol is set as it's already used > > for > > * several releases by streaming physical replication. > > */ > > resetStringInfo(); > > -pq_sendint64(, GetCurrentTimestamp()); > > +pq_sendint64(, now); > > memcpy(>out->data[1 + sizeof(int64) + sizeof(int64)], > > tmpbuf.data, sizeof(int64)); > > > > This change falsifies the comments. Maybe initialize now just after > > resetSetringInfo() is done. > > Eh, right, I can do that. > > > > > -/* fast path */ > > -/* Try to flush pending output to the client */ > > -if (pq_flush_if_writable() != 0) > > -WalSndShutdown(); > > +/* Try taking fast path unless we get too close to walsender timeout. > > */ > > +if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, > > + wal_sender_timeout / 2)) > > +{ > > +CHECK_FOR_INTERRUPTS(); > > > > -if (!pq_is_send_pending()) > > -return; > > +/* Try to flush pending output to the client */ > > +if (pq_flush_if_writable() != 0) > > +WalSndShutdown(); > > + > > +if (!pq_is_send_pending()) > > +return; > > +} > > > > I think it's only the if (!pq_is_send_pending()) return; that needs to > > be conditional here, isn't it? The pq_flush_if_writable() can be done > > unconditionally. > > > > Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes. > It just seems like it's needless call as we'll call both in for loop > anyway if we take the "slow" path. I admit it's not exactly big win > though. If you think it would improve readability I can move it. I think this is the last message in this thread so I changed the status of the CF entry to "Waiting for Author". regards, -- Kyotaro Horiguchi NTT Open Source Software Center