RE: Logical replication timeout problem
On Thu, Mar 2, 2023 at 4:19 AM Gregory Stark (as CFM) wrote:
> On Wed, 8 Feb 2023 at 15:04, Andres Freund wrote:
> >
> > Attached is a current, quite rough, prototype. It addresses some of the points raised, but far from all. There's also several XXXs/FIXMEs in it. I changed the file-ending to .txt to avoid hijacking the CF entry.
>
> It looks like this patch has received quite a generous helping of feedback from Andres. I'm setting it to Waiting on Author.
>
> On the one hand it looks like there's a lot of work to do on this but on the other hand it sounds like this is a live problem in the field so if it can get done in time for release that would be great but if not then feel free to move it to the next commitfest (which means next release).

Hi,

Since this patch is an improvement to the architecture in HEAD, we started another new thread [1] on this topic to develop the related patch. It seems that we could modify the details of this CF entry to point to the new thread and change the status to 'Needs Review'.

[1] - https://www.postgresql.org/message-id/20230210210423.r26ndnfmuifie4f6%40awork3.anarazel.de

Regards,
Wang Wei
Re: Logical replication timeout problem
On Wed, 8 Feb 2023 at 15:04, Andres Freund wrote:
>
> Attached is a current, quite rough, prototype. It addresses some of the points raised, but far from all. There's also several XXXs/FIXMEs in it. I changed the file-ending to .txt to avoid hijacking the CF entry.

It looks like this patch has received quite a generous helping of feedback from Andres. I'm setting it to Waiting on Author.

On the one hand it looks like there's a lot of work to do on this but on the other hand it sounds like this is a live problem in the field so if it can get done in time for release that would be great but if not then feel free to move it to the next commitfest (which means next release).

--
Gregory Stark
As Commitfest Manager
Re: Logical replication timeout problem
Hi,

On 2023-02-08 10:30:37 -0800, Andres Freund wrote:
> On 2023-02-08 10:18:41 -0800, Andres Freund wrote:
> > I don't think the syncrep logic in WalSndUpdateProgress really works as-is - consider what happens if e.g. the origin filter filters out entire transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases we'll be lucky because we'll return quickly to XLogSendLogical(), but not reliably.
>
> Is it actually the right thing to check SyncRepRequested() in that logic? It's quite common to set up syncrep so that individual users or transactions opt into syncrep, but to leave the default disabled.
>
> I don't really see an alternative to making this depend solely on sync_standbys_defined.

Hacking on a rough prototype of how I think this should rather look, I had a few questions / remarks:

- We probably need to call UpdateProgress from a bunch of places in decode.c as well? Indicating that we're lagging by a lot, just because all transactions were in another database, seems decidedly suboptimal.

- Why should lag tracking only be updated at commit-like points? That seems like it adds odd discontinuities?

- The mix of skipped_xact and ctx->end_xact in WalSndUpdateProgress() seems somewhat odd. They have very overlapping meanings IMO.

- There are no UpdateProgress calls in pgoutput_stream_abort(), but ISTM there should be? It's legit progress.

- That's from 6912acc04f0: I find LagTrackerRead(), LagTrackerWrite() quite confusing, naming-wise. IIUC "reading" is about receiving confirmation messages, "writing" about the time the record was generated. ISTM that the current time is a quite poor approximation in XLogSendPhysical(), but pretty much meaningless in WalSndUpdateProgress()? Am I missing something?

- Aren't the wal_sender_timeout / 2 checks in WalSndUpdateProgress(), WalSndWriteData() missing wal_sender_timeout <= 0 checks?
- I don't really understand why f95d53eded added !end_xact to the if condition for ProcessPendingWrites(). Is the theory that we'll end up in an outer loop soon?

Attached is a current, quite rough, prototype. It addresses some of the points raised, but far from all. There's also several XXXs/FIXMEs in it. I changed the file-ending to .txt to avoid hijacking the CF entry.

Greetings,

Andres Freund

>From 1d89c84b1465b28ddef8c110500c3744477488df Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Wed, 8 Feb 2023 11:57:37 -0800
Subject: [PATCH v1] WIP: Initial sketch of progress update rework

---
 src/include/replication/logical.h           |   7 +-
 src/include/replication/output_plugin.h     |   1 -
 src/include/replication/reorderbuffer.h     |  12 --
 src/backend/replication/logical/logical.c   | 188 ++
 .../replication/logical/reorderbuffer.c     |  20 --
 src/backend/replication/pgoutput/pgoutput.c |  10 -
 src/backend/replication/walsender.c         |  14 +-
 7 files changed, 116 insertions(+), 136 deletions(-)

diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea05..472f2a5b84c 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -27,8 +27,8 @@ typedef LogicalOutputPluginWriterWrite LogicalOutputPluginWriterPrepareWrite;
 typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct LogicalDecodingContext *lr,
 							 XLogRecPtr Ptr,
 							 TransactionId xid,
-							 bool skipped_xact
-);
+							 bool did_write,
+							 bool finished_xact);

 typedef struct LogicalDecodingContext
@@ -105,10 +105,9 @@ typedef struct LogicalDecodingContext
 	 */
 	bool		accept_writes;
 	bool		prepared_write;
+	bool		did_write;
 	XLogRecPtr	write_location;
 	TransactionId	write_xid;
-
-	/* Are we processing the end LSN of a transaction? */
-	bool		end_xact;
 } LogicalDecodingContext;

diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 2d89d26586e..b9358e15444 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -245,6 +245,5 @@ typedef struct OutputPluginCallbacks
 /* Functions in replication/logical/logical.c */
 extern void OutputPluginPrepareWrite(struct LogicalDecodingContext *ctx, bool last_write);
 extern void OutputPluginWrite(struct LogicalDecodingContext *ctx, bool last_write
Re: Logical replication timeout problem
Hi,

On 2023-02-08 10:18:41 -0800, Andres Freund wrote:
> I don't think the syncrep logic in WalSndUpdateProgress really works as-is - consider what happens if e.g. the origin filter filters out entire transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases we'll be lucky because we'll return quickly to XLogSendLogical(), but not reliably.

Is it actually the right thing to check SyncRepRequested() in that logic? It's quite common to set up syncrep so that individual users or transactions opt into syncrep, but to leave the default disabled.

I don't really see an alternative to making this depend solely on sync_standbys_defined.

Greetings,

Andres Freund
Re: Logical replication timeout problem
Hi,

On 2023-02-08 13:36:02 +0530, Amit Kapila wrote:
> On Wed, Feb 8, 2023 at 10:57 AM Andres Freund wrote:
> >
> > On 2023-02-03 10:13:54 +0530, Amit Kapila wrote:
> > > I am planning to push this to HEAD sometime next week (by Wednesday). To backpatch this, we need to fix it in some non-standard way, like without introducing a callback which I am not sure is a good idea. If some other committers vote to get this in back branches with that or some different idea that can be backpatched then we can do that separately as well. I don't see this as a must-fix in back branches because we have a workaround (increase timeout) or users can use the streaming option (for >=14).
> >
> > I just saw the commit go in, and a quick scan over it makes me think neither this commit, nor f95d53eded, which unfortunately was already backpatched, is the right direction. The wrong direction likely started quite a bit earlier, with 024711bb544.
> >
> > It feels quite fundamentally wrong that basically every output plugin needs to call a special function in nearly every callback.
> >
> > In 024711bb544 there was just one call to OutputPluginUpdateProgress() in pgoutput.c. Quite tellingly, it just updated pgoutput, without touching test_decoding.
> >
> > Then a8fd13cab0b added two more calls. 63cf61cdeb7 yet another.
>
> I think the original commit 024711bb544 forgot to call it in test_decoding and the other commits followed the same and missed updating test_decoding.

I think that's a symptom of the wrong architecture having been chosen. This should *never* have been the task of output plugins.

> > I don't think:
> >
> > /*
> >  * Wait until there is no pending write. Also process replies from the other
> >  * side and check timeouts during that.
> >  */
> > static void
> > ProcessPendingWrites(void)
> >
> > Is really a good name. What are we processing?
>
> It is for sending the keep_alive message (if required). That is normally done when we skipped processing a transaction to ensure sync replication is not delayed.

But how is that "processing pending writes"? For me "processing" implies we're doing some analysis on them or such.

If we want to write data in WalSndUpdateProgress(), shouldn't we move the common code of WalSndWriteData() and WalSndUpdateProgress() into ProcessPendingWrites()?

> It has been discussed previously [1][2] to extend the WalSndUpdateProgress() interface. Basically, as explained by Craig [2], this has to be done from the plugin as it can do filtering or there could be other reasons why the output plugin skips all changes. We used the same interface for sending a keep-alive message when we processed a lot of (DDL) changes without sending anything to the plugin.
>
> [1] - https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de
> [2] - https://www.postgresql.org/message-id/CAMsr%2BYE3o8Dt890Q8wTooY2MpN0JvdHqUAHYL-LNhBryXOPaKg%40mail.gmail.com

I don't buy that this has to be done by the output plugin. The actual sending out of data happens via the LogicalDecodingContext callbacks, so we very well can know whether we recently did send out data or not. This really is a concern of the LogicalDecodingContext, it has pretty much nothing to do with output plugins.

We should remove all calls of OutputPluginUpdateProgress() from pgoutput, and add the necessary calls to LogicalDecodingContext->update_progress() to generic code.

Additionally, we should either rename WalSndUpdateProgress(), because it's now doing *far* more than "updating progress", or alternatively, split it into two functions.

I don't think the syncrep logic in WalSndUpdateProgress really works as-is - consider what happens if e.g. the origin filter filters out entire transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases we'll be lucky because we'll return quickly to XLogSendLogical(), but not reliably.

Greetings,

Andres Freund
Re: Logical replication timeout problem
On Wed, Feb 8, 2023 at 10:57 AM Andres Freund wrote:
>
> On 2023-02-03 10:13:54 +0530, Amit Kapila wrote:
> > I am planning to push this to HEAD sometime next week (by Wednesday). To backpatch this, we need to fix it in some non-standard way, like without introducing a callback which I am not sure is a good idea. If some other committers vote to get this in back branches with that or some different idea that can be backpatched then we can do that separately as well. I don't see this as a must-fix in back branches because we have a workaround (increase timeout) or users can use the streaming option (for >=14).
>
> I just saw the commit go in, and a quick scan over it makes me think neither this commit, nor f95d53eded, which unfortunately was already backpatched, is the right direction. The wrong direction likely started quite a bit earlier, with 024711bb544.
>
> It feels quite fundamentally wrong that basically every output plugin needs to call a special function in nearly every callback.
>
> In 024711bb544 there was just one call to OutputPluginUpdateProgress() in pgoutput.c. Quite tellingly, it just updated pgoutput, without touching test_decoding.
>
> Then a8fd13cab0b added two more calls. 63cf61cdeb7 yet another.

I think the original commit 024711bb544 forgot to call it in test_decoding and the other commits followed the same and missed updating test_decoding.

> This makes no sense. There's lots of output plugins out there. There's an increasing number of callbacks. This isn't a maintainable path forward.
>
> If we want to call something to maintain state, it has to be happening from central infrastructure.
>
> It feels quite odd architecturally that WalSndUpdateProgress() ends up flushing out writes - that's far far from obvious.
>
> I don't think:
>
> /*
>  * Wait until there is no pending write. Also process replies from the other
>  * side and check timeouts during that.
>  */
> static void
> ProcessPendingWrites(void)
>
> Is really a good name. What are we processing?

It is for sending the keep_alive message (if required). That is normally done when we skipped processing a transaction to ensure sync replication is not delayed. It has been discussed previously [1][2] to extend the WalSndUpdateProgress() interface. Basically, as explained by Craig [2], this has to be done from the plugin as it can do filtering or there could be other reasons why the output plugin skips all changes. We used the same interface for sending a keep-alive message when we processed a lot of (DDL) changes without sending anything to the plugin.

[1] - https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de
[2] - https://www.postgresql.org/message-id/CAMsr%2BYE3o8Dt890Q8wTooY2MpN0JvdHqUAHYL-LNhBryXOPaKg%40mail.gmail.com

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
Hi,

On 2023-02-03 10:13:54 +0530, Amit Kapila wrote:
> I am planning to push this to HEAD sometime next week (by Wednesday). To backpatch this, we need to fix it in some non-standard way, like without introducing a callback which I am not sure is a good idea. If some other committers vote to get this in back branches with that or some different idea that can be backpatched then we can do that separately as well. I don't see this as a must-fix in back branches because we have a workaround (increase timeout) or users can use the streaming option (for >=14).

I just saw the commit go in, and a quick scan over it makes me think neither this commit, nor f95d53eded, which unfortunately was already backpatched, is the right direction. The wrong direction likely started quite a bit earlier, with 024711bb544.

It feels quite fundamentally wrong that basically every output plugin needs to call a special function in nearly every callback.

In 024711bb544 there was just one call to OutputPluginUpdateProgress() in pgoutput.c. Quite tellingly, it just updated pgoutput, without touching test_decoding.

Then a8fd13cab0b added two more calls. 63cf61cdeb7 yet another.

This makes no sense. There's lots of output plugins out there. There's an increasing number of callbacks. This isn't a maintainable path forward.

If we want to call something to maintain state, it has to be happening from central infrastructure.

It feels quite odd architecturally that WalSndUpdateProgress() ends up flushing out writes - that's far far from obvious.

I don't think:

/*
 * Wait until there is no pending write. Also process replies from the other
 * side and check timeouts during that.
 */
static void
ProcessPendingWrites(void)

Is really a good name. What are we processing? What are we actually waiting for - because we don't actually wait for the data to be sent out or anything, just that they're in a network buffer.

Greetings,

Andres Freund
Re: Logical replication timeout problem
On Wed, Feb 1, 2023 at 10:04 AM Amit Kapila wrote:
>
> On Tue, Jan 31, 2023 at 8:24 PM Ashutosh Bapat wrote:
> >
> > On Tue, Jan 31, 2023 at 5:12 PM Amit Kapila wrote:
> > >
> > > On Tue, Jan 31, 2023 at 5:03 PM Ashutosh Bapat wrote:
> > > >
> > > > On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila wrote:
> > > > >
> > > > > Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?
> > > >
> > > > The callback and the implementation is all in core. What's the risk you see in backpatching it?
> > >
> > > Because we are changing the exposed structure and which can break existing extensions using it.
> >
> > Is that because we are adding the new member in the middle of the structure?
>
> Not only that but this changes the size of the structure and we want to avoid that as well in stable branches. See email [1] (you can't change the struct size either ...). As per my understanding, our usual practice is to not change the exposed structure's size/definition in stable branches.

I am planning to push this to HEAD sometime next week (by Wednesday). To backpatch this, we need to fix it in some non-standard way, like without introducing a callback which I am not sure is a good idea. If some other committers vote to get this in back branches with that or some different idea that can be backpatched then we can do that separately as well. I don't see this as a must-fix in back branches because we have a workaround (increase timeout) or users can use the streaming option (for >=14).

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
On Tue, Jan 31, 2023 at 8:24 PM Ashutosh Bapat wrote:
>
> On Tue, Jan 31, 2023 at 5:12 PM Amit Kapila wrote:
> >
> > On Tue, Jan 31, 2023 at 5:03 PM Ashutosh Bapat wrote:
> > >
> > > On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila wrote:
> > > >
> > > > Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?
> > >
> > > The callback and the implementation is all in core. What's the risk you see in backpatching it?
> >
> > Because we are changing the exposed structure and which can break existing extensions using it.
>
> Is that because we are adding the new member in the middle of the structure?

Not only that but this changes the size of the structure and we want to avoid that as well in stable branches. See email [1] (you can't change the struct size either ...). As per my understanding, our usual practice is to not change the exposed structure's size/definition in stable branches.

[1] - https://www.postgresql.org/message-id/2358496.1649168259%40sss.pgh.pa.us

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
On Wed, Feb 1, 2023 at 4:43 AM Peter Smith wrote:
>
> Here are my review comments for v13-1.
>
> ==
> Commit message
>
> 1.
> The DDLs like Refresh Materialized views that generate lots of temporary data due to rewrite rules may not be processed by output plugins (for example pgoutput). So, we won't send keep-alive messages for a long time while processing such commands and that can lead the subscriber side to timeout.
>
> ~
>
> SUGGESTION (minor rearranged way to say the same thing)
>
> For DDLs that generate lots of temporary data due to rewrite rules (e.g. REFRESH MATERIALIZED VIEW) the output plugins (e.g. pgoutput) may not be processed for a long time. Since we don't send keep-alive messages while processing such commands that can lead the subscriber side to timeout.

Hmm, this makes it less clear and in fact changed the meaning.

> ~~~
>
> 2.
> The commit message says what the problem is, but it doesn't seem to describe what this patch does to fix the problem.

I thought it was apparent and the code comments made it clear.

> 4.
> +#define CHANGES_THRESHOLD 100
> +
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + rb->update_progress_txn(rb, txn, change->lsn);
> + changes_count = 0;
> + }
>
> I was wondering if it would have been simpler to write this code like below.
>
> Also, by doing it this way the 'changes_count' variable name makes more sense IMO, otherwise (for current code) maybe it should be called something like 'changes_since_last_keepalive'
>
> SUGGESTION
> if (++changes_count % CHANGES_THRESHOLD == 0)
>     rb->update_progress_txn(rb, txn, change->lsn);

I find the current code in the patch clear and easy to understand.

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
Here are my review comments for v13-1.

==
Commit message

1.
The DDLs like Refresh Materialized views that generate lots of temporary data due to rewrite rules may not be processed by output plugins (for example pgoutput). So, we won't send keep-alive messages for a long time while processing such commands and that can lead the subscriber side to timeout.

~

SUGGESTION (minor rearranged way to say the same thing)

For DDLs that generate lots of temporary data due to rewrite rules (e.g. REFRESH MATERIALIZED VIEW) the output plugins (e.g. pgoutput) may not be processed for a long time. Since we don't send keep-alive messages while processing such commands that can lead the subscriber side to timeout.

~~~

2.
The commit message says what the problem is, but it doesn't seem to describe what this patch does to fix the problem.

==
src/backend/replication/logical/reorderbuffer.c

3.
+ /*
+ * It is possible that the data is not sent to downstream for a
+ * long time either because the output plugin filtered it or there
+ * is a DDL that generates a lot of data that is not processed by
+ * the plugin. So, in such cases, the downstream can timeout. To
+ * avoid that we try to send a keepalive message if required.
+ * Trying to send a keepalive message after every change has some
+ * overhead, but testing showed there is no noticeable overhead if
+ * we do it after every ~100 changes.
+ */

3a.
"data is not sent to downstream" --> "data is not sent downstream" (?)

~

3b.
"So, in such cases," --> "In such cases,"

~~~

4.
+#define CHANGES_THRESHOLD 100
+
+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ rb->update_progress_txn(rb, txn, change->lsn);
+ changes_count = 0;
+ }

I was wondering if it would have been simpler to write this code like below.

Also, by doing it this way the 'changes_count' variable name makes more sense IMO, otherwise (for current code) maybe it should be called something like 'changes_since_last_keepalive'

SUGGESTION
if (++changes_count % CHANGES_THRESHOLD == 0)
    rb->update_progress_txn(rb, txn, change->lsn);

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Logical replication timeout problem
On Tue, Jan 31, 2023 at 5:12 PM Amit Kapila wrote:
>
> On Tue, Jan 31, 2023 at 5:03 PM Ashutosh Bapat wrote:
> >
> > On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila wrote:
> >
> > > Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?
> >
> > The callback and the implementation is all in core. What's the risk you see in backpatching it?
>
> Because we are changing the exposed structure and which can break existing extensions using it.

Is that because we are adding the new member in the middle of the structure? Shouldn't extensions provide new libraries with each maintenance release of PG?

> > Customers can adjust the timeouts, but only after the receiver has timed out a few times. Replication remains broken till they notice it and adjust timeouts. By that time WAL has piled up. It also takes a few attempts to increase timeouts since the time taken by a transaction to decode cannot be estimated beforehand. All that makes it worth back-patching if it's possible. We had a customer who piled up GBs of WAL before realising that this is the problem. Their system almost came to a halt due to that.
>
> Which version are they using? If they are at >=14, using "streaming = on" for a subscription should also avoid this problem.

13.

--
Best Wishes,
Ashutosh Bapat
Re: Logical replication timeout problem
On Tue, Jan 31, 2023 at 5:03 PM Ashutosh Bapat wrote:
>
> On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila wrote:
>
> > Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?
>
> The callback and the implementation is all in core. What's the risk you see in backpatching it?

Because we are changing the exposed structure, which can break existing extensions using it.

> Customers can adjust the timeouts, but only after the receiver has timed out a few times. Replication remains broken till they notice it and adjust timeouts. By that time WAL has piled up. It also takes a few attempts to increase timeouts since the time taken by a transaction to decode cannot be estimated beforehand. All that makes it worth back-patching if it's possible. We had a customer who piled up GBs of WAL before realising that this is the problem. Their system almost came to a halt due to that.

Which version are they using? If they are at >=14, using "streaming = on" for a subscription should also avoid this problem.

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila wrote:
> Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?

The callback and the implementation is all in core. What's the risk you see in backpatching it?

Customers can adjust the timeouts, but only after the receiver has timed out a few times. Replication remains broken till they notice it and adjust timeouts. By that time WAL has piled up. It also takes a few attempts to increase timeouts since the time taken by a transaction to decode cannot be estimated beforehand. All that makes it worth back-patching if it's possible. We had a customer who piled up GBs of WAL before realising that this is the problem. Their system almost came to a halt due to that.

--
Best Wishes,
Ashutosh Bapat
Re: Logical replication timeout problem
On Tue, Jan 31, 2023 at 2:53 PM wangw.f...@fujitsu.com wrote:
>
> On Mon, Jan 30, 2023 at 17:50 PM I wrote:
> > Attach the new patch.
>
> When invoking the function ReorderBufferProcessTXN, the threshold-related counter "changes_count" may have some random value from the previous transaction's processing. To fix this, I moved the definition of the counter "changes_count" outside the while-loop and did not use the keyword "static".
>
> Attach the new patch.

Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?

--
With Regards,
Amit Kapila.

v13-0001-Fix-the-logical-replication-timeout-during-large.patch
Description: Binary data
RE: Logical replication timeout problem
On Mon, Jan 30, 2023 at 17:50 PM I wrote:
> Attach the new patch.

When invoking the function ReorderBufferProcessTXN, the threshold-related counter "changes_count" may have some random value from the previous transaction's processing. To fix this, I moved the definition of the counter "changes_count" outside the while-loop and did not use the keyword "static".

Attach the new patch.

Regards,
Wang Wei

v11-0001-Fix-the-logical-replication-timeout-during-proce.patch
Description: v11-0001-Fix-the-logical-replication-timeout-during-proce.patch
RE: Logical replication timeout problem
On Mon, Jan 30, 2023 at 14:55 PM Amit Kapila wrote:
> On Mon, Jan 30, 2023 at 10:36 AM wangw.f...@fujitsu.com wrote:
> >
> > On Mon, Jan 30, 2023 11:37 AM Shi, Yu/侍 雨 wrote:
> > > On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com wrote:
> >
> > Yes, I think you are right.
> > Fixed this problem.
>
> + /*
> + * Trying to send keepalive message after every change has some
> + * overhead, but testing showed there is no noticeable overhead if
> + * we do it after every ~100 changes.
> + */
> +#define CHANGES_THRESHOLD 100
> +
> + if (++changes_count < CHANGES_THRESHOLD)
> + return;
> ...
> + changes_count = 0;
>
> I think it is better to have this threshold-related code in that caller as we have in the previous version. Also, let's modify the comment as follows:
> "It is possible that the data is not sent to downstream for a long time either because the output plugin filtered it or there is a DDL that generates a lot of data that is not processed by the plugin. So, in such cases, the downstream can timeout. To avoid that we try to send a keepalive message if required. Trying to send a keepalive message after every change has some overhead, but testing showed there is no noticeable overhead if we do it after every ~100 changes."

Changed as suggested. I also removed the comment atop the function update_progress_txn_cb_wrapper to be consistent with the nearby *_cb_wrapper functions.

Attach the new patch.

Regards,
Wang Wei

v10-0001-Fix-the-logical-replication-timeout-during-proce.patch
Description: v10-0001-Fix-the-logical-replication-timeout-during-proce.patch
Re: Logical replication timeout problem
On Mon, Jan 30, 2023 at 10:36 AM wangw.f...@fujitsu.com wrote:
>
> On Mon, Jan 30, 2023 11:37 AM Shi, Yu/侍 雨 wrote:
> > On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com wrote:
>
> Yes, I think you are right.
> Fixed this problem.

+ /*
+ * Trying to send keepalive message after every change has some
+ * overhead, but testing showed there is no noticeable overhead if
+ * we do it after every ~100 changes.
+ */
+#define CHANGES_THRESHOLD 100
+
+ if (++changes_count < CHANGES_THRESHOLD)
+ return;
...
+ changes_count = 0;

I think it is better to have this threshold-related code in that caller as we have in the previous version. Also, let's modify the comment as follows:
"It is possible that the data is not sent to downstream for a long time either because the output plugin filtered it or there is a DDL that generates a lot of data that is not processed by the plugin. So, in such cases, the downstream can timeout. To avoid that we try to send a keepalive message if required. Trying to send a keepalive message after every change has some overhead, but testing showed there is no noticeable overhead if we do it after every ~100 changes."

--
With Regards,
Amit Kapila.
RE: Logical replication timeout problem
On Mon, Jan 30, 2023 11:37 AM Shi, Yu/侍 雨 wrote:
> On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com wrote:
> >
> > I tested a mixed transaction of transactional and non-transactional messages on the current HEAD and reproduced the timeout problem. I think this result is OK. Because when decoding a transaction, non-transactional changes are processed directly and the function WalSndKeepaliveIfNecessary is called, while transactional changes are cached and processed after decoding. After decoding, only transactional changes will be processed (in the function ReorderBufferProcessTXN), so the timeout problem will still be reproduced.
> >
> > After applying the v8 patch, the test mentioned above didn't reproduce the timeout problem (Attach this test script 'test_with_nontransactional.sh').
> >
> > Attach the new patch.
>
> Thanks for updating the patch. Here is a comment.

Thanks for your comment.

> In update_progress_txn_cb_wrapper(), it looks like we need to reset changes_count to 0 after calling OutputPluginUpdateProgress(), otherwise OutputPluginUpdateProgress() will always be called after 100 changes.

Yes, I think you are right.
Fixed this problem.

Attach the new patch.

Regards,
Wang Wei

v9-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description: v9-0001-Fix-the-logical-replication-timeout-during-proces.patch
RE: Logical replication timeout problem
On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com wrote: > > I tested a mix transaction of transactional and non-transactional messages on > the current HEAD and reproduced the timeout problem. I think this result is > OK. > Because when decoding a transaction, non-transactional changes are processed > directly and the function WalSndKeepaliveIfNecessary is called, while > transactional changes are cached and processed after decoding. After decoding, > only transactional changes will be processed (in the function > ReorderBufferProcessTXN), so the timeout problem will still be reproduced. > > After applying the v8 patch, the test mentioned above didn't reproduce the > timeout problem (Attach this test script 'test_with_nontransactional.sh'). > > Attach the new patch. > Thanks for updating the patch. Here is a comment. In update_progress_txn_cb_wrapper(), it looks like we need to reset changes_count to 0 after calling OutputPluginUpdateProgress(), otherwise OutputPluginUpdateProgress() will always be called after 100 changes. Regards, Shi yu
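The point about resetting changes_count can be demonstrated with a small stand-alone model (purely illustrative; the function names are hypothetical, not PostgreSQL internals): without the reset, once the counter reaches the threshold, the progress call fires on every subsequent change instead of on every 100th.

```c
#include <assert.h>

#define CHANGES_THRESHOLD 100

static int progress_calls = 0;	/* stand-in for OutputPluginUpdateProgress() */
static int changes_count = 0;

/* Buggy variant: the counter is never reset, so once it reaches the
 * threshold every later change triggers a progress update. */
static void
change_buggy(void)
{
	if (++changes_count >= CHANGES_THRESHOLD)
		progress_calls++;
}

/* Fixed variant: resetting the counter restores the once-per-100-changes
 * behaviour. */
static void
change_fixed(void)
{
	if (++changes_count >= CHANGES_THRESHOLD)
	{
		progress_calls++;
		changes_count = 0;
	}
}

/* Process n changes through the given per-change callback. */
int
run(void (*cb)(void), int n)
{
	progress_calls = 0;
	changes_count = 0;
	for (int i = 0; i < n; i++)
		cb();
	return progress_calls;
}
```

For 300 changes the buggy variant fires 201 times (every change from the 100th on), while the fixed variant fires 3 times.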
RE: Logical replication timeout problem
On Fri, Jan 27, 2023 at 19:55 PM Amit Kapila wrote: > On Fri, Jan 27, 2023 at 5:18 PM houzj.f...@fujitsu.com > wrote: > > > > On Wednesday, January 25, 2023 7:26 PM Amit Kapila > > > > > > > On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com > > > wrote: > > > > > > > > Attach the new patch. > > > > > > > > > > I think the patch missed to handle the case of non-transactional messages > which > > > was previously getting handled. I have tried to address that in the > > > attached. > Is > > > there a reason that shouldn't be handled? > > > > Thanks for updating the patch! > > > > I thought about the non-transactional message. I think it seems fine if we > > don’t handle it for timeout because such message is decoded via: > > > > WalSndLoop > > -XLogSendLogical > > --LogicalDecodingProcessRecord > > ---logicalmsg_decode > > ReorderBufferQueueMessage > > -rb->message() -- //maybe send the message or do nothing here. > > > > After invoking rb->message(), we will directly return to the main > > loop(WalSndLoop) where we will get a chance to call > > WalSndKeepaliveIfNecessary() to avoid the timeout. > > > > Valid point. But this means the previous handling of non-transactional > messages was also redundant. Thanks for the analysis, I think it makes sense. So I removed the handling of non-transactional messages. > > This is a bit different from transactional changes, because for > > transactional > changes, we > > will buffer them and then send every buffered change one by one(via > > ReorderBufferProcessTXN) without going back to the WalSndLoop, so we > don't get > > a chance to send keepalive message if necessary, which is more likely to > > cause > the > > timeout problem. > > > > I will also test the non-transactional message for timeout in case I missed > something. > > > > Okay, thanks. Please see if we can test a mix of transactional and > non-transactional messages as well. 
I tested a mixed transaction of transactional and non-transactional messages on the current HEAD and reproduced the timeout problem. I think this result is OK, because when decoding a transaction, non-transactional changes are processed directly and the function WalSndKeepaliveIfNecessary is called, while transactional changes are cached and processed after decoding. After decoding, only transactional changes will be processed (in the function ReorderBufferProcessTXN), so the timeout problem will still be reproduced.

After applying the v8 patch, the test mentioned above didn't reproduce the timeout problem (the test script 'test_with_nontransactional.sh' is attached).

Attached is the new patch.

Regards,
Wang Wei

v8-0001-Fix-the-logical-replication-timeout-during-proces.patch
test_with_nontransactional.sh
Re: Logical replication timeout problem
On Fri, Jan 27, 2023 at 5:18 PM houzj.f...@fujitsu.com wrote: > > On Wednesday, January 25, 2023 7:26 PM Amit Kapila > > > > On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com > > wrote: > > > > > > Attach the new patch. > > > > > > > I think the patch missed to handle the case of non-transactional messages > > which > > was previously getting handled. I have tried to address that in the > > attached. Is > > there a reason that shouldn't be handled? > > Thanks for updating the patch! > > I thought about the non-transactional message. I think it seems fine if we > don’t handle it for timeout because such message is decoded via: > > WalSndLoop > -XLogSendLogical > --LogicalDecodingProcessRecord > ---logicalmsg_decode > ReorderBufferQueueMessage > -rb->message() -- //maybe send the message or do nothing here. > > After invoking rb->message(), we will directly return to the main > loop(WalSndLoop) where we will get a chance to call > WalSndKeepaliveIfNecessary() to avoid the timeout. > Valid point. But this means the previous handling of non-transactional messages was also redundant. > This is a bit different from transactional changes, because for transactional > changes, we > will buffer them and then send every buffered change one by one(via > ReorderBufferProcessTXN) without going back to the WalSndLoop, so we don't get > a chance to send keepalive message if necessary, which is more likely to > cause the > timeout problem. > > I will also test the non-transactional message for timeout in case I missed > something. > Okay, thanks. Please see if we can test a mix of transactional and non-transactional messages as well. -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Wednesday, January 25, 2023 7:26 PM Amit Kapila wrote:
> On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com wrote:
> >
> > Attach the new patch.
> >
>
> I think the patch missed handling the case of non-transactional messages
> which was previously getting handled. I have tried to address that in the
> attached. Is there a reason that shouldn't be handled?

Thanks for updating the patch!

I thought about the non-transactional message. I think it seems fine if we don't handle it for the timeout, because such a message is decoded via:

WalSndLoop
-XLogSendLogical
--LogicalDecodingProcessRecord
---logicalmsg_decode
----ReorderBufferQueueMessage
-----rb->message() -- //maybe send the message or do nothing here.

After invoking rb->message(), we will directly return to the main loop (WalSndLoop), where we will get a chance to call WalSndKeepaliveIfNecessary() to avoid the timeout.

This is a bit different from transactional changes, because for transactional changes, we will buffer them and then send every buffered change one by one (via ReorderBufferProcessTXN) without going back to WalSndLoop, so we don't get a chance to send a keepalive message if necessary, which is more likely to cause the timeout problem.

I will also test the non-transactional message for timeout in case I missed something.

Best Regards,
Hou zj
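The difference between the two code paths described above can be modeled with a toy timing simulation (purely illustrative; TIMEOUT, the one-tick cost per change, and all function names here are assumptions, not PostgreSQL internals): when control returns to the main loop between messages there is a keepalive opportunity per message, whereas processing a large buffered transaction inside one callback offers none until the end.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timing model: each change costs one "tick", and the standby
 * times out if more than TIMEOUT ticks pass without a keepalive. */
#define TIMEOUT 10

static int ticks_since_keepalive;
static bool timed_out;

static void
process_one_change(void)
{
	if (++ticks_since_keepalive > TIMEOUT)
		timed_out = true;
}

/* Models the chance WalSndLoop gets to call WalSndKeepaliveIfNecessary(). */
static void
keepalive_opportunity(void)
{
	ticks_since_keepalive = 0;
}

/* Non-transactional path: control returns to the main loop after every
 * message, so there is a keepalive opportunity per message. */
bool
run_nontransactional(int nchanges)
{
	ticks_since_keepalive = 0;
	timed_out = false;
	for (int i = 0; i < nchanges; i++)
	{
		process_one_change();
		keepalive_opportunity();
	}
	return timed_out;
}

/* Transactional path: all buffered changes are processed in one callback
 * (like ReorderBufferProcessTXN) with no opportunity until the end. */
bool
run_transactional(int nchanges)
{
	ticks_since_keepalive = 0;
	timed_out = false;
	for (int i = 0; i < nchanges; i++)
		process_one_change();
	keepalive_opportunity();
	return timed_out;
}
```

Under this model, 50 changes never time out on the non-transactional path but do on the buffered transactional path, which is why the patch throttles keepalives inside the transaction-processing loop rather than relying on returning to WalSndLoop.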
Re: Logical replication timeout problem
On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com wrote:
>
> Attach the new patch.
>

I think the patch missed handling the case of non-transactional messages, which was previously getting handled. I have tried to address that in the attached. Is there a reason that shouldn't be handled? Apart from that, I changed a few comments.

If my understanding is correct, then we need to change the callback name update_progress_txn as well, because now it needs to handle both transactional and non-transactional changes. How about update_progress_write? We accordingly need to change the comments for the callback.

Additionally, I think we should have a test case to show we don't time out because of not processing non-transactional messages. See pgoutput_message for cases where it doesn't process the message.

--
With Regards,
Amit Kapila.

v7-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
On Tue, Jan 24, 2023 at 1:45 PM wangw.f...@fujitsu.com wrote: > > On Tues, Jan 24, 2023 at 8:28 AM Peter Smith wrote: > > Hi Hou-san, Here are my review comments for v5-0001. > > Thanks for your comments. ... > > Changed as suggested. > > Attach the new patch. Thanks! Patch v6 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
RE: Logical replication timeout problem
On Tues, Jan 24, 2023 at 8:28 AM Peter Smith wrote: > Hi Hou-san, Here are my review comments for v5-0001. Thanks for your comments. > == > src/backend/replication/logical/reorderbuffer.c > > 1. > @@ -2446,6 +2452,23 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, > ReorderBufferTXN *txn, > elog(ERROR, "tuplecid value in changequeue"); > break; > } > + > + /* > + * Sending keepalive messages after every change has some overhead, but > + * testing showed there is no noticeable overhead if keepalive is only > + * sent after every ~100 changes. > + */ > +#define CHANGES_THRESHOLD 100 > + > + /* > + * Try to send a keepalive message after every CHANGES_THRESHOLD > + * changes. > + */ > + if (++changes_count >= CHANGES_THRESHOLD) > + { > + rb->update_progress_txn(rb, txn, change); > + changes_count = 0; > + } > > I noticed you put the #define adjacent to the only usage of it, > instead of with the other variable declaration like it was before. > Probably it is better how you have done it, but: > > 1a. > The comment indentation is incorrect. > > ~ > > 1b. > Since the #define is adjacent to its only usage IMO now the 2nd > comment is redundant. So the code can just say > >/* > * Sending keepalive messages after every change has some > overhead, but > * testing showed there is no noticeable overhead if > keepalive is only > * sent after every ~100 changes. > */ > #define CHANGES_THRESHOLD 100 > if (++changes_count >= CHANGES_THRESHOLD) > { > rb->update_progress_txn(rb, txn, change); > changes_count = 0; > } Changed as suggested. Attach the new patch. Regards, Wang Wei v6-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: v6-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
Hi Hou-san, Here are my review comments for v5-0001. == src/backend/replication/logical/reorderbuffer.c 1. @@ -2446,6 +2452,23 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, elog(ERROR, "tuplecid value in changequeue"); break; } + + /* + * Sending keepalive messages after every change has some overhead, but + * testing showed there is no noticeable overhead if keepalive is only + * sent after every ~100 changes. + */ +#define CHANGES_THRESHOLD 100 + + /* + * Try to send a keepalive message after every CHANGES_THRESHOLD + * changes. + */ + if (++changes_count >= CHANGES_THRESHOLD) + { + rb->update_progress_txn(rb, txn, change); + changes_count = 0; + } I noticed you put the #define adjacent to the only usage of it, instead of with the other variable declaration like it was before. Probably it is better how you have done it, but: 1a. The comment indentation is incorrect. ~ 1b. Since the #define is adjacent to its only usage IMO now the 2nd comment is redundant. So the code can just say /* * Sending keepalive messages after every change has some overhead, but * testing showed there is no noticeable overhead if keepalive is only * sent after every ~100 changes. */ #define CHANGES_THRESHOLD 100 if (++changes_count >= CHANGES_THRESHOLD) { rb->update_progress_txn(rb, txn, change); changes_count = 0; } -- Kind Regards, Peter Smith. Fujitsu Australia
RE: Logical replication timeout problem
On Monday, January 23, 2023 8:51 AM Peter Smith wrote: > > Here are my review comments for patch v4-0001 > == > Commit message > > 2. > > The problem is when there is a DDL in a transaction that generates lots of > temporary data due to rewrite rules, these temporary data will not be > processed > by the pgoutput plugin. The previous commit (f95d53e) only fixed timeouts > caused by filtering out changes in pgoutput. Therefore, the previous fix for > DML > had no impact on this case. > > ~ > > IMO this still some rewording to say up-front what the the actual problem -- > i.e. > an avoidable timeout occuring. > > SUGGESTION (or something like this...) > > When there is a DDL in a transaction that generates lots of temporary data due > to rewrite rules, this temporary data will not be processed by the pgoutput > plugin. This means it is possible for a timeout to occur if a sufficiently > long time > elapses since the last pgoutput message. A previous commit (f95d53e) fixed a > similar scenario in this area, but that only fixed timeouts for DML going > through > pgoutput, so it did not address this DDL timeout case. Thanks, I changed the commit message as suggested. > == > src/backend/replication/logical/logical.c > > 3. update_progress_txn_cb_wrapper > > +/* > + * Update progress callback while processing a transaction. > + * > + * Try to update progress and send a keepalive message during sending > +data of a > + * transaction (and its subtransactions) to the output plugin. > + * > + * For a large transaction, if we don't send any change to the > +downstream for a > + * long time (exceeds the wal_receiver_timeout of standby) then it can > timeout. > + * This can happen when all or most of the changes are either not > +published or > + * got filtered out. > + */ > +static void > +update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN > *txn, > +ReorderBufferChange *change) > > Simplify the "Try to..." paragraph. 
> And the other part should also mention DDL.
>
> SUGGESTION
>
> Try to send a keepalive message during transaction processing.
>
> This is done because if we don't send any change to the downstream for a long
> time (exceeds the wal_receiver_timeout of standby), then it can timeout. This
> can happen for large DDL, or for large transactions when all or most of the
> changes are either not published or got filtered out.

Changed.

> ==
> .../replication/logical/reorderbuffer.c
>
> 4. ReorderBufferProcessTXN
>
> @@ -2105,6 +2105,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn,
>
>  PG_TRY();
>  {
> + /*
> +  * Static variable used to accumulate the number of changes while
> +  * processing txn.
> +  */
> + static int changes_count = 0;
> +
> + /*
> +  * Sending keepalive messages after every change has some overhead, but
> +  * testing showed there is no noticeable overhead if keepalive is only
> +  * sent after every ~100 changes.
> +  */
> +#define CHANGES_THRESHOLD 100
> +
>
> IMO these can be relocated to be declared/defined inside the "while"
> loop -- i.e. closer to where they are being used.

Moved into the while loop.

Attached is the new version patch, which addresses the above comments. Also attached is a simple script that uses "refresh matview" to reproduce this timeout problem, in case someone wants to try to reproduce it.

Best regards,
Hou zj

test.sh
v5-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
On Mon, Jan 23, 2023 at 6:21 AM Peter Smith wrote:
>
> 1.
>
> It makes no real difference, but I was wondering about:
> "update txn progress" versus "update progress txn"
>

Yeah, I think we can go either way, but I still prefer "update progress txn" as that is closer to the LogicalOutputPluginWriterUpdateProgress callback name.

>
> 5.
>
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + rb->update_progress_txn(rb, txn, change);
> + changes_count = 0;
> + }
>
> When there is no update_progress function this code is still incurring
> some small additional overhead for incrementing and testing the
> THRESHOLD every time, and also needlessly calling the wrapper every
> 100x. This overhead could be avoided with a simpler up-front check
> like shown below. OTOH, maybe the overhead is insignificant enough
> that just leaving the current code is neater?
>

As far as built-in logical replication is concerned, it will be defined, and I don't know if the overhead will be significant enough in this case. Also, one can say that for the cases where it is defined, we are adding this check multiple times (it is already checked inside OutputPluginUpdateProgress). So, I would prefer neat code here.

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
Here are my review comments for patch v4-0001

==
General

1.

It makes no real difference, but I was wondering about:
"update txn progress" versus "update progress txn"

I thought that the first way sounds more natural. YMMV.

If you change this then there is impact for the typedef, function names, comments, member names:

ReorderBufferUpdateTxnProgressCB --> ReorderBufferUpdateProgressTxnCB
"/* update progress txn callback */" --> "/* update txn progress callback */"
update_progress_txn_cb_wrapper --> update_txn_progress_cb_wrapper
updated_progress_txn --> update_txn_progress

==
Commit message

2.

The problem is when there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, these temporary data will not be processed by the pgoutput plugin. The previous commit (f95d53e) only fixed timeouts caused by filtering out changes in pgoutput. Therefore, the previous fix for DML had no impact on this case.

~

IMO this still needs some rewording to say up-front what the actual problem is -- i.e. an avoidable timeout occurring.

SUGGESTION (or something like this...)

When there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, this temporary data will not be processed by the pgoutput plugin. This means it is possible for a timeout to occur if a sufficiently long time elapses since the last pgoutput message. A previous commit (f95d53e) fixed a similar scenario in this area, but that only fixed timeouts for DML going through pgoutput, so it did not address this DDL timeout case.

==
src/backend/replication/logical/logical.c

3. update_progress_txn_cb_wrapper

+/*
+ * Update progress callback while processing a transaction.
+ *
+ * Try to update progress and send a keepalive message during sending data of a
+ * transaction (and its subtransactions) to the output plugin.
+ *
+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time (exceeds the wal_receiver_timeout of standby) then it can timeout.
+ * This can happen when all or most of the changes are either not published or
+ * got filtered out.
+ */
+static void
+update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ReorderBufferChange *change)

Simplify the "Try to..." paragraph. And the other part should also mention DDL.

SUGGESTION

Try to send a keepalive message during transaction processing.

This is done because if we don't send any change to the downstream for a long time (exceeds the wal_receiver_timeout of standby), then it can timeout. This can happen for large DDL, or for large transactions when all or most of the changes are either not published or got filtered out.

==
.../replication/logical/reorderbuffer.c

4. ReorderBufferProcessTXN

@@ -2105,6 +2105,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,

 PG_TRY();
 {
+ /*
+  * Static variable used to accumulate the number of changes while
+  * processing txn.
+  */
+ static int changes_count = 0;
+
+ /*
+  * Sending keepalive messages after every change has some overhead, but
+  * testing showed there is no noticeable overhead if keepalive is only
+  * sent after every ~100 changes.
+  */
+#define CHANGES_THRESHOLD 100
+

IMO these can be relocated to be declared/defined inside the "while" loop -- i.e. closer to where they are being used.

~~~

5.

+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ rb->update_progress_txn(rb, txn, change);
+ changes_count = 0;
+ }

When there is no update_progress function this code is still incurring some small additional overhead for incrementing and testing the THRESHOLD every time, and also needlessly calling the wrapper every 100x. This overhead could be avoided with a simpler up-front check like shown below. OTOH, maybe the overhead is insignificant enough that just leaving the current code is neater?
LogicalDecodingContext *ctx = rb->private_data;
...
if (ctx->update_progress_txn && (++changes_count >= CHANGES_THRESHOLD))
{
	rb->update_progress_txn(rb, txn, change);
	changes_count = 0;
}

--
Kind Regards,
Peter Smith.
Fujitsu Australia
RE: Logical replication timeout problem
On Fri, Jan 20, 2023 at 10:10 AM Peter Smith wrote: > Here are some review comments for patch v3-0001. Thanks for your comments. > == > Commit message > > 1. > The problem is when there is a DDL in a transaction that generates lots of > temporary data due to rewrite rules, these temporary data will not be > processed > by the pgoutput - plugin. Therefore, the previous fix (f95d53e) for DML had no > impact on this case. > > ~ > > 1a. > IMO this comment needs to give a bit of background about the original > problem here, rather than just starting with "The problem is" which is > describing the flaws of the previous fix. Added some related message. > ~ > > 1b. > "pgoutput - plugin" -> "pgoutput plugin" ?? Changed. > ~~~ > > 2. > > To fix this, we introduced a new ReorderBuffer callback - > 'ReorderBufferUpdateProgressCB'. This callback is called to try to update the > process after each change has been processed during sending data of a > transaction (and its subtransactions) to the output plugin. > > IIUC it's not really "after each change" - shouldn't this comment > mention something about the CHANGES_THRESHOLD 100? Changed. > ~~~ > > 4. update_progress_cb_wrapper > > +/* > + * Update progress callback > + * > + * Try to update progress and send a keepalive message if too many changes > were > + * processed when processing txn. > + * > + * For a large transaction, if we don't send any change to the downstream > for a > + * long time (exceeds the wal_receiver_timeout of standby) then it can > timeout. > + * This can happen when all or most of the changes are either not published > or > + * got filtered out. > + */ > > SUGGESTION (instead of the "Try to update" sentence) > Send a keepalive message whenever more than > changes are encountered while processing a transaction. Since it's possible that keep-alive messages won't be sent even if the threshold is reached (see function WalSndKeepaliveIfNecessary), I thought it might be better to use "try to". 
And rewrote the comments here because the threshold logic is moved to the function ReorderBufferProcessTXN. > ~~~ > > 5. > > +static void > +update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, > +ReorderBufferChange *change) > +{ > + LogicalDecodingContext *ctx = cache->private_data; > + LogicalErrorCallbackState state; > + ErrorContextCallback errcallback; > + static int changes_count = 0; /* Static variable used to accumulate > + * the number of changes while > + * processing txn. */ > + > > IMO this may be more readable if the static 'changes_count' local var > was declared first and separated from the other vars by a blank line. Changed. > ~~~ > > 6. > > + /* > + * We don't want to try sending a keepalive message after processing each > + * change as that can have overhead. Tests revealed that there is no > + * noticeable overhead in doing it after continuously processing 100 or so > + * changes. > + */ > +#define CHANGES_THRESHOLD 100 > > 6a. > I think it might be better to define this right at the top of the > function adjacent to the 'changes_count' variable (e.g. a bit like the > original HEAD code looked) Changed. > ~ > > 6b. > SUGGESTION (for the comment) > Sending keepalive messages after every change has some overhead, but > testing showed there is no noticeable overhead if keepalive is only > sent after every ~100 changes. Changed. > ~~~ > > 7. > > + > + /* > + * After continuously processing CHANGES_THRESHOLD changes, we > + * try to send a keepalive message if required. > + */ > + if (++changes_count >= CHANGES_THRESHOLD) > + { > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); > + changes_count = 0; > + } > + > > 7a. > SUGGESTION (for comment) > Send a keepalive message after every CHANGES_THRESHOLD changes. Changed. Regards, Wang Wei
RE: Logical replication timeout problem
On Fri, Jan 20, 2023 at 12:35 PM Amit Kapila wrote: > On Fri, Jan 20, 2023 at 7:40 AM Peter Smith wrote: > > > > Here are some review comments for patch v3-0001. > > > > == > > src/backend/replication/logical/logical.c > > > > 3. forward declaration > > > > +/* update progress callback */ > > +static void update_progress_cb_wrapper(ReorderBuffer *cache, > > +ReorderBufferTXN *txn, > > +ReorderBufferChange *change); > > > > I felt this function wrapper name was a bit misleading... AFAIK every > > other wrapper really does just wrap their respective functions. But > > this one seems a bit different because it calls the wrapped function > > ONLY if some threshold is exceeded. IMO maybe this function could have > > some name that conveys this better: > > > > e.g. update_progress_cb_wrapper_with_threshold > > > > I am wondering whether it would be better to move the threshold logic > to the caller. Previously this logic was inside the function because > it was being invoked from multiple places but now that won't be the > case. Also, then your concern about the name would also be addressed. Agree. Moved the threshold logic to the function ReorderBufferProcessTXN. > > > > ~ > > > > 7b. > > Would it be neater to just call OutputPluginUpdateProgress here instead? > > > > e.g. > > BEFORE > > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); > > AFTER > > OutputPluginUpdateProgress(ctx, false); > > > > We already check whether ctx->update_progress is defined or not which > is the only extra job done by OutputPluginUpdateProgress but probably > we can consolidate the checks and directly invoke > OutputPluginUpdateProgress. Changed. Invoke the function OutputPluginUpdateProgress directly in the new callback. Regards, Wang Wei
RE: Logical replication timeout problem
On Thu, Jan 19, 2023 at 19:37 PM Amit Kapila wrote: > On Thu, Jan 19, 2023 at 4:13 PM Ashutosh Bapat > wrote: > > > > On Wed, Jan 18, 2023 at 6:00 PM Amit Kapila > wrote: > > > > > + */ > > > + ReorderBufferUpdateProgressCB update_progress; > > > > > > Are you suggesting changing the name of the above variable? If so, how > > > about apply_progress, progress, or updateprogress? If you don't like > > > any of these then feel free to suggest something else. If we change > > > the variable name then accordingly, we need to update > > > ReorderBufferUpdateProgressCB as well. > > > > > > > I would liked to have all the callback names renamed with prefix > > "rbcb_xxx" so that they have very less chances of conflicting with > > similar names in the code base. But it's probably late to do that :). > > > > How are update_txn_progress since the CB is supposed to be used only > > within a transaction? or update_progress_txn? > > > > Personally, I would prefer 'apply_progress' as it would be similar to > a few other callbacks like apply_change, apply_truncate, or as is > proposed by patch update_progress again because it is similar to > existing callbacks like commit_prepared. If you and others don't like > any of those then we can go for 'update_progress_txn' as well. Anybody > else has an opinion on this? I think 'update_progress_txn' might be better. Because I think this name seems to make it easier to know that this callback is used to update process when processing txn. So, I rename it to 'update_progress_txn'. I have addressed all the comments and here is the new version patch. Regards, Wang Wei v4-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: v4-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
On Fri, Jan 20, 2023 at 3:35 PM Amit Kapila wrote: > > On Fri, Jan 20, 2023 at 7:40 AM Peter Smith wrote: > > > > Here are some review comments for patch v3-0001. > > > > == > > src/backend/replication/logical/logical.c > > > > 3. forward declaration > > > > +/* update progress callback */ > > +static void update_progress_cb_wrapper(ReorderBuffer *cache, > > +ReorderBufferTXN *txn, > > +ReorderBufferChange *change); > > > > I felt this function wrapper name was a bit misleading... AFAIK every > > other wrapper really does just wrap their respective functions. But > > this one seems a bit different because it calls the wrapped function > > ONLY if some threshold is exceeded. IMO maybe this function could have > > some name that conveys this better: > > > > e.g. update_progress_cb_wrapper_with_threshold > > > > I am wondering whether it would be better to move the threshold logic > to the caller. Previously this logic was inside the function because > it was being invoked from multiple places but now that won't be the > case. Also, then your concern about the name would also be addressed. > > > > > ~ > > > > 7b. > > Would it be neater to just call OutputPluginUpdateProgress here instead? > > > > e.g. > > BEFORE > > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); > > AFTER > > OutputPluginUpdateProgress(ctx, false); > > > > We already check whether ctx->update_progress is defined or not which > is the only extra job done by OutputPluginUpdateProgress but probably > we can consolidate the checks and directly invoke > OutputPluginUpdateProgress. > Yes, I saw that, but I thought it was better to keep the early exit from update_progress_cb_wrapper, so incurring just one additional boolean check for every 100 changes was not anything to worry about. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Logical replication timeout problem
On Fri, Jan 20, 2023 at 7:40 AM Peter Smith wrote: > > Here are some review comments for patch v3-0001. > > == > src/backend/replication/logical/logical.c > > 3. forward declaration > > +/* update progress callback */ > +static void update_progress_cb_wrapper(ReorderBuffer *cache, > +ReorderBufferTXN *txn, > +ReorderBufferChange *change); > > I felt this function wrapper name was a bit misleading... AFAIK every > other wrapper really does just wrap their respective functions. But > this one seems a bit different because it calls the wrapped function > ONLY if some threshold is exceeded. IMO maybe this function could have > some name that conveys this better: > > e.g. update_progress_cb_wrapper_with_threshold > I am wondering whether it would be better to move the threshold logic to the caller. Previously this logic was inside the function because it was being invoked from multiple places but now that won't be the case. Also, then your concern about the name would also be addressed. > > ~ > > 7b. > Would it be neater to just call OutputPluginUpdateProgress here instead? > > e.g. > BEFORE > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); > AFTER > OutputPluginUpdateProgress(ctx, false); > We already check whether ctx->update_progress is defined or not which is the only extra job done by OutputPluginUpdateProgress but probably we can consolidate the checks and directly invoke OutputPluginUpdateProgress. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
Here are some review comments for patch v3-0001. == Commit message 1. The problem is when there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, these temporary data will not be processed by the pgoutput - plugin. Therefore, the previous fix (f95d53e) for DML had no impact on this case. ~ 1a. IMO this comment needs to give a bit of background about the original problem here, rather than just starting with "The problem is" which is describing the flaws of the previous fix. ~ 1b. "pgoutput - plugin" -> "pgoutput plugin" ?? ~~~ 2. To fix this, we introduced a new ReorderBuffer callback - 'ReorderBufferUpdateProgressCB'. This callback is called to try to update the process after each change has been processed during sending data of a transaction (and its subtransactions) to the output plugin. IIUC it's not really "after each change" - shouldn't this comment mention something about the CHANGES_THRESHOLD 100? == src/backend/replication/logical/logical.c 3. forward declaration +/* update progress callback */ +static void update_progress_cb_wrapper(ReorderBuffer *cache, +ReorderBufferTXN *txn, +ReorderBufferChange *change); I felt this function wrapper name was a bit misleading... AFAIK every other wrapper really does just wrap their respective functions. But this one seems a bit different because it calls the wrapped function ONLY if some threshold is exceeded. IMO maybe this function could have some name that conveys this better: e.g. update_progress_cb_wrapper_with_threshold ~~~ 4. update_progress_cb_wrapper +/* + * Update progress callback + * + * Try to update progress and send a keepalive message if too many changes were + * processed when processing txn. + * + * For a large transaction, if we don't send any change to the downstream for a + * long time (exceeds the wal_receiver_timeout of standby) then it can timeout. + * This can happen when all or most of the changes are either not published or + * got filtered out. 
+ */ SUGGESTION (instead of the "Try to update" sentence) Send a keepalive message whenever more than CHANGES_THRESHOLD changes are encountered while processing a transaction. ~~~ 5. +static void +update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, +ReorderBufferChange *change) +{ + LogicalDecodingContext *ctx = cache->private_data; + LogicalErrorCallbackState state; + ErrorContextCallback errcallback; + static int changes_count = 0; /* Static variable used to accumulate + * the number of changes while + * processing txn. */ + IMO this may be more readable if the static 'changes_count' local var was declared first and separated from the other vars by a blank line. ~~~ 6. + /* + * We don't want to try sending a keepalive message after processing each + * change as that can have overhead. Tests revealed that there is no + * noticeable overhead in doing it after continuously processing 100 or so + * changes. + */ +#define CHANGES_THRESHOLD 100 6a. I think it might be better to define this right at the top of the function adjacent to the 'changes_count' variable (e.g. a bit like the original HEAD code looked) ~ 6b. SUGGESTION (for the comment) Sending keepalive messages after every change has some overhead, but testing showed there is no noticeable overhead if keepalive is only sent after every ~100 changes. ~~~ 7. + + /* + * After continuously processing CHANGES_THRESHOLD changes, we + * try to send a keepalive message if required. + */ + if (++changes_count >= CHANGES_THRESHOLD) + { + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); + changes_count = 0; + } + 7a. SUGGESTION (for comment) Send a keepalive message after every CHANGES_THRESHOLD changes. ~ 7b. Would it be neater to just call OutputPluginUpdateProgress here instead? e.g. BEFORE ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); AFTER OutputPluginUpdateProgress(ctx, false); -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical replication timeout problem
On Thu, Jan 19, 2023 at 4:13 PM Ashutosh Bapat wrote: > > On Wed, Jan 18, 2023 at 6:00 PM Amit Kapila wrote: > > > + */ > > + ReorderBufferUpdateProgressCB update_progress; > > > > Are you suggesting changing the name of the above variable? If so, how > > about apply_progress, progress, or updateprogress? If you don't like > > any of these then feel free to suggest something else. If we change > > the variable name then accordingly, we need to update > > ReorderBufferUpdateProgressCB as well. > > > > I would liked to have all the callback names renamed with prefix > "rbcb_xxx" so that they have very less chances of conflicting with > similar names in the code base. But it's probably late to do that :). > > How are update_txn_progress since the CB is supposed to be used only > within a transaction? or update_progress_txn? > Personally, I would prefer 'apply_progress' as it would be similar to a few other callbacks like apply_change, apply_truncate, or as is proposed by patch update_progress again because it is similar to existing callbacks like commit_prepared. If you and others don't like any of those then we can go for 'update_progress_txn' as well. Anybody else has an opinion on this? -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Jan 18, 2023 at 6:00 PM Amit Kapila wrote: > + */ > + ReorderBufferUpdateProgressCB update_progress; > > Are you suggesting changing the name of the above variable? If so, how > about apply_progress, progress, or updateprogress? If you don't like > any of these then feel free to suggest something else. If we change > the variable name then accordingly, we need to update > ReorderBufferUpdateProgressCB as well. > I would have liked all the callback names to be renamed with the prefix "rbcb_xxx" so that they have much less chance of conflicting with similar names in the code base. But it's probably too late to do that :). How about update_txn_progress, since the CB is supposed to be used only within a transaction? Or update_progress_txn? update_progress_cb_wrapper needs a name change as well. -- Best Wishes, Ashutosh Bapat
Re: Logical replication timeout problem
On Wed, Jan 18, 2023 at 5:37 PM Ashutosh Bapat wrote: > > On Wed, Jan 18, 2023 at 1:49 PM wangw.f...@fujitsu.com > wrote: > > > > On Wed, Jan 18, 2023 at 13:29 PM Amit Kapila > > wrote: > > > On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat > > > wrote: > > > > > > > > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila > > > > wrote: > > > > > > > > > > > > > > > > I am a bit worried about the indirections that the wrappers and > > > > > > hooks > > > > > > create. Output plugins call OutputPluginUpdateProgress() in > > > > > > callbacks > > > > > > but I don't see why ReorderBufferProcessTXN() needs a callback to > > > > > > call OutputPluginUpdateProgress. > > > > > > > > > > > > > > > > Yeah, I think we can do it as we are doing the previous approach but > > > > > we need an additional wrapper (update_progress_cb_wrapper()) as the > > > > > current patch has so that we can add error context information. This > > > > > is similar to why we have a wrapper for all other callbacks like > > > > > change_cb_wrapper. > > > > > > > > > > > > > Ultimately OutputPluginUpdateProgress() will be called - which in turn > > > > will call ctx->update_progress. > > > > > > > > > > No, update_progress_cb_wrapper() should directly call > > > ctx->update_progress(). The key reason to have a > > > update_progress_cb_wrapper() is that it allows us to add error context > > > information (see the usage of output_plugin_error_callback). > > > > I think it makes sense. This also avoids the need for every output plugin to > > implement the callback. So I tried to improve the patch based on this > > approach. > > > > And I tried to add some comments for this new callback to distinguish it > > from > > ctx->update_progress. > > Comments don't help when using cscope or some such code browsing tool. > Better to use a different variable name. > + /* + * Callback to be called when updating progress during sending data of a + * transaction (and its subtransactions) to the output plugin. 
+ */ + ReorderBufferUpdateProgressCB update_progress; Are you suggesting changing the name of the above variable? If so, how about apply_progress, progress, or updateprogress? If you don't like any of these then feel free to suggest something else. If we change the variable name then accordingly, we need to update ReorderBufferUpdateProgressCB as well. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Jan 18, 2023 at 1:49 PM wangw.f...@fujitsu.com wrote: > > On Wed, Jan 18, 2023 at 13:29 PM Amit Kapila wrote: > > On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat > > wrote: > > > > > > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila > > > wrote: > > > > > > > > > > > > > I am a bit worried about the indirections that the wrappers and hooks > > > > > create. Output plugins call OutputPluginUpdateProgress() in callbacks > > > > > but I don't see why ReorderBufferProcessTXN() needs a callback to > > > > > call OutputPluginUpdateProgress. > > > > > > > > > > > > > Yeah, I think we can do it as we are doing the previous approach but > > > > we need an additional wrapper (update_progress_cb_wrapper()) as the > > > > current patch has so that we can add error context information. This > > > > is similar to why we have a wrapper for all other callbacks like > > > > change_cb_wrapper. > > > > > > > > > > Ultimately OutputPluginUpdateProgress() will be called - which in turn > > > will call ctx->update_progress. > > > > > > > No, update_progress_cb_wrapper() should directly call > > ctx->update_progress(). The key reason to have a > > update_progress_cb_wrapper() is that it allows us to add error context > > information (see the usage of output_plugin_error_callback). > > I think it makes sense. This also avoids the need for every output plugin to > implement the callback. So I tried to improve the patch based on this > approach. > > And I tried to add some comments for this new callback to distinguish it from > ctx->update_progress. Comments don't help when using cscope or some such code browsing tool. Better to use a different variable name. -- Best Wishes, Ashutosh Bapat
RE: Logical replication timeout problem
On Wed, Jan 18, 2023 at 13:29 PM Amit Kapila wrote: > On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat > wrote: > > > > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila wrote: > > > > > > > > > > I am a bit worried about the indirections that the wrappers and hooks > > > > create. Output plugins call OutputPluginUpdateProgress() in callbacks > > > > but I don't see why ReorderBufferProcessTXN() needs a callback to > > > > call OutputPluginUpdateProgress. > > > > > > > > > > Yeah, I think we can do it as we are doing the previous approach but > > > we need an additional wrapper (update_progress_cb_wrapper()) as the > > > current patch has so that we can add error context information. This > > > is similar to why we have a wrapper for all other callbacks like > > > change_cb_wrapper. > > > > > > > Ultimately OutputPluginUpdateProgress() will be called - which in turn > > will call ctx->update_progress. > > > > No, update_progress_cb_wrapper() should directly call > ctx->update_progress(). The key reason to have a > update_progress_cb_wrapper() is that it allows us to add error context > information (see the usage of output_plugin_error_callback). I think it makes sense. This also avoids the need for every output plugin to implement the callback. So I tried to improve the patch based on this approach. And I tried to add some comments for this new callback to distinguish it from ctx->update_progress. Attach the new patch. Regards, Wang Wei v3-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: v3-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat wrote: > > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila wrote: > > > > > > > I am a bit worried about the indirections that the wrappers and hooks > > > create. Output plugins call OutputPluginUpdateProgress() in callbacks > > > but I don't see why ReorderBufferProcessTXN() needs a callback to > > > call OutputPluginUpdateProgress. > > > > > > > Yeah, I think we can do it as we are doing the previous approach but > > we need an additional wrapper (update_progress_cb_wrapper()) as the > > current patch has so that we can add error context information. This > > is similar to why we have a wrapper for all other callbacks like > > change_cb_wrapper. > > > > Ultimately OutputPluginUpdateProgress() will be called - which in turn > will call ctx->update_progress. > No, update_progress_cb_wrapper() should directly call ctx->update_progress(). The key reason to have a update_progress_cb_wrapper() is that it allows us to add error context information (see the usage of output_plugin_error_callback). -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila wrote: > > > > I am a bit worried about the indirections that the wrappers and hooks > > create. Output plugins call OutputPluginUpdateProgress() in callbacks > > but I don't see why ReorderBufferProcessTXN() needs a callback to > > call OutputPluginUpdateProgress. > > > > Yeah, I think we can do it as we are doing the previous approach but > we need an additional wrapper (update_progress_cb_wrapper()) as the > current patch has so that we can add error context information. This > is similar to why we have a wrapper for all other callbacks like > change_cb_wrapper. > Ultimately OutputPluginUpdateProgress() will be called - which in turn will call ctx->update_progress. I don't see wrappers around OutputPluginWrite or OutputPluginPrepareWrite. But I see that those two are always called from the output plugin, so indirectly they are called through a wrapper. I also see that update_progress_cb_wrapper() is similar, as far as the wrapper is concerned, to ReorderBufferUpdateProgress() in the earlier patch. ReorderBufferUpdateProgress() looks more readable than the wrapper. If we want to keep the wrapper, at least we should use a different variable name. update_progress is also there in LogicalDecodingContext and will be indirectly called from ReorderBuffer::update_progress. Somebody might think that there's some recursion involved there. That would be quite confusing. -- Best Wishes, Ashutosh Bapat
Re: Logical replication timeout problem
On Mon, Jan 16, 2023 at 10:06 PM Ashutosh Bapat wrote: > > On Wed, Jan 11, 2023 at 4:11 PM wangw.f...@fujitsu.com > wrote: > > > > On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila wrote: > > > > > > > Thanks for your comments. > > > > > One more thing, I think it would be better to expose a new callback > > > API via reorder buffer as suggested previously [2] similar to other > > > reorder buffer APIs instead of directly using reorderbuffer API to > > > invoke plugin API. > > > > Yes, I agree. I think it would be better to add a new callback API on the > > HEAD. > > So, I improved the fix approach: > > Introduce a new optional callback to update the process. This callback > > function > > is invoked at the end inside the main loop of the function > > ReorderBufferProcessTXN() for each change. In this way, I think it seems > > that > > similar timeout problems could be avoided. > > I am a bit worried about the indirections that the wrappers and hooks > create. Output plugins call OutputPluginUpdateProgress() in callbacks > but I don't see why ReorderBufferProcessTXN() needs a callback to > call OutputPluginUpdateProgress. > Yeah, I think we can do it as in the previous approach, but we need an additional wrapper (update_progress_cb_wrapper()), as the current patch has, so that we can add error context information. This is similar to why we have a wrapper for all the other callbacks like change_cb_wrapper. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Jan 11, 2023 at 4:11 PM wangw.f...@fujitsu.com wrote: > > On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila wrote: > > > > Thanks for your comments. > > > One more thing, I think it would be better to expose a new callback > > API via reorder buffer as suggested previously [2] similar to other > > reorder buffer APIs instead of directly using reorderbuffer API to > > invoke plugin API. > > Yes, I agree. I think it would be better to add a new callback API on the > HEAD. > So, I improved the fix approach: > Introduce a new optional callback to update the process. This callback > function > is invoked at the end inside the main loop of the function > ReorderBufferProcessTXN() for each change. In this way, I think it seems that > similar timeout problems could be avoided. I am a bit worried about the indirections that the wrappers and hooks create. Output plugins call OutputPluginUpdateProgress() in callbacks but I don't see why ReorderBufferProcessTXN() needs a callback to call OutputPluginUpdateProgress. I don't think output plugins are going to do anything special with that callback than just call OutputPluginUpdateProgress. Every output plugin will need to implement it and if they do not they will face the timeout problem. That would be unnecessary. Instead ReorderBufferUpdateProgress() in your first patch was more direct and readable. That way the fix works for any output plugin. In fact, I am wondering whether we could have a call in ReorderBufferProcessTxn() at the end of transaction (commit/prepare/commit prepared/abort prepared) instead of the corresponding output plugin callbacks calling OutputPluginUpdateProgress(). -- Best Wishes, Ashutosh Bapat
Re: Logical replication timeout problem
On Mon, Jan 9, 2023 at 4:08 PM wangw.f...@fujitsu.com wrote: > > On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat > wrote: > > I'm not sure if we need to add global variables or member variables for a > cumulative count that is only used here. How would you feel if I add some > comments when declaring this static variable? I see WalSndUpdateProgress::sendTime is static already. So this seems fine. A comment will help sure. > > > + > > +if (!ctx->update_progress) > > +return; > > + > > +Assert(!ctx->fast_forward); > > + > > +/* set output state */ > > +ctx->accept_writes = false; > > +ctx->write_xid = txn->xid; > > +ctx->write_location = change->lsn; > > +ctx->end_xact = false; > > > > This patch reverts many of the changes of the previous commit which tried to > > fix this issue i.e. 8df2374. end_xact was introduced by the same commit > > but > > without much explanation of that in the commit message. Its only user, > > WalSndUpdateProgress(), is probably making a wrong assumption as well. > > > > * We don't have a mechanism to get the ack for any LSN other than end > > * xact LSN from the downstream. So, we track lag only for end of > > * transaction LSN. > > > > IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr > > which is > > sent downstream through a keep alive message. Downstream may > > acknowledge this > > LSN. So we do get ack for any LSN, not just commit LSN. > > > > So I propose removing end_xact as well. > > We didn't track the lag during a transaction because it could make the > calculations of lag functionality inaccurate. If we track every lsn, it could > fail to record important lsn information because of > WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress). > Please see details in [1] and [2]. LagTrackerRead() interpolates to reduce the inaccuracy. I don't understand why we need to track the end LSN only. But I don't think that affects this fix. So I am fine if we want to leave end_xact there. 
-- Best Wishes, Ashutosh Bapat
RE: Logical replication timeout problem
On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila wrote: > Thanks for your comments. > One more thing, I think it would be better to expose a new callback > API via reorder buffer as suggested previously [2] similar to other > reorder buffer APIs instead of directly using reorderbuffer API to > invoke plugin API. Yes, I agree. I think it would be better to add a new callback API on HEAD. So, I improved the fix approach: Introduce a new optional callback to update the progress. This callback function is invoked at the end of the main loop of the function ReorderBufferProcessTXN() for each change. In this way, I think similar timeout problems could be avoided. BTW, I did a performance test for this patch. When running the SQL that reproduces the problem (refresh the materialized view in sync logical replication mode), the running time of the new function pgoutput_update_progress is less than 0.1% of the total time. I think this result looks OK. Attached is the new patch. Regards, Wang Wei v2-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: v2-0001-Fix-the-logical-replication-timeout-during-proces.patch
RE: Logical replication timeout problem
On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat wrote: > Hi Wang, > Thanks for working on this. One of our customer faced a similar > situation when running BDR with PostgreSQL. > > I tested your patch and it solves the problem. > > Please find some review comments below Thanks for your testing and comments. > +/* > + * Helper function for ReorderBufferProcessTXN for updating progress. > + */ > +static inline void > +ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn, > +ReorderBufferChange *change) > +{ > +LogicalDecodingContext *ctx = rb->private_data; > +static intchanges_count = 0; > > It's not easy to know that a variable is static when reading the code which > uses it. So it's easy to interpret code wrong. I would probably track it > through logical decoding context itself OR through a global variable like > other > places where we track the last timestamps. But there's more below on this. I'm not sure if we need to add global variables or member variables for a cumulative count that is only used here. How would you feel if I add some comments when declaring this static variable? > + > +if (!ctx->update_progress) > +return; > + > +Assert(!ctx->fast_forward); > + > +/* set output state */ > +ctx->accept_writes = false; > +ctx->write_xid = txn->xid; > +ctx->write_location = change->lsn; > +ctx->end_xact = false; > > This patch reverts many of the changes of the previous commit which tried to > fix this issue i.e. 8df2374. end_xact was introduced by the same commit > but > without much explanation of that in the commit message. Its only user, > WalSndUpdateProgress(), is probably making a wrong assumption as well. > > * We don't have a mechanism to get the ack for any LSN other than end > * xact LSN from the downstream. So, we track lag only for end of > * transaction LSN. > > IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr which > is > sent downstream through a keep alive message. Downstream may > acknowledge this > LSN. 
So we do get ack for any LSN, not just commit LSN. > > So I propose removing end_xact as well. We didn't track the lag during a transaction because it could make the calculations of lag functionality inaccurate. If we track every lsn, it could fail to record important lsn information because of WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress). Please see details in [1] and [2]. Regards, Wang Wei [1] - https://www.postgresql.org/message-id/OS3PR01MB62755D216245199554DDC8DB9EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com [2] - https://www.postgresql.org/message-id/OS3PR01MB627514AE0B3040D8F55A68B99EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com
Re: Logical replication timeout problem
On Fri, Jan 6, 2023 at 12:35 PM Ashutosh Bapat wrote: > > + > +/* > + * We don't want to try sending a keepalive message after processing each > + * change as that can have overhead. Tests revealed that there is no > + * noticeable overhead in doing it after continuously processing 100 or > so > + * changes. > + */ > +#define CHANGES_THRESHOLD 100 > > I think a time based threashold makes more sense. What if the timeout was > nearing and those 100 changes just took little more time causing a timeout? We > already have a time based threashold in WalSndKeepaliveIfNecessary(). And that > function is invoked after reading every WAL record in WalSndLoop(). So it does > not look like it's an expensive function. If it is expensive we might want to > worry about WalSndLoop as well. Does it make more sense to remove this > threashold? > We have previously tried this for every change [1] and it brings noticeable overhead. In fact, even doing it for every 10 changes also had some overhead which is why we reached this threshold number. I don't think it can lead to timeout due to skipping changes but sure if we see any such report we can further fine-tune this setting or will try to make it time-based but for now I feel it would be safe to use this threshold. > + > +/* > + * After continuously processing CHANGES_THRESHOLD changes, we > + * try to send a keepalive message if required. > + */ > +if (++changes_count >= CHANGES_THRESHOLD) > +{ > +ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, > false); > +changes_count = 0; > +} > +} > + > > On the other thread, I mentioned that we don't have a TAP test for it. > I agree with > Amit's opinion there that it's hard to create a test which will timeout > everywhere. I think what we need is a way to control the time required for > decoding a transaction. > > A rough idea is to induce a small sleep after decoding every change. 
The > amount > of sleep * number of changes will help us estimate and control the amount of > time taken to decode a transaction. Then we create a transaction which will > take longer than the timeout threashold to decode. But that's a > significant code. I > don't think PostgreSQL has a facility to induce a delay at a particular place > in the code. > Yeah, I don't know how to induce such a delay while decoding changes. One more thing, I think it would be better to expose a new callback API via reorder buffer as suggested previously [2] similar to other reorder buffer APIs instead of directly using reorderbuffer API to invoke plugin API. [1] - https://www.postgresql.org/message-id/OS3PR01MB6275DFFDAC7A59FA148931529E209%40OS3PR01MB6275.jpnprd01.prod.outlook.com [2] - https://www.postgresql.org/message-id/CAA4eK1%2BfQjndoBOFUn9Wy0hhm3MLyUWEpcT9O7iuCELktfdBiQ%40mail.gmail.com -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
Hi Wang, Thanks for working on this. One of our customers faced a similar situation when running BDR with PostgreSQL. I tested your patch and it solves the problem. Please find some review comments below On Tue, Nov 8, 2022 at 8:34 AM wangw.f...@fujitsu.com wrote: > > > Attach the patch. > +/* + * Helper function for ReorderBufferProcessTXN for updating progress. + */ +static inline void +ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn, +ReorderBufferChange *change) +{ +LogicalDecodingContext *ctx = rb->private_data; +static int changes_count = 0; It's not easy to know that a variable is static when reading the code which uses it, so it's easy to misread the code. I would probably track it through the logical decoding context itself OR through a global variable like other places where we track the last timestamps. But there's more below on this. + +if (!ctx->update_progress) +return; + +Assert(!ctx->fast_forward); + +/* set output state */ +ctx->accept_writes = false; +ctx->write_xid = txn->xid; +ctx->write_location = change->lsn; +ctx->end_xact = false; This patch reverts many of the changes of the previous commit which tried to fix this issue, i.e. 8df2374. end_xact was introduced by the same commit but without much explanation of it in the commit message. Its only user, WalSndUpdateProgress(), is probably making a wrong assumption as well. * We don't have a mechanism to get the ack for any LSN other than end * xact LSN from the downstream. So, we track lag only for end of * transaction LSN. IIUC, the WAL sender tracks the LSN of the last WAL record read in sentPtr, which is sent downstream through a keepalive message. The downstream may acknowledge this LSN. So we do get an ack for any LSN, not just the commit LSN. So I propose removing end_xact as well. + +/* + * We don't want to try sending a keepalive message after processing each + * change as that can have overhead. 
Tests revealed that there is no + * noticeable overhead in doing it after continuously processing 100 or so + * changes. + */ +#define CHANGES_THRESHOLD 100 I think a time-based threshold makes more sense. What if the timeout was nearing and those 100 changes just took a little more time, causing a timeout? We already have a time-based threshold in WalSndKeepaliveIfNecessary(). And that function is invoked after reading every WAL record in WalSndLoop(). So it does not look like it's an expensive function. If it is expensive we might want to worry about WalSndLoop as well. Does it make more sense to remove this threshold? + +/* + * After continuously processing CHANGES_THRESHOLD changes, we + * try to send a keepalive message if required. + */ +if (++changes_count >= CHANGES_THRESHOLD) +{ +ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); +changes_count = 0; +} +} + On the other thread, I mentioned that we don't have a TAP test for it. I agree with Amit's opinion there that it's hard to create a test which will time out everywhere. I think what we need is a way to control the time required for decoding a transaction. A rough idea is to induce a small sleep after decoding every change. The amount of sleep * number of changes will help us estimate and control the amount of time taken to decode a transaction. Then we create a transaction which will take longer than the timeout threshold to decode. But that's a significant amount of code. I don't think PostgreSQL has a facility to induce a delay at a particular place in the code. -- Best Wishes, Ashutosh Bapat
RE: Logical replication timeout problem
On Fri, Nov 4, 2022 at 18:13 PM Fabrice Chapuis wrote: > Hello Wang, > > I tested the draft patch in my lab for Postgres 14.4, the refresh of the > materialized view ran without generating the timeout on the worker. > Do you plan to propose this patch at the next commit fest. Thanks for your confirmation! I will add this thread to the commit fest soon. The following is the problem analysis and fix approach: I think the problem is that when a DDL in a transaction generates lots of temporary data due to rewrite rules, that temporary data will not be processed by the pgoutput plugin. Therefore, the previous fix (f95d53e) for DML had no impact on this case. To fix this, I think we need to try to send the keepalive messages after each change is processed by the walsender, not in the pgoutput plugin. Attached is the patch. Regards, Wang wei v1-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: v1-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
Hello Wang, I tested the draft patch in my lab for Postgres 14.4, the refresh of the materialized view ran without generating the timeout on the worker. Do you plan to propose this patch at the next commit fest. Regards, Fabrice On Wed, Oct 19, 2022 at 10:15 AM wangw.f...@fujitsu.com < wangw.f...@fujitsu.com> wrote: > On Tue, Oct 18, 2022 at 22:35 PM Fabrice Chapuis > wrote: > > Hello Amit, > > > > In version 14.4 the timeout problem for logical replication happens > again despite > > the patch provided for this issue in this version. When bulky > materialized views > > are reloaded it broke logical replication. It is possible to solve this > problem by > > using your new "streaming" option. > > Have you ever had this issue reported to you? > > > > Regards > > > > Fabrice > > > > 2022-10-10 17:19:02 CEST [538424]: [17-1] > > user=postgres,db=dbxxxa00,client=[local] CONTEXT: SQL statement "REFRESH > > MATERIALIZED VIEW sxxxa00.table_base" > > PL/pgSQL function refresh_materialized_view(text) line 5 at > EXECUTE > > 2022-10-10 17:19:02 CEST [538424]: [18-1] > > user=postgres,db=dbxxxa00,client=[local] STATEMENT: select > > refresh_materialized_view('sxxxa00.table_base'); > > 2022-10-10 17:19:02 CEST [538424]: [19-1] > > user=postgres,db=dbxxxa00,client=[local] LOG: duration: 264815.652 > > ms statement: select refresh_materialized_view('sxxxa00.table_base'); > > 2022-10-10 17:19:27 CEST [559156]: [1-1] user=,db=,client= LOG: > automatic > > vacuum of table "dbxxxa00.sxxxa00.table_base": index scans: 0 > > pages: 0 removed, 296589 remain, 0 skipped due to pins, 0 > skipped frozen > > tuples: 0 removed, 48472622 remain, 0 are dead but not yet > removable, > > oldest xmin: 1501528 > > index scan not needed: 0 pages from table (0.00% of total) had 0 > dead item > > identifiers removed > > I/O timings: read: 1.494 ms, write: 0.000 ms > > avg read rate: 0.028 MB/s, avg write rate: 107.952 MB/s > > buffer usage: 593301 hits, 77 misses, 294605 dirtied > > WAL usage: 296644 
records, 46119 full page images, 173652718 > bytes > > system usage: CPU: user: 17.26 s, system: 0.29 s, elapsed: 21.32 > s > > 2022-10-10 17:19:28 CEST [559156]: [2-1] user=,db=,client= LOG: > automatic > > analyze of table "dbxxxa00.sxxxa00.table_base" > > I/O timings: read: 0.043 ms, write: 0.000 ms > > avg read rate: 0.026 MB/s, avg write rate: 0.026 MB/s > > buffer usage: 30308 hits, 2 misses, 2 dirtied > > system usage: CPU: user: 0.54 s, system: 0.00 s, elapsed: 0.59 s > > 2022-10-10 17:19:34 CEST [3898111]: [6840-1] user=,db=,client= LOG: > checkpoint > > complete: wrote 1194 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 > recycled; > > write=269.551 s, sync=0.002 s, total=269.560 s; sync files=251, > longest=0.00 > > 1 s, average=0.001 s; distance=583790 kB, estimate=583790 kB > > 2022-10-10 17:20:02 CEST [716163]: [2-1] user=,db=,client= ERROR: > terminating > > logical replication worker due to timeout > > 2022-10-10 17:20:02 CEST [3897921]: [13-1] user=,db=,client= LOG: > background > > worker "logical replication worker" (PID 716163) exited with exit code 1 > > 2022-10-10 17:20:02 CEST [561346]: [1-1] user=,db=,client= LOG: logical > > replication apply worker for subscription "subxxx_sxxxa00" has started > > Thanks for reporting! > > There is one thing I want to confirm: > Is the statement `select refresh_materialized_view('sxxxa00.table_base');` > executed on the publisher-side? > > If so, I think the reason for this timeout problem could be that during DDL > (`REFRESH MATERIALIZED VIEW`), lots of temporary data is generated due to > rewrite. Since these temporary data will not be processed by the pgoutput > plugin, our previous fix for DML had no impact on this case. > I think setting "streaming" option to "on" could work around this problem. > > I tried to write a draft patch (see attachment) on REL_14_4 to fix this. > I tried it locally and it seems to work. 
> Could you please confirm whether this problem is fixed after applying this > draft patch? > > If this draft patch works, I will improve it and try to fix this problem. > > Regards, > Wang wei >
RE: Logical replication timeout problem
On Thu, Oct 20, 2022 at 13:47 PM Fabrice Chapuis wrote: > Yes, the refresh of the MV is on the publisher side. > Thanks for your draft patch, I'll try it. > I'll get back to you as soon as possible Thanks a lot. > One question: why is the refresh of the MV a DDL and not a DML? Since in the source, the type of the command `REFRESH MATERIALIZED VIEW` is `CMD_UTILITY`, I think this command is DDL (see CmdType in file nodes.h). BTW, after searching for DML in the PostgreSQL docs, I found the relevant description at the link below: https://www.postgresql.org/docs/devel/logical-replication-publication.html Regards, Wang wei
Re: Logical replication timeout problem
Yes, the refresh of the MV is on the publisher side. Thanks for your draft patch, I'll try it. I'll get back to you as soon as possible. One question: why is the refresh of the MV a DDL and not a DML? Regards Fabrice On Wed, 19 Oct 2022, 10:15 wangw.f...@fujitsu.com wrote: > On Tue, Oct 18, 2022 at 22:35 PM Fabrice Chapuis > wrote: > > Hello Amit, > > > > In version 14.4 the timeout problem for logical replication happens > again despite > > the patch provided for this issue in this version. When bulky > materialized views > > are reloaded it broke logical replication. It is possible to solve this > problem by > > using your new "streaming" option. > > Have you ever had this issue reported to you? > > > > Regards > > > > Fabrice > > > > 2022-10-10 17:19:02 CEST [538424]: [17-1] > > user=postgres,db=dbxxxa00,client=[local] CONTEXT: SQL statement "REFRESH > > MATERIALIZED VIEW sxxxa00.table_base" > > PL/pgSQL function refresh_materialized_view(text) line 5 at > EXECUTE > > 2022-10-10 17:19:02 CEST [538424]: [18-1] > > user=postgres,db=dbxxxa00,client=[local] STATEMENT: select > > refresh_materialized_view('sxxxa00.table_base'); > > 2022-10-10 17:19:02 CEST [538424]: [19-1] > > user=postgres,db=dbxxxa00,client=[local] LOG: duration: 264815.652 > > ms statement: select refresh_materialized_view('sxxxa00.table_base'); > > 2022-10-10 17:19:27 CEST [559156]: [1-1] user=,db=,client= LOG: > automatic > > vacuum of table "dbxxxa00.sxxxa00.table_base": index scans: 0 > > pages: 0 removed, 296589 remain, 0 skipped due to pins, 0 > skipped frozen > > tuples: 0 removed, 48472622 remain, 0 are dead but not yet > removable, > > oldest xmin: 1501528 > > index scan not needed: 0 pages from table (0.00% of total) had 0 > dead item > > identifiers removed > > I/O timings: read: 1.494 ms, write: 0.000 ms > > avg read rate: 0.028 MB/s, avg write rate: 107.952 MB/s > > buffer usage: 593301 hits, 77 misses, 294605 dirtied > > WAL usage: 296644 records, 46119 full page images, 173652718 > bytes > > system
usage: CPU: user: 17.26 s, system: 0.29 s, elapsed: 21.32 > s > > 2022-10-10 17:19:28 CEST [559156]: [2-1] user=,db=,client= LOG: > automatic > > analyze of table "dbxxxa00.sxxxa00.table_base" > > I/O timings: read: 0.043 ms, write: 0.000 ms > > avg read rate: 0.026 MB/s, avg write rate: 0.026 MB/s > > buffer usage: 30308 hits, 2 misses, 2 dirtied > > system usage: CPU: user: 0.54 s, system: 0.00 s, elapsed: 0.59 s > > 2022-10-10 17:19:34 CEST [3898111]: [6840-1] user=,db=,client= LOG: > checkpoint > > complete: wrote 1194 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 > recycled; > > write=269.551 s, sync=0.002 s, total=269.560 s; sync files=251, > longest=0.00 > > 1 s, average=0.001 s; distance=583790 kB, estimate=583790 kB > > 2022-10-10 17:20:02 CEST [716163]: [2-1] user=,db=,client= ERROR: > terminating > > logical replication worker due to timeout > > 2022-10-10 17:20:02 CEST [3897921]: [13-1] user=,db=,client= LOG: > background > > worker "logical replication worker" (PID 716163) exited with exit code 1 > > 2022-10-10 17:20:02 CEST [561346]: [1-1] user=,db=,client= LOG: logical > > replication apply worker for subscription "subxxx_sxxxa00" has started > > Thanks for reporting! > > There is one thing I want to confirm: > Is the statement `select refresh_materialized_view('sxxxa00.table_base');` > executed on the publisher-side? > > If so, I think the reason for this timeout problem could be that during DDL > (`REFRESH MATERIALIZED VIEW`), lots of temporary data is generated due to > rewrite. Since these temporary data will not be processed by the pgoutput > plugin, our previous fix for DML had no impact on this case. > I think setting "streaming" option to "on" could work around this problem. > > I tried to write a draft patch (see attachment) on REL_14_4 to fix this. > I tried it locally and it seems to work. > Could you please confirm whether this problem is fixed after applying this > draft patch? 
> > If this draft patch works, I will improve it and try to fix this problem. > > Regards, > Wang wei >
RE: Logical replication timeout problem
On Tue, Oct 18, 2022 at 22:35 PM Fabrice Chapuis wrote: > Hello Amit, > > In version 14.4 the timeout problem for logical replication happens again > despite > the patch provided for this issue in this version. When bulky materialized > views > are reloaded it broke logical replication. It is possible to solve this > problem by > using your new "streaming" option. > Have you ever had this issue reported to you? > > Regards > > Fabrice > > 2022-10-10 17:19:02 CEST [538424]: [17-1] > user=postgres,db=dbxxxa00,client=[local] CONTEXT: SQL statement "REFRESH > MATERIALIZED VIEW sxxxa00.table_base" > PL/pgSQL function refresh_materialized_view(text) line 5 at EXECUTE > 2022-10-10 17:19:02 CEST [538424]: [18-1] > user=postgres,db=dbxxxa00,client=[local] STATEMENT: select > refresh_materialized_view('sxxxa00.table_base'); > 2022-10-10 17:19:02 CEST [538424]: [19-1] > user=postgres,db=dbxxxa00,client=[local] LOG: duration: 264815.652 > ms statement: select refresh_materialized_view('sxxxa00.table_base'); > 2022-10-10 17:19:27 CEST [559156]: [1-1] user=,db=,client= LOG: automatic > vacuum of table "dbxxxa00.sxxxa00.table_base": index scans: 0 > pages: 0 removed, 296589 remain, 0 skipped due to pins, 0 skipped > frozen > tuples: 0 removed, 48472622 remain, 0 are dead but not yet removable, > oldest xmin: 1501528 > index scan not needed: 0 pages from table (0.00% of total) had 0 dead > item > identifiers removed > I/O timings: read: 1.494 ms, write: 0.000 ms > avg read rate: 0.028 MB/s, avg write rate: 107.952 MB/s > buffer usage: 593301 hits, 77 misses, 294605 dirtied > WAL usage: 296644 records, 46119 full page images, 173652718 bytes > system usage: CPU: user: 17.26 s, system: 0.29 s, elapsed: 21.32 s > 2022-10-10 17:19:28 CEST [559156]: [2-1] user=,db=,client= LOG: automatic > analyze of table "dbxxxa00.sxxxa00.table_base" > I/O timings: read: 0.043 ms, write: 0.000 ms > avg read rate: 0.026 MB/s, avg write rate: 0.026 MB/s > buffer usage: 30308 hits, 2 misses, 2 dirtied > 
system usage: CPU: user: 0.54 s, system: 0.00 s, elapsed: 0.59 s > 2022-10-10 17:19:34 CEST [3898111]: [6840-1] user=,db=,client= LOG: > checkpoint > complete: wrote 1194 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 > recycled; > write=269.551 s, sync=0.002 s, total=269.560 s; sync files=251, longest=0.00 > 1 s, average=0.001 s; distance=583790 kB, estimate=583790 kB > 2022-10-10 17:20:02 CEST [716163]: [2-1] user=,db=,client= ERROR: terminating > logical replication worker due to timeout > 2022-10-10 17:20:02 CEST [3897921]: [13-1] user=,db=,client= LOG: background > worker "logical replication worker" (PID 716163) exited with exit code 1 > 2022-10-10 17:20:02 CEST [561346]: [1-1] user=,db=,client= LOG: logical > replication apply worker for subscription "subxxx_sxxxa00" has started Thanks for reporting! There is one thing I want to confirm: Is the statement `select refresh_materialized_view('sxxxa00.table_base');` executed on the publisher-side? If so, I think the reason for this timeout problem could be that during DDL (`REFRESH MATERIALIZED VIEW`), lots of temporary data is generated due to rewrite. Since these temporary data will not be processed by the pgoutput plugin, our previous fix for DML had no impact on this case. I think setting "streaming" option to "on" could work around this problem. I tried to write a draft patch (see attachment) on REL_14_4 to fix this. I tried it locally and it seems to work. Could you please confirm whether this problem is fixed after applying this draft patch? If this draft patch works, I will improve it and try to fix this problem. Regards, Wang wei 0001-draft-for-REL_14_4.patch Description: 0001-draft-for-REL_14_4.patch
Re: Logical replication timeout problem
Hello Amit, In version 14.4 the timeout problem for logical replication happens again despite the patch provided for this issue in this version. When bulky materialized views are reloaded, it breaks logical replication. It is possible to solve this problem by using your new "streaming" option. Have you ever had this issue reported to you? Regards Fabrice 2022-10-10 17:19:02 CEST [538424]: [17-1] user=postgres,db=dbxxxa00,client=[local] CONTEXT: SQL statement "REFRESH MATERIALIZED VIEW sxxxa00.table_base" PL/pgSQL function refresh_materialized_view(text) line 5 at EXECUTE 2022-10-10 17:19:02 CEST [538424]: [18-1] user=postgres,db=dbxxxa00,client=[local] STATEMENT: select refresh_materialized_view('sxxxa00.table_base'); 2022-10-10 17:19:02 CEST [538424]: [19-1] user=postgres,db=dbxxxa00,client=[local] LOG: duration: 264815.652 ms statement: select refresh_materialized_view('sxxxa00.table_base'); 2022-10-10 17:19:27 CEST [559156]: [1-1] user=,db=,client= LOG: automatic vacuum of table "dbxxxa00.sxxxa00.table_base": index scans: 0 pages: 0 removed, 296589 remain, 0 skipped due to pins, 0 skipped frozen tuples: 0 removed, 48472622 remain, 0 are dead but not yet removable, oldest xmin: 1501528 index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed I/O timings: read: 1.494 ms, write: 0.000 ms avg read rate: 0.028 MB/s, avg write rate: 107.952 MB/s buffer usage: 593301 hits, 77 misses, 294605 dirtied WAL usage: 296644 records, 46119 full page images, 173652718 bytes system usage: CPU: user: 17.26 s, system: 0.29 s, elapsed: 21.32 s 2022-10-10 17:19:28 CEST [559156]: [2-1] user=,db=,client= LOG: automatic analyze of table "dbxxxa00.sxxxa00.table_base" I/O timings: read: 0.043 ms, write: 0.000 ms avg read rate: 0.026 MB/s, avg write rate: 0.026 MB/s buffer usage: 30308 hits, 2 misses, 2 dirtied system usage: CPU: user: 0.54 s, system: 0.00 s, elapsed: 0.59 s 2022-10-10 17:19:34 CEST [3898111]: [6840-1] user=,db=,client= LOG: checkpoint
complete: wrote 1194 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=269.551 s, sync=0.002 s, total=269.560 s; sync files=251, longest=0.00 1 s, average=0.001 s; distance=583790 kB, estimate=583790 kB 2022-10-10 17:20:02 CEST [716163]: [2-1] user=,db=,client= ERROR: terminating logical replication worker due to timeout 2022-10-10 17:20:02 CEST [3897921]: [13-1] user=,db=,client= LOG: background worker "logical replication worker" (PID 716163) exited with exit code 1 2022-10-10 17:20:02 CEST [561346]: [1-1] user=,db=,client= LOG: logical replication apply worker for subscription "subxxx_sxxxa00" has started On Fri, Apr 1, 2022 at 6:09 AM Amit Kapila wrote: > On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira wrote: > > > > On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote: > > > > This is exactly our initial analysis and we have tried a patch on > > these lines and it has a noticeable overhead. See [1]. Calling this > > for each change or each skipped change can bring noticeable overhead > > that is why we decided to call it after a certain threshold (100) of > > skipped changes. Now, surely as mentioned in my previous reply we can > > make it generic such that instead of calling this (update_progress > > function as in the patch) for skipped cases, we call it always. Will > > that make it better? > > > > That's what I have in mind but using a different approach. > > > > > The functions CreateInitDecodingContext and CreateDecodingContext > receives the > > > update_progress function as a parameter. These functions are called in > 2 > > > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) > and (b) > > > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses > > > WalSndUpdateProgress as a progress function. Case (b) does not have > one because > > > it is not required -- local decoding/communication. 
There is no custom > update > > > progress routine for each output plugin which leads me to the question: > > > couldn't we encapsulate the update progress call into the callback > functions? > > > > > > > Sorry, I don't get your point. What exactly do you mean by this? > > AFAIS, currently we call this output plugin API in pgoutput functions > > only, do you intend to get it invoked from a different place? > > > > It seems I didn't make myself clear. The callbacks I'm referring to the > > *_cb_wrapper functions. After every ctx->callbacks.foo_cb() call into a > > *_cb_wrapper() function, we have something like: > > > > if (ctx->progress & PGOUTPUT_PROGRESS_FOO) > > NewUpdateProgress(ctx, false); > > > > The NewUpdateProgress function would contain a logic similar to the > > update_progress() from the proposed patch. (A different function name > here just > > to avoid confusion.) > > > > The output plugin is responsible to set ctx->progress with the callback > > variables (for example,
Re: Logical replication timeout problem
On Mon, May 9, 2022 at 2:17 PM Masahiko Sawada wrote: > > The patches look good to me too. > Pushed. -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Mon, May 9, 2022 at 11:23 AM Amit Kapila wrote: > On Mon, May 9, 2022 at 7:01 PM Euler Taveira wrote: > > > > On Mon, May 9, 2022, at 3:47 AM, Amit Kapila wrote: > > > > Thanks. The patch LGTM. I'll push and back-patch this after the > > current minor release is done unless there are more comments related > > to this work. > > .. > > Does this issue deserve a test? A small wal_receiver_timeout. Although, I'm > not > > sure how stable the test will be. > > > > Yes, the main part is how to write a stable test because estimating > how many changes are enough for the configured wal_receiver_timeout to > pass on all the buildfarm machines is tricky. Also, I am not sure how > important it is to test this behavior because based on this theory we > should have tests for all kinds of timeouts. Yes, I think we cannot guarantee the stability of this test. In addition, if we set wal_receiver_timeout too small, it may cause timeouts unrelated to this bug. And if we set a bigger wal_receiver_timeout and use a larger transaction in order to minimize the impact of machine performance, I think this might take some time and might risk making the buildfarm slow. Regards, Wang wei
Re: Logical replication timeout problem
On Mon, May 9, 2022 at 7:01 PM Euler Taveira wrote: > > On Mon, May 9, 2022, at 3:47 AM, Amit Kapila wrote: > > Thanks. The patch LGTM. I'll push and back-patch this after the > current minor release is done unless there are more comments related > to this work. > > Looks sane to me. (I only tested the HEAD version) > > + bool end_xact = ctx->end_xact; > > Do you really need a new variable here? It has the same name and the new one > isn't changed during the execution. > I think both ways should be okay. I thought the proposed way is okay because it is used in more than one place and is probably slightly easier to follow by having a separate variable. > Does this issue deserve a test? A small wal_receiver_timeout. Although, I'm > not > sure how stable the test will be. > Yes, the main part is how to write a stable test because estimating how many changes are enough for the configured wal_receiver_timeout to pass on all the buildfarm machines is tricky. Also, I am not sure how important it is to test this behavior because based on this theory we should have tests for all kinds of timeouts. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Mon, May 9, 2022, at 3:47 AM, Amit Kapila wrote: > Thanks. The patch LGTM. I'll push and back-patch this after the > current minor release is done unless there are more comments related > to this work. Looks sane to me. (I only tested the HEAD version) + bool end_xact = ctx->end_xact; Do you really need a new variable here? It has the same name and the new one isn't changed during the execution. Does this issue deserve a test? A small wal_receiver_timeout. Although, I'm not sure how stable the test will be. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Mon, May 9, 2022 at 3:47 PM Amit Kapila wrote: > > On Fri, May 6, 2022 at 12:42 PM wangw.f...@fujitsu.com > wrote: > > > > On Fri, May 6, 2022 at 9:54 AM Masahiko Sawada > > wrote: > > > On Wed, May 4, 2022 at 7:18 PM Amit Kapila > > > wrote: > > > > > > > > On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada > > > wrote: > > > > > > > > > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila > > > wrote: > > > > > > > > > > > > > > > > > > So, shall we go back to the previous approach of using a separate > > > > > > function update_replication_progress? > > > > > > > > > > Ok, agreed. > > > > > > > > > > > > > Attached, please find the updated patch accordingly. Currently, I have > > > > prepared it for HEAD, if you don't see any problem with this, we can > > > > prepare the back-branch patches based on this. > > > > > > Thank you for updating the patch. Looks good to me. > > Thanks for your review. > > > > Improve the back-branch patches according to the discussion. > > > The patches look good to me too. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Fri, May 6, 2022 at 12:42 PM wangw.f...@fujitsu.com wrote: > > On Fri, May 6, 2022 at 9:54 AM Masahiko Sawada wrote: > > On Wed, May 4, 2022 at 7:18 PM Amit Kapila wrote: > > > > > > On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada > > wrote: > > > > > > > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila > > wrote: > > > > > > > > > > > > > > > So, shall we go back to the previous approach of using a separate > > > > > function update_replication_progress? > > > > > > > > Ok, agreed. > > > > > > > > > > Attached, please find the updated patch accordingly. Currently, I have > > > prepared it for HEAD, if you don't see any problem with this, we can > > > prepare the back-branch patches based on this. > > > > Thank you for updating the patch. Looks good to me. > Thanks for your review. > > Improve the back-branch patches according to the discussion. > Thanks. The patch LGTM. I'll push and back-patch this after the current minor release is done unless there are more comments related to this work. -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Fri, May 6, 2022 at 9:54 AM Masahiko Sawada wrote: > On Wed, May 4, 2022 at 7:18 PM Amit Kapila wrote: > > > > On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada > wrote: > > > > > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila > wrote: > > > > > > > > > > > > So, shall we go back to the previous approach of using a separate > > > > function update_replication_progress? > > > > > > Ok, agreed. > > > > > > > Attached, please find the updated patch accordingly. Currently, I have > > prepared it for HEAD, if you don't see any problem with this, we can > > prepare the back-branch patches based on this. > > Thank you for updating the patch. Looks good to me. Thanks for your review. Improve the back-branch patches according to the discussion. Move the CHANGES_THRESHOLD logic from function OutputPluginUpdateProgress to the new function update_replication_progress. In addition, improve the formatting of all patches with pgindent. Attach the patches. Regards, Wang wei HEAD_v21-0001-Fix-the-logical-replication-timeout-during-large.patch Description: HEAD_v21-0001-Fix-the-logical-replication-timeout-during-large.patch REL14_v4-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL14_v4-0001-Fix-the-logical-replication-timeout-during-large-.patch REL13_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL13_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL12_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL12_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL11_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL11_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL10_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL10_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch
Re: Logical replication timeout problem
On Wed, May 4, 2022 at 7:18 PM Amit Kapila wrote: > > On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada wrote: > > > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila wrote: > > > > > > > > > So, shall we go back to the previous approach of using a separate > > > function update_replication_progress? > > > > Ok, agreed. > > > > Attached, please find the updated patch accordingly. Currently, I have > prepared it for HEAD, if you don't see any problem with this, we can > prepare the back-branch patches based on this. Thank you for updating the patch. Looks good to me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada wrote: > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila wrote: > > > > > > So, shall we go back to the previous approach of using a separate > > function update_replication_progress? > > Ok, agreed. > Attached, please find the updated patch accordingly. Currently, I have prepared it for HEAD, if you don't see any problem with this, we can prepare the back-branch patches based on this. -- With Regards, Amit Kapila. v20-0001-Fix-the-logical-replication-timeout-during-large.patch Description: Binary data
Re: Logical replication timeout problem
On Mon, May 2, 2022 at 11:32 AM Amit Kapila wrote: > > On Mon, May 2, 2022 at 7:33 AM Masahiko Sawada wrote: > > On Thu, Apr 28, 2022 at 7:01 PM houzj.f...@fujitsu.com > > wrote: > > > > > > Hi Sawada-san, Wang > > > > > > I was looking at the patch and noticed that we moved some logic from > > > update_replication_progress() to OutputPluginUpdateProgress() like > > > your suggestion. > > > > > > I have a question about this change. In the patch we added some > > > restriction in this function OutputPluginUpdateProgress() like below: > > > > > > + /* > > > + * If we are at the end of transaction LSN, update progress tracking. > > > + * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we > > > + * try to send a keepalive message if required. > > > + */ > > > + if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD) > > > + { > > > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, > > > + skipped_xact); > > > + changes_count = 0; > > > + } > > > > > > After the patch, we won't be able to always invoke the update_progress() > > > if the > > > caller is in the middle of a transaction (e.g. end_xact = false). The > > > behavior of the > > > public function OutputPluginUpdateProgress() is changed. Is it ok to > > > change this in back branches? > > > > > > Because OutputPluginUpdateProgress() is a public function for the > > > extension > > > developer, I am just a little concerned about the behavior change here. > > > > Good point. > > > > As you pointed out, it would not be good if we change the behavior of > > OutputPluginUpdateProgress() in back branches. Also, after more > > thought, it is not a good idea even for HEAD since there might be > > background workers that use logical decoding and the timeout checking > > might not be relevant at all with them. > > > > So, shall we go back to the previous approach of using a separate > > function update_replication_progress? Ok, agreed.
> > > BTW, I think you're concerned about the plugins that call > > OutputPluginUpdateProgress() at the middle of the transaction (i.e., > > end_xact = false). We have the following change that makes the > > walsender not update the progress at the middle of the transaction. Do > > you think it is okay since it's not common usage to call > > OutputPluginUpdateProgress() at the middle of the transaction by the > > plugin that is used by the walsender? > > > > We have done that purposefully as otherwise, the lag tracker shows > incorrect information. See email [1]. The reason is that we always get > ack from subscribers for transaction end. Also, prior to this patch we > never call the lag tracker recording apart from the transaction end, > so as a bug fix we shouldn't try to change it. Makes sense. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Mon, May 2, 2022 at 7:33 AM Masahiko Sawada wrote: > On Thu, Apr 28, 2022 at 7:01 PM houzj.f...@fujitsu.com > wrote: > > > > Hi Sawada-san, Wang > > > > I was looking at the patch and noticed that we moved some logic from > > update_replication_progress() to OutputPluginUpdateProgress() like > > your suggestion. > > > > I have a question about this change. In the patch we added some > > restriction in this function OutputPluginUpdateProgress() like below: > > > > + /* > > + * If we are at the end of transaction LSN, update progress tracking. > > + * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we > > + * try to send a keepalive message if required. > > + */ > > + if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD) > > + { > > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, > > + skipped_xact); > > + changes_count = 0; > > + } > > > > After the patch, we won't be able to always invoke the update_progress() if > > the > > caller is in the middle of a transaction (e.g. end_xact = false). The > > behavior of the > > public function OutputPluginUpdateProgress() is changed. Is it ok to > > change this in back branches? > > > > Because OutputPluginUpdateProgress() is a public function for the extension > > developer, I am just a little concerned about the behavior change here. > > Good point. > > As you pointed out, it would not be good if we change the behavior of > OutputPluginUpdateProgress() in back branches. Also, after more > thought, it is not a good idea even for HEAD since there might be > background workers that use logical decoding and the timeout checking > might not be relevant at all with them. > So, shall we go back to the previous approach of using a separate function update_replication_progress? > BTW, I think you're concerned about the plugins that call > OutputPluginUpdateProgress() at the middle of the transaction (i.e., > end_xact = false).
We have the following change that makes the > walsender not update the progress at the middle of the transaction. Do > you think it is okay since it's not common usage to call > OutputPluginUpdateProgress() at the middle of the transaction by the > plugin that is used by the walsender? > We have done that purposefully as otherwise, the lag tracker shows incorrect information. See email [1]. The reason is that we always get ack from subscribers for transaction end. Also, prior to this patch we never call the lag tracker recording apart from the transaction end, so as a bug fix we shouldn't try to change it. [1] - https://www.postgresql.org/message-id/OS3PR01MB62755D216245199554DDC8DB9EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Thu, Apr 28, 2022 at 7:01 PM houzj.f...@fujitsu.com wrote: > > On Wednesday, April 20, 2022 3:21 PM Masahiko Sawada > wrote: > > > > BTW the changes in > > REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch, > > adding end_xact to LogicalDecodingContext, seems good to me and it > > might be better than the approach of v17 patch from plugin developers’ > > perspective? This is because they won’t need to pass true/false to > > end_xact of OutputPluginUpdateProgress(). Furthermore, if we do what > > we do in update_replication_progress() in > > OutputPluginUpdateProgress(), what plugins need to do will be just to > > call OutputPluginUpdate() in every callback and they don't need to > > have the CHANGES_THRESHOLD logic. What do you think? > > Hi Sawada-san, Wang > > I was looking at the patch and noticed that we moved some logic from > update_replication_progress() to OutputPluginUpdateProgress() like > your suggestion. > > I have a question about this change. In the patch we added some > restriction in this function OutputPluginUpdateProgress() like below: > > + /* > + * If we are at the end of transaction LSN, update progress tracking. > + * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we > + * try to send a keepalive message if required. > + */ > + if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD) > + { > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, > + skipped_xact); > + changes_count = 0; > + } > > After the patch, we won't be able to always invoke the update_progress() if > the > caller is in the middle of a transaction (e.g. end_xact = false). The behavior > of the > public function OutputPluginUpdateProgress() is changed. Is it > ok to > change this in back branches? > > Because OutputPluginUpdateProgress() is a public function for the extension > developer, I am just a little concerned about the behavior change here. Good point.
As you pointed out, it would not be good if we change the behavior of OutputPluginUpdateProgress() in back branches. Also, after more thought, it is not a good idea even for HEAD since there might be background workers that use logical decoding and the timeout checking might not be relevant at all with them. BTW, I think you're concerned about the plugins that call OutputPluginUpdateProgress() at the middle of the transaction (i.e., end_xact = false). We have the following change that makes the walsender not update the progress at the middle of the transaction. Do you think it is okay since it's not common usage to call OutputPluginUpdateProgress() at the middle of the transaction by the plugin that is used by the walsender? #define WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS 1000 - if (!TimestampDifferenceExceeds(sendTime, now, + if (end_xact && TimestampDifferenceExceeds(sendTime, now, WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS)) - return; + { + LagTrackerWrite(lsn, now); + sendTime = now; + } Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Logical replication timeout problem
On Thu, Apr 28, 2022 at 6:26 PM Amit Kapila wrote: > On Thu, Apr 21, 2022 at 3:21 PM wangw.f...@fujitsu.com > wrote: > > > > I think it is better to keep the new variable 'end_xact' at the end of > the struct where it belongs for HEAD. In back branches, we can keep it > at the place as you have. Apart from that, I have made some cosmetic > changes and changed a few comments in the attached. Let's use this to > prepare patches for back-branches. Thanks for your review and improvement. I improved the back-branch patches according to your modifications. Attach the back-branch patches for REL10~REL14. (Also attach the patch for HEAD, I did not make any changes to this patch.) BTW, I noticed that Hou-san shared some points. After our discussion, I will update the patches if required. Regards, Wang wei HEAD_v19-0001-Fix-the-logical-replication-timeout-during-large.patch Description: HEAD_v19-0001-Fix-the-logical-replication-timeout-during-large.patch REL14_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL14_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL13_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL13_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch REL12-REL11_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL12-REL11_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch REL10_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL10_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch
Re: Logical replication timeout problem
On Thu, Apr 21, 2022 at 3:21 PM wangw.f...@fujitsu.com wrote: > I think it is better to keep the new variable 'end_xact' at the end of the struct where it belongs for HEAD. In back branches, we can keep it at the place as you have. Apart from that, I have made some cosmetic changes and changed a few comments in the attached. Let's use this to prepare patches for back-branches. -- With Regards, Amit Kapila. v19-0001-Fix-the-logical-replication-timeout-during-large.patch
RE: Logical replication timeout problem
On Wednesday, April 20, 2022 3:21 PM Masahiko Sawada wrote: > > BTW the changes in > REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch, > adding end_xact to LogicalDecodingContext, seems good to me and it > might be better than the approach of v17 patch from plugin developers’ > perspective? This is because they won’t need to pass true/false to > end_xact of OutputPluginUpdateProgress(). Furthermore, if we do what > we do in update_replication_progress() in > OutputPluginUpdateProgress(), what plugins need to do will be just to > call OutputPluginUpdate() in every callback and they don't need to > have the CHANGES_THRESHOLD logic. What do you think? Hi Sawada-san, Wang I was looking at the patch and noticed that we moved some logic from update_replication_progress() to OutputPluginUpdateProgress(), as you suggested. I have a question about this change. In the patch we added a restriction in OutputPluginUpdateProgress() like below:

+	/*
+	 * If we are at the end of transaction LSN, update progress tracking.
+	 * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we
+	 * try to send a keepalive message if required.
+	 */
+	if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD)
+	{
+		ctx->update_progress(ctx, ctx->write_location, ctx->write_xid,
+							 skipped_xact);
+		changes_count = 0;
+	}

After the patch, we won't always invoke update_progress() if the caller is in the middle of a transaction (e.g., end_xact = false). The behavior of the public function OutputPluginUpdateProgress() is changed. Is it OK to change this in back branches? Because OutputPluginUpdateProgress() is a public function for extension developers, I am a little concerned about the behavior change here. Besides, the check of 'end_xact' and the 'CHANGES_THRESHOLD' seem specific to pgoutput. I am not sure whether plugin authors also need this logic (they might want to change the strategy), so I'd like to confirm it with you. Best regards, Hou zj
RE: Logical replication timeout problem
On Wed, Apr 21, 2022 at 10:15 AM I wrote: > The comments by Sawada-San sound reasonable to me. > After doing check, I found that padding in HEAD is the same as in REL14. > So I change the approach of patch for HEAD just like the patch for REL14. Also attached are the back-branch patches for REL10~REL13. (The REL12 and REL11 patches are the same, so only one patch is posted for these two branches.) The patch for HEAD: HEAD_v18-0001-Fix-the-logical-replication-timeout-during-large.patch The patch for REL14: REL14_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch The patch for REL13: REL13_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch The patch for REL12 and REL11: REL12-REL11_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch The patch for REL10: REL10_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch BTW, after checking, I found that the padding in REL11~REL13 is similar to HEAD and REL14 (7 bytes of padding after fast_forward). But in REL10 the padding is different; there are three padding regions after the following member variables: - 4 bytes after options - 6 bytes after prepared_write - 4 bytes after write_xid So, in the patches for branches REL11~HEAD, I add the new variable after fast_forward; in the patch for branch REL10, I add it after prepared_write. For each version, the size of the structure does not change after applying the patch. 
Regards, Wang wei HEAD_v18-0001-Fix-the-logical-replication-timeout-during-large.patch REL14_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch REL13_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch REL12-REL11_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch REL10_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch
Re: Logical replication timeout problem
On Thu, Apr 21, 2022 at 11:19 AM Amit Kapila wrote: > > On Wed, Apr 20, 2022 at 6:22 PM Masahiko Sawada wrote: > > > > On Wed, Apr 20, 2022 at 7:12 PM Amit Kapila wrote: > > > > > > > > I think it would > > > be then better to have it in the same place in HEAD as well? > > > > As far as I can see in the v17 patch, which is for HEAD, we don't add > > a variable to LogicalDecodingContext, but did you refer to another > > patch? > > > > No, I thought it is better to follow the same approach in HEAD as > well. Do you see any problem with it? No, that makes sense to me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Wed, Apr 20, 2022 at 6:22 PM Masahiko Sawada wrote: > > On Wed, Apr 20, 2022 at 7:12 PM Amit Kapila wrote: > > > > > I think it would > > be then better to have it in the same place in HEAD as well? > > As far as I can see in the v17 patch, which is for HEAD, we don't add > a variable to LogicalDecodingContext, but did you refer to another > patch? > No, I thought it is better to follow the same approach in HEAD as well. Do you see any problem with it? -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Wed, Apr 20, 2022 at 6:13 PM Amit Kapila wrote: > On Wed, Apr 20, 2022 at 2:38 PM Amit Kapila wrote: > > > > On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada > wrote: > > > > > > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com > > > wrote: > > > > ``` > > > > > > I'm concerned that this 4-byte padding at the end of the struct could > > > depend on platforms (there might be no padding in 32-bit platforms?). > > > > > > > Good point, but ... > > > > > It seems to me that it's better to put it after fast_forward where the > > > new field should fall within the padding space. > > > > > > > Can we add the variable in between the existing variables in the > > structure in the back branches? > > > > I think it should be fine if it falls in the padding space. We have > done similar changes recently in back-branches [1]. I think it would > be then better to have it in the same place in HEAD as well? > > [1] - > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=10520f4346876aad4941797c2255a21bdac74739 Thanks for your comments. The comments by Sawada-San sound reasonable to me. After checking, I found that the padding in HEAD is the same as in REL14, so I changed the approach of the patch for HEAD to match the patch for REL14. On Wed, Apr 20, 2022 at 3:21 PM Masahiko Sawada wrote: > I'm concerned that this 4-byte padding at the end of the struct could > depend on platforms (there might be no padding in 32-bit platforms?). > It seems to me that it's better to put it after fast_forward where the > new field should fall within the padding space. Fixed. Added the new variable after fast_forward. > BTW the changes in > REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch, > adding end_xact to LogicalDecodingContext, seems good to me and it > might be better than the approach of v17 patch from plugin developers’ > perspective? This is because they won’t need to pass true/false to > end_xact of OutputPluginUpdateProgress(). > Furthermore, if we do what > we do in update_replication_progress() in > OutputPluginUpdateProgress(), what plugins need to do will be just to > call OutputPluginUpdate() in every callback and they don't need to > have the CHANGES_THRESHOLD logic. What do you think? Changed the approach of the patch for HEAD. (The size of the structure does not change.) Also moved the logic of the function update_replication_progress into OutputPluginUpdateProgress. Attached are the patches. [suggestions by Sawada-San] 1. Changed the position of the new variable in the structure. 2. Changed the approach of the patch for HEAD. 3. Deleted the new function update_replication_progress. Regards, Wang wei HEAD_v18-0001-Fix-the-logical-replication-timeout-during-large.patch REL14_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch
Re: Logical replication timeout problem
On Wed, Apr 20, 2022 at 7:12 PM Amit Kapila wrote: > > On Wed, Apr 20, 2022 at 2:38 PM Amit Kapila wrote: > > > > On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada > > wrote: > > > > > > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com > > > wrote: > > > > ``` > > > > > > I'm concerned that this 4-byte padding at the end of the struct could > > > depend on platforms (there might be no padding in 32-bit platforms?). > > > > > > > Good point, but ... > > > > > It seems to me that it's better to put it after fast_forward where the > > > new field should fall within the padding space. > > > > > > > Can we add the variable in between the existing variables in the > > structure in the back branches? > > > > I think it should be fine if it falls in the padding space. We have > done similar changes recently in back-branches [1]. Yes. > I think it would > be then better to have it in the same place in HEAD as well? As far as I can see in the v17 patch, which is for HEAD, we don't add a variable to LogicalDecodingContext, but did you refer to another patch? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Wed, Apr 20, 2022 at 2:38 PM Amit Kapila wrote: > > On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada > wrote: > > > > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com > > wrote: > > > ``` > > > > I'm concerned that this 4-byte padding at the end of the struct could > > depend on platforms (there might be no padding in 32-bit platforms?). > > > > Good point, but ... > > > It seems to me that it's better to put it after fast_forward where the > > new field should fall within the padding space. > > > > Can we add the variable in between the existing variables in the > structure in the back branches? > I think it should be fine if it falls in the padding space. We have done similar changes recently in back-branches [1]. I think it would be then better to have it in the same place in HEAD as well? [1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=10520f4346876aad4941797c2255a21bdac74739 -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada wrote: > > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com > wrote: > > ``` > > I'm concerned that this 4-byte padding at the end of the struct could > depend on platforms (there might be no padding in 32-bit platforms?). > Good point, but ... > It seems to me that it's better to put it after fast_forward where the > new field should fall within the padding space. > Can we add the variable in between the existing variables in the structure in the back branches? Normally, we add at the end to avoid any breakage of existing apps. See commit 56e366f675 and discussion at [1]. That is related to enum but I think we follow the same for structures. [1] - https://www.postgresql.org/message-id/7dab0929-a966-0c0a-4726-878fced2fe00%40enterprisedb.com -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com wrote: > > On Mon, Apr 18, 2022 at 00:36 PM Amit Kapila wrote: > > On Mon, Apr 18, 2022 at 9:29 AM Amit Kapila wrote: > > > > > > On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira wrote: > > > > > > > > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote: > > > > > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > > > personally still prefer the approach implemented in v10 [1] > > > > especially due to the latest finding by Wang-San that we can't > > > > update the lag-tracker apart from when it is invoked at the transaction > > > > end. > > > > However, I am fine if we like this approach more. > > > > > > > > It seems v15 is simpler and less error prone than v10. v10 has a mix > > > > of > > > > OutputPluginUpdateProgress() and the new function update_progress(). > > > > The v10 also calls update_progress() for every change action in > > > > pgoutput_change(). It is not a good approach for maintainability -- > > > > new changes like sequences need extra calls. > > > > > > > > > > Okay, let's use the v15 approach as Sawada-San also seems to have a > > > preference for that. > > > > > > > However, as you mentioned there should handle the track lag case. > > > > > > > > Both patches change the OutputPluginUpdateProgress() so it cannot be > > > > backpatched. Are you planning to backpatch it? If so, the boolean > > > > variable (last_write or end_xacts depending of which version you are > > > > considering) could be added to LogicalDecodingContext. > > > > > > > > > > If we add it to LogicalDecodingContext then I think we have to always > > > reset the variable after its use which will make it look ugly and > > > error-prone. I was not thinking to backpatch it because of the API > > > change but I guess if we want to backpatch then we can add it to > > > LogicalDecodingContext for back-branches. I am not sure if that will > > > look committable but surely we can try. 
> > > > > > > Even, if we want to add the variable in the struct in back-branches, we > > need to > > ensure not to change the size of the struct as it is exposed, see email [1] > > for a > > similar mistake we made in another case. > > > > [1] - https://www.postgresql.org/message- > > id/2358496.1649168259%40sss.pgh.pa.us > Thanks for your comments. > > I did some checks about adding the new variable in LogicalDecodingContext. > I found that because of padding, if we add the new variable at the end of > structure, it dose not make the structure size change. I verified this in > REL_10~REL_14. > > So as suggested by Euler-San and Amit-San, I wrote the patch for REL_14. > Attach > this patch. To prevent patch confusion, the patch for HEAD is also attached. > The patch for REL_14: > REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch > The patch for HEAD: > v17-0001-Fix-the-logical-replication-timeout-during-large.patch > > The following is the details of checks. > On gcc/Linux/x86-64, in REL_14, by looking at the size of each member variable > in the structure LogicalDecodingContext, I found that there are three parts > padding behind the following member variables: > - 7 bytes after fast_forward > - 4 bytes after prepared_write > - 4 bytes after write_xid > > If we add the new variable at the end of structure (bool takes one byte), it > means we will only consume one byte of padding after member write_xid. And > then, at the end of the struct, 3 padding are still required. For easy > understanding, please refer to the following simple calculation. > (In REL14, the size of structure LogicalDecodingContext is 304 bytes.) > Before adding new variable (In REL14): > 8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4 = 289 (if padding is not > considered) > +7 +4 +4 = +15 (the padding) > So, the size of structure LogicalDecodingContext is 289+15=304. 
> After adding new variable (In REL14 with patch): > 8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4+1 = 290 (if padding is not > considered) > +7 +4+3 = +14 (the padding) > So, the size of structure LogicalDecodingContext is 290+14=304. > > BTW, the size of structure LogicalDecodingContext in REL_10~REL_13 is 184, > 200, > 200,200 respectively. And I found that at the end of the structure > LogicalDecodingContext, there are always the following members: > ``` > XLogRecPtr write_location; --> 8 > TransactionId write_xid; --> 4 > --> There are 4 padding after write_xid. > ``` I'm concerned that this 4-byte padding at the end of the struct could depend on platforms (there might be no padding in 32-bit platforms?). It seems to me that it's better to put it after fast_forward where the new field should fall within the padding space. BTW the changes in REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch, adding end_xact to LogicalDecodingContext, seems good to me and it might be better than the approach of v17 patch from plugin developers’ perspective? This is because they won’t need to pass true/false to end_xact of OutputPluginUpdateProgress(). Furthermore, if we do what we do in update_replication_progress() in OutputPluginUpdateProgress(), what plugins need to do will be just to call OutputPluginUpdate() in every callback and they don't need to have the CHANGES_THRESHOLD logic. What do you think? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Logical replication timeout problem
On Mon, Apr 18, 2022 at 00:36 PM Amit Kapila wrote: > On Mon, Apr 18, 2022 at 9:29 AM Amit Kapila wrote: > > > > On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira wrote: > > > > > > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote: > > > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > > personally still prefer the approach implemented in v10 [1] > > > especially due to the latest finding by Wang-San that we can't > > > update the lag-tracker apart from when it is invoked at the transaction > > > end. > > > However, I am fine if we like this approach more. > > > > > > It seems v15 is simpler and less error prone than v10. v10 has a mix > > > of > > > OutputPluginUpdateProgress() and the new function update_progress(). > > > The v10 also calls update_progress() for every change action in > > > pgoutput_change(). It is not a good approach for maintainability -- > > > new changes like sequences need extra calls. > > > > > > > Okay, let's use the v15 approach as Sawada-San also seems to have a > > preference for that. > > > > > However, as you mentioned there should handle the track lag case. > > > > > > Both patches change the OutputPluginUpdateProgress() so it cannot be > > > backpatched. Are you planning to backpatch it? If so, the boolean > > > variable (last_write or end_xacts depending of which version you are > > > considering) could be added to LogicalDecodingContext. > > > > > > > If we add it to LogicalDecodingContext then I think we have to always > > reset the variable after its use which will make it look ugly and > > error-prone. I was not thinking to backpatch it because of the API > > change but I guess if we want to backpatch then we can add it to > > LogicalDecodingContext for back-branches. I am not sure if that will > > look committable but surely we can try. 
> > > > Even, if we want to add the variable in the struct in back-branches, we need > to > ensure not to change the size of the struct as it is exposed, see email [1] > for a > similar mistake we made in another case. > > [1] - https://www.postgresql.org/message-id/2358496.1649168259%40sss.pgh.pa.us Thanks for your comments. I did some checks on adding the new variable in LogicalDecodingContext. I found that, because of padding, adding the new variable at the end of the structure does not change the structure size. I verified this in REL_10~REL_14. So, as suggested by Euler-San and Amit-San, I wrote the patch for REL_14 and attach it. To prevent patch confusion, the patch for HEAD is also attached. The patch for REL_14: REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch The patch for HEAD: v17-0001-Fix-the-logical-replication-timeout-during-large.patch The details of the checks follow. On gcc/Linux/x86-64, in REL_14, by looking at the size of each member variable in the structure LogicalDecodingContext, I found that there are three padding regions after the following member variables: - 7 bytes after fast_forward - 4 bytes after prepared_write - 4 bytes after write_xid If we add the new variable at the end of the structure (a bool takes one byte), we consume only one byte of the padding after write_xid, and 3 bytes of padding are still required at the end of the struct. For easy understanding, please refer to the following simple calculation. (In REL14, the size of the structure LogicalDecodingContext is 304 bytes.) Before adding the new variable (In REL14): 8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4 = 289 (if padding is not considered) +7 +4 +4 = +15 (the padding) So, the size of the structure LogicalDecodingContext is 289+15=304. 
After adding the new variable (In REL14 with patch): 8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4+1 = 290 (if padding is not considered) +7 +4+3 = +14 (the padding) So, the size of the structure LogicalDecodingContext is 290+14=304. BTW, the size of the structure LogicalDecodingContext in REL_10~REL_13 is 184, 200, 200, 200 respectively. And I found that at the end of the structure LogicalDecodingContext there are always the following members:
```
XLogRecPtr    write_location;  --> 8
TransactionId write_xid;       --> 4
--> There are 4 bytes of padding after write_xid.
```
This means that at the end of the structure LogicalDecodingContext there are 4 bytes of padding. So, if we add a new bool variable (which takes one byte) at the end of the structure, then in the current REL_10~REL_14, because of padding, the size of the structure LogicalDecodingContext will not change. Regards, Wang wei REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch v17-0001-Fix-the-logical-replication-timeout-during-large.patch
RE: Logical replication timeout problem
On Mon, Apr 19, 2022 at 9:32 AM Masahiko Sawada wrote: > Thank you for updating the patch. Thanks for your comments. > + * For a large transaction, if we don't send any change to the > + downstream for a > + * long time(exceeds the wal_receiver_timeout of standby) then it can > timeout. > + * This can happen when all or most of the changes are either not > + published or > + * got filtered out. > > + */ > + if(end_xact || ++changes_count >= CHANGES_THRESHOLD) { > > We need a whitespace before '(' at above two places. The rest looks good to > me. Fixed these. Attached is the new patch. 1. Fixed wrong formatting. [suggestion by Sawada-San] Regards, Wang wei v17-0001-Fix-the-logical-replication-timeout-during-large.patch
Re: Logical replication timeout problem
On Mon, Apr 18, 2022 at 3:16 PM wangw.f...@fujitsu.com wrote: > > On Mon, Apr 18, 2022 at 00:35 PM Masahiko Sawada > wrote: > > On Mon, Apr 18, 2022 at 1:01 PM Amit Kapila wrote: > > > > > > On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada > > wrote: > > > > > > > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila > > wrote: > > > > > > > > > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > > > > > wrote: > > > > > > > > > > > > So I skip tracking lag during a transaction just like the current > > > > > > HEAD. > > > > > > Attach the new patch. > > > > > > > > > > > > > > > > Thanks, please find the updated patch where I have slightly > > > > > modified the comments. > > > > > > > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > > > > personally still prefer the approach implemented in v10 [1] > > > > > especially due to the latest finding by Wang-San that we can't > > > > > update the lag-tracker apart from when it is invoked at the > > > > > transaction end. > > > > > However, I am fine if we like this approach more. > > > > > > > > Thank you for updating the patch. > > > > > > > > The current patch looks much better than v10 which requires to call > > > > to > > > > update_progress() every path. > > > > > > > > Regarding v15 patch, I'm concerned a bit that the new function name, > > > > update_progress(), is too generic. How about > > > > update_replation_progress() or something more specific name? > > > > > > > > > > Do you intend to say update_replication_progress()? The word > > > 'replation' doesn't make sense to me. I am fine with this suggestion. > > > > Yeah, that was a typo. I meant update_replication_progress(). > Thanks for your comments. > > > > > Regarding v15 patch, I'm concerned a bit that the new function name, > > > > update_progress(), is too generic. How about > > > > update_replation_progress() or something more specific name? > Improve as suggested. 
Change the name from update_progress to > update_replication_progress. > > > > > --- > > > > +if (end_xact) > > > > +{ > > > > +/* Update progress tracking at xact end. */ > > > > +OutputPluginUpdateProgress(ctx, skipped_xact, > > > > end_xact); > > > > +changes_count = 0; > > > > +return; > > > > +} > > > > + > > > > +/* > > > > + * After continuously processing CHANGES_THRESHOLD changes, > > > > we try to send > > > > + * a keepalive message if required. > > > > + * > > > > + * We don't want to try sending a keepalive message after > > > > processing each > > > > + * change as that can have overhead. Testing reveals that > > > > there is no > > > > + * noticeable overhead in doing it after continuously > > > > processing 100 or so > > > > + * changes. > > > > + */ > > > > +#define CHANGES_THRESHOLD 100 > > > > +if (++changes_count >= CHANGES_THRESHOLD) > > > > +{ > > > > +OutputPluginUpdateProgress(ctx, skipped_xact, > > > > end_xact); > > > > +changes_count = 0; > > > > +} > > > > > > > > Can we merge two if branches since we do the same things? Or did you > > > > separate them for better readability? > Improve as suggested. Merge two if-branches. > > Attach the new patch. > 1. Rename the new function(update_progress) to update_replication_progress. > [suggestion by Sawada-San] > 2. Merge two if-branches in new function update_replication_progress. > [suggestion by Sawada-San.] > 3. Improve comments to make them clear. [suggestions by Euler-San.] Thank you for updating the patch. + * For a large transaction, if we don't send any change to the downstream for a + * long time(exceeds the wal_receiver_timeout of standby) then it can timeout. + * This can happen when all or most of the changes are either not published or + * got filtered out. + */ + if(end_xact || ++changes_count >= CHANGES_THRESHOLD) + { We need a whitespace before '(' at above two places. The rest looks good to me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Logical replication timeout problem
On Thu, Apr 14, 2022 at 8:21 PM Euler Taveira wrote: Thanks for your comments. > + * For a large transaction, if we don't send any change to the downstream > for a > + * long time then it can timeout. This can happen when all or most of the > + * changes are either not published or got filtered out. > > We should probable mention that "long time" is wal_receiver_timeout on > subscriber. Improved as suggested. Added "(exceeds the wal_receiver_timeout of standby)" to explain what "long time" means. > +* change as that can have overhead. Testing reveals that there is no > +* noticeable overhead in doing it after continuously processing 100 or so > +* changes. > > Tests revealed that ... Improved as suggested. > +* We don't have a mechanism to get the ack for any LSN other than end > xact > +* lsn from the downstream. So, we track lag only for end xact lsn's. > > s/lsn/LSN/ and s/lsn's/LSNs/ > > I would say "end of transaction LSN". Improved as suggested. > + * If too many changes are processed then try to send a keepalive message to > + * receiver to avoid timeouts. > > In logical replication, if too many changes are processed then try to send a > keepalive message. It might avoid a timeout in the subscriber. Improved as suggested. Kindly have a look at the new patch shared in [1]. [1] - https://www.postgresql.org/message-id/OS3PR01MB627561344A2C7ECF68E41D7E9EF39%40OS3PR01MB6275.jpnprd01.prod.outlook.com Regards, Wang wei
RE: Logical replication timeout problem
On Mon, Apr 18, 2022 at 00:35 PM Masahiko Sawada wrote: > On Mon, Apr 18, 2022 at 1:01 PM Amit Kapila wrote: > > > > On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada > wrote: > > > > > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila > wrote: > > > > > > > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > > > > wrote: > > > > > > > > > > So I skip tracking lag during a transaction just like the current > > > > > HEAD. > > > > > Attach the new patch. > > > > > > > > > > > > > Thanks, please find the updated patch where I have slightly > > > > modified the comments. > > > > > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > > > personally still prefer the approach implemented in v10 [1] > > > > especially due to the latest finding by Wang-San that we can't > > > > update the lag-tracker apart from when it is invoked at the transaction > > > > end. > > > > However, I am fine if we like this approach more. > > > > > > Thank you for updating the patch. > > > > > > The current patch looks much better than v10 which requires to call > > > to > > > update_progress() every path. > > > > > > Regarding v15 patch, I'm concerned a bit that the new function name, > > > update_progress(), is too generic. How about > > > update_replation_progress() or something more specific name? > > > > > > > Do you intend to say update_replication_progress()? The word > > 'replation' doesn't make sense to me. I am fine with this suggestion. > > Yeah, that was a typo. I meant update_replication_progress(). Thanks for your comments. > > > Regarding v15 patch, I'm concerned a bit that the new function name, > > > update_progress(), is too generic. How about > > > update_replation_progress() or something more specific name? Improve as suggested. Change the name from update_progress to update_replication_progress. > > > --- > > > +if (end_xact) > > > +{ > > > +/* Update progress tracking at xact end. 
*/ > > > +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); > > > +changes_count = 0; > > > +return; > > > +} > > > + > > > +/* > > > + * After continuously processing CHANGES_THRESHOLD changes, > > > we try to send > > > + * a keepalive message if required. > > > + * > > > + * We don't want to try sending a keepalive message after > > > processing each > > > + * change as that can have overhead. Testing reveals that there > > > is no > > > + * noticeable overhead in doing it after continuously > > > processing 100 or so > > > + * changes. > > > + */ > > > +#define CHANGES_THRESHOLD 100 > > > +if (++changes_count >= CHANGES_THRESHOLD) > > > +{ > > > +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); > > > +changes_count = 0; > > > +} > > > > > > Can we merge two if branches since we do the same things? Or did you > > > separate them for better readability? Improved as suggested. Merged the two if-branches. Attached is the new patch. 1. Renamed the new function (update_progress) to update_replication_progress. [suggestion by Sawada-San] 2. Merged the two if-branches in the new function update_replication_progress. [suggestion by Sawada-San] 3. Improved comments to make them clear. [suggestions by Euler-San] Regards, Wang wei v16-0001-Fix-the-logical-replication-timeout-during-large.patch
Re: Logical replication timeout problem
On Mon, Apr 18, 2022 at 9:29 AM Amit Kapila wrote: > > On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira wrote: > > > > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote: > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > personally still prefer the approach implemented in v10 [1] especially > > due to the latest finding by Wang-San that we can't update the > > lag-tracker apart from when it is invoked at the transaction end. > > However, I am fine if we like this approach more. > > > > It seems v15 is simpler and less error prone than v10. v10 has a mix of > > OutputPluginUpdateProgress() and the new function update_progress(). The v10 > > also calls update_progress() for every change action in pgoutput_change(). > > It > > is not a good approach for maintainability -- new changes like sequences > > need > > extra calls. > > > > Okay, let's use the v15 approach as Sawada-San also seems to have a > preference for that. > > > However, as you mentioned there should handle the track lag case. > > > > Both patches change the OutputPluginUpdateProgress() so it cannot be > > backpatched. Are you planning to backpatch it? If so, the boolean variable > > (last_write or end_xacts depending of which version you are considering) > > could > > be added to LogicalDecodingContext. > > > > If we add it to LogicalDecodingContext then I think we have to always > reset the variable after its use which will make it look ugly and > error-prone. I was not thinking to backpatch it because of the API > change but I guess if we want to backpatch then we can add it to > LogicalDecodingContext for back-branches. I am not sure if that will > look committable but surely we can try. > Even, if we want to add the variable in the struct in back-branches, we need to ensure not to change the size of the struct as it is exposed, see email [1] for a similar mistake we made in another case. 
[1] - https://www.postgresql.org/message-id/2358496.1649168259%40sss.pgh.pa.us -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Mon, Apr 18, 2022 at 1:01 PM Amit Kapila wrote: > > On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada wrote: > > > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila wrote: > > > > > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > > > wrote: > > > > > > > > So I skip tracking lag during a transaction just like the current HEAD. > > > > Attach the new patch. > > > > > > > > > > Thanks, please find the updated patch where I have slightly modified > > > the comments. > > > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > > personally still prefer the approach implemented in v10 [1] especially > > > due to the latest finding by Wang-San that we can't update the > > > lag-tracker apart from when it is invoked at the transaction end. > > > However, I am fine if we like this approach more. > > > > Thank you for updating the patch. > > > > The current patch looks much better than v10 which requires to call to > > update_progress() every path. > > > > Regarding v15 patch, I'm concerned a bit that the new function name, > > update_progress(), is too generic. How about > > update_replation_progress() or something more specific name? > > > > Do you intend to say update_replication_progress()? The word > 'replation' doesn't make sense to me. I am fine with this suggestion. Yeah, that was a typo. I meant update_replication_progress(). Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada wrote: > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila wrote: > > > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > > wrote: > > > > > > So I skip tracking lag during a transaction just like the current HEAD. > > > Attach the new patch. > > > > > > > Thanks, please find the updated patch where I have slightly modified > > the comments. > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > personally still prefer the approach implemented in v10 [1] especially > > due to the latest finding by Wang-San that we can't update the > > lag-tracker apart from when it is invoked at the transaction end. > > However, I am fine if we like this approach more. > > Thank you for updating the patch. > > The current patch looks much better than v10 which requires to call to > update_progress() every path. > > Regarding v15 patch, I'm concerned a bit that the new function name, > update_progress(), is too generic. How about > update_replation_progress() or something more specific name? > Do you intend to say update_replication_progress()? The word 'replation' doesn't make sense to me. I am fine with this suggestion. > > --- > +if (end_xact) > +{ > +/* Update progress tracking at xact end. */ > +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); > +changes_count = 0; > +return; > +} > + > +/* > + * After continuously processing CHANGES_THRESHOLD changes, > we try to send > + * a keepalive message if required. > + * > + * We don't want to try sending a keepalive message after > processing each > + * change as that can have overhead. Testing reveals that there is no > + * noticeable overhead in doing it after continuously > processing 100 or so > + * changes. > + */ > +#define CHANGES_THRESHOLD 100 > +if (++changes_count >= CHANGES_THRESHOLD) > +{ > +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); > +changes_count = 0; > +} > > Can we merge two if branches since we do the same things? 
Or did you > separate them for better readability? > I think it is fine to merge the two checks. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira wrote: > > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote: > > Sawada-San, Euler, do you have any opinion on this approach? I > personally still prefer the approach implemented in v10 [1] especially > due to the latest finding by Wang-San that we can't update the > lag-tracker apart from when it is invoked at the transaction end. > However, I am fine if we like this approach more. > > It seems v15 is simpler and less error prone than v10. v10 has a mix of > OutputPluginUpdateProgress() and the new function update_progress(). The v10 > also calls update_progress() for every change action in pgoutput_change(). It > is not a good approach for maintainability -- new changes like sequences need > extra calls. > Okay, let's use the v15 approach as Sawada-San also seems to have a preference for that. > However, as you mentioned there should handle the track lag case. > > Both patches change the OutputPluginUpdateProgress() so it cannot be > backpatched. Are you planning to backpatch it? If so, the boolean variable > (last_write or end_xacts depending of which version you are considering) could > be added to LogicalDecodingContext. > If we add it to LogicalDecodingContext then I think we have to always reset the variable after its use which will make it look ugly and error-prone. I was not thinking to backpatch it because of the API change but I guess if we want to backpatch then we can add it to LogicalDecodingContext for back-branches. I am not sure if that will look committable but surely we can try. > (You should probably consider this approach > for skipped_xact too) > As mentioned, I think it will be more error-prone and we already have other xact related parameters in that and similar APIs. So, I am not sure why you want to prefer that? > > Does this same issue occur for long transactions? I mean keep a long > transaction open and execute thousands of transactions. 
> No, this problem won't happen for such cases because we will only try to send it at the commit time. Note that this problem happens only when we don't send anything to the subscriber till a timeout happens. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote: > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > wrote: > > > > So I skip tracking lag during a transaction just like the current HEAD. > > Attach the new patch. > > > > Thanks, please find the updated patch where I have slightly modified > the comments. > > Sawada-San, Euler, do you have any opinion on this approach? I > personally still prefer the approach implemented in v10 [1] especially > due to the latest finding by Wang-San that we can't update the > lag-tracker apart from when it is invoked at the transaction end. > However, I am fine if we like this approach more. It seems v15 is simpler and less error prone than v10. v10 has a mix of OutputPluginUpdateProgress() and the new function update_progress(). The v10 also calls update_progress() for every change action in pgoutput_change(). It is not a good approach for maintainability -- new changes like sequences need extra calls. However, as you mentioned, the lag-tracking case should still be handled. Both patches change the OutputPluginUpdateProgress() so it cannot be backpatched. Are you planning to backpatch it? If so, the boolean variable (last_write or end_xacts depending on which version you are considering) could be added to LogicalDecodingContext. (You should probably consider this approach for skipped_xact too) + * For a large transaction, if we don't send any change to the downstream for a + * long time then it can timeout. This can happen when all or most of the + * changes are either not published or got filtered out. We should probably mention that "long time" is wal_receiver_timeout on the subscriber. +* change as that can have overhead. Testing reveals that there is no +* noticeable overhead in doing it after continuously processing 100 or so +* changes. Tests revealed that ... +* We don't have a mechanism to get the ack for any LSN other than end xact +* lsn from the downstream. So, we track lag only for end xact lsn's. 
s/lsn/LSN/ and s/lsn's/LSNs/ I would say "end of transaction LSN". + * If too many changes are processed then try to send a keepalive message to + * receiver to avoid timeouts. In logical replication, if too many changes are processed then try to send a keepalive message. It might avoid a timeout in the subscriber. Does this same issue occur for long transactions? I mean keep a long transaction open and execute thousands of transactions. BEGIN; INSERT INTO foo (a) VALUES(1); -- wait a few hours while executing 10^x transactions INSERT INTO foo (a) VALUES(2); COMMIT; -- Euler Taveira EDB https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila wrote: > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > wrote: > > > > So I skip tracking lag during a transaction just like the current HEAD. > > Attach the new patch. > > > > Thanks, please find the updated patch where I have slightly modified > the comments. > > Sawada-San, Euler, do you have any opinion on this approach? I > personally still prefer the approach implemented in v10 [1] especially > due to the latest finding by Wang-San that we can't update the > lag-tracker apart from when it is invoked at the transaction end. > However, I am fine if we like this approach more. Thank you for updating the patch. The current patch looks much better than v10 which requires to call to update_progress() every path. Regarding v15 patch, I'm concerned a bit that the new function name, update_progress(), is too generic. How about update_replation_progress() or something more specific name? --- +if (end_xact) +{ +/* Update progress tracking at xact end. */ +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); +changes_count = 0; +return; +} + +/* + * After continuously processing CHANGES_THRESHOLD changes, we try to send + * a keepalive message if required. + * + * We don't want to try sending a keepalive message after processing each + * change as that can have overhead. Testing reveals that there is no + * noticeable overhead in doing it after continuously processing 100 or so + * changes. + */ +#define CHANGES_THRESHOLD 100 +if (++changes_count >= CHANGES_THRESHOLD) +{ +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); +changes_count = 0; +} Can we merge two if branches since we do the same things? Or did you separate them for better readability? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com wrote: > > So I skip tracking lag during a transaction just like the current HEAD. > Attach the new patch. > Thanks, please find the updated patch where I have slightly modified the comments. Sawada-San, Euler, do you have any opinion on this approach? I personally still prefer the approach implemented in v10 [1] especially due to the latest finding by Wang-San that we can't update the lag-tracker apart from when it is invoked at the transaction end. However, I am fine if we like this approach more. [1] - https://www.postgresql.org/message-id/OS3PR01MB6275E0C2B4D9E488AD7CBA209E1F9%40OS3PR01MB6275.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila. v15-0001-Fix-the-logical-replication-timeout-during-large.patch Description: Binary data
RE: Logical replication timeout problem
On Mon, Apr 11, 2022 at 2:39 PM I wrote: > Attach the new patch. Also, sharing test results and details. To check that the LSN information used for the calculation is what we expected, I got some information by adding logs in the function LagTrackerRead. Summary of test results: - In current HEAD and current HEAD with the v14 patch, we could find the information for the same LSN as received from the subscriber side in lag_tracker. - In current HEAD with the v13 patch, we could hardly ever find the information for the same LSN in lag_tracker. Attached are the details: [The log by HEAD] the lsn we received from subscriber | the lsn whose time we used to calculate in lag_tracker 382826584 | 382826584 743884840 | 743884840 1104943232 | 1104943232 1468949424 | 1468949424 1469521216 | 1469521216 [The log by HEAD with v14 patch] the lsn we received from subscriber | the lsn whose time we used to calculate in lag_tracker 382826584 | 382826584 743890672 | 743890672 1105074264 | 1105074264 1469127040 | 1469127040 1830591240 | 1830591240 [The log by HEAD with v13 patch] the lsn we received from subscriber | the lsn whose time we used to calculate in lag_tracker 382826584 | 359848728 743884840 | 713808560 1105010640 | 1073978544 1468517536 | 1447850160 1469516328 | 1469516328 Regards, Wang wei
RE: Logical replication timeout problem
On Wed, Apr 7, 2022 at 1:34 PM Amit Kapila wrote: > On Wed, Apr 6, 2022 at 6:30 PM wangw.f...@fujitsu.com > wrote: > > > > On Wed, Apr 6, 2022 at 1:58 AM Amit Kapila wrote: > > On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila wrote: > > > Also, let's try to evaluate how it impacts lag functionality for large > transactions? > > I think this patch will not affect lag functionality. It will updates the > > lag > > field of view pg_stat_replication more frequently. > > IIUC, when invoking function WalSndUpdateProgress, it will store the lsn of > > change and invoking time in lag_tracker. Then when invoking function > > ProcessStandbyReplyMessage, it will calculate the lag field according to the > > message from subscriber and the information in lag_tracker. This patch does > > not modify this logic, but only increases the frequency of invoking. > > Please let me know if I understand wrong. > > > > No, your understanding seems correct to me. But what I want to check > is if calling the progress function more often has any impact on > lag-related fields in pg_stat_replication? I think you need to check > the impact of large transaction replay. Thanks for the explanation. After doing some checks, I found that the v13 patch makes the lag calculations inaccurate. In short, the v13 patch lets us try to track lag more frequently and try to send a keepalive message to subscribers. But in order to prevent flooding the lag tracker, we cannot track lag more than once within WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress). This means we may lose information that needs to be tracked. For example, suppose there is a large transaction with LSNs from lsn1 to lsn3. In HEAD, when we calculate the lag time for lsn3, the lag time of lsn3 is (now - lsn3.time). But with the v13 patch, when we calculate the lag time for lsn3, there may be no information for lsn3 but only information for lsn2 in lag_tracker, so the lag time of lsn3 becomes (now - lsn2.time). 
(see function LagTrackerRead) Therefore, if we lose information that needs to be tracked, the lag time becomes large and inaccurate. So I skip tracking lag during a transaction, just like the current HEAD. Attached is the new patch. Regards, Wang wei v14-0001-Fix-the-logical-replication-timeout-during-large.patch Description: v14-0001-Fix-the-logical-replication-timeout-during-large.patch
RE: Logical replication timeout problem
On Wed, Apr 7, 2022 at 1:34 PM Amit Kapila wrote: > Thanks for your comments. > One comment: > +static void > +update_progress(LogicalDecodingContext *ctx, bool skipped_xact, bool > end_xact) > +{ > + static int changes_count = 0; > + > + if (end_xact) > + { > + /* Update progress tracking at xact end. */ > + OutputPluginUpdateProgress(ctx, skipped_xact); > + changes_count = 0; > + } > + /* > + * After continuously processing CHANGES_THRESHOLD changes, update > progress > + * which will also try to send a keepalive message if required. > > I think you can simply return after making changes_count = 0. There > should be an empty line before starting the next comment. Improved as suggested. BTW, there was a conflict with current HEAD when applying v12 because of commit 2c7ea57, so the patch is also rebased. Attached is the new patch. 1. Make some improvements to the new function update_progress. [suggestion by Amit-San] 2. Rebase the patch because of commit 2c7ea57 in current HEAD. Regards, Wang wei v13-0001-Fix-the-logical-replication-timeout-during-large.patch Description: v13-0001-Fix-the-logical-replication-timeout-during-large.patch
Re: Logical replication timeout problem
On Wed, Apr 6, 2022 at 6:30 PM wangw.f...@fujitsu.com wrote: > > On Wed, Apr 6, 2022 at 1:59 AM Amit Kapila wrote: > On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila wrote: > > > Thanks for your comments. > > > typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct > > LogicalDecodingContext *lr, > > XLogRecPtr Ptr, > > TransactionId xid, > > - bool skipped_xact > > + bool skipped_xact, > > + bool last_write > > > > In this approach, I don't think we need an additional parameter last_write. > > Let's > > do the work related to keepalive without a parameter, do you see any problem > > with that? > I agree with you. Modify this point. > > > I think this patch doesn't take into account that we call > > OutputPluginUpdateProgress() from APIs like pgoutput_commit_txn(). I > > think we should always call the new function update_progress from > > those existing call sites and arrange the function such that when > > called from xact end APIs like pgoutput_commit_txn(), it always call > > OutputPluginUpdateProgress and make changes_count as 0. > Improve it. > Add two new input to function update_progress.(skipped_xact and end_xact). > Modify the function invoke from OutputPluginUpdateProgress to update_progress. > > > Also, let's try to evaluate how it impacts lag functionality for large > > transactions? > I think this patch will not affect lag functionality. It will updates the lag > field of view pg_stat_replication more frequently. > IIUC, when invoking function WalSndUpdateProgress, it will store the lsn of > change and invoking time in lag_tracker. Then when invoking function > ProcessStandbyReplyMessage, it will calculate the lag field according to the > message from subscriber and the information in lag_tracker. This patch does > not modify this logic, but only increases the frequency of invoking. > Please let me know if I understand wrong. > No, your understanding seems correct to me. 
But what I want to check is if calling the progress function more often has any impact on lag-related fields in pg_stat_replication? I think you need to check the impact of large transaction replay. One comment: +static void +update_progress(LogicalDecodingContext *ctx, bool skipped_xact, bool end_xact) +{ + static int changes_count = 0; + + if (end_xact) + { + /* Update progress tracking at xact end. */ + OutputPluginUpdateProgress(ctx, skipped_xact); + changes_count = 0; + } + /* + * After continuously processing CHANGES_THRESHOLD changes, update progress + * which will also try to send a keepalive message if required. I think you can simply return after making changes_count = 0. There should be an empty line before starting the next comment. -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Wed, Apr 6, 2022 at 1:59 AM Amit Kapila wrote: On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila wrote: > Thanks for your comments. > typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct > LogicalDecodingContext *lr, > XLogRecPtr Ptr, > TransactionId xid, > - bool skipped_xact > + bool skipped_xact, > + bool last_write > > In this approach, I don't think we need an additional parameter last_write. > Let's > do the work related to keepalive without a parameter, do you see any problem > with that? I agree with you. This point is modified. > I think this patch doesn't take into account that we call > OutputPluginUpdateProgress() from APIs like pgoutput_commit_txn(). I > think we should always call the new function update_progress from > those existing call sites and arrange the function such that when > called from xact end APIs like pgoutput_commit_txn(), it always call > OutputPluginUpdateProgress and make changes_count as 0. Improved: two new inputs (skipped_xact and end_xact) are added to the function update_progress, and the calls to OutputPluginUpdateProgress are changed to calls to update_progress. > Also, let's try to evaluate how it impacts lag functionality for large > transactions? I think this patch will not affect lag functionality. It will update the lag field of the pg_stat_replication view more frequently. IIUC, when invoking function WalSndUpdateProgress, it will store the LSN of the change and the invoking time in lag_tracker. Then when invoking function ProcessStandbyReplyMessage, it will calculate the lag field according to the message from the subscriber and the information in lag_tracker. This patch does not modify this logic, but only increases the frequency of invoking it. Please let me know if my understanding is wrong. Attached is the new patch. 1. Remove the new function input parameter in this patch (parameter last_write of WalSndUpdateProgress). [suggestion by Amit-San] 2. Also invoke function update_progress in other xact end APIs like pgoutput_commit_txn. 
[suggestion by Amit-San] Regards, Wang wei v12-0001-Fix-the-logical-replication-timeout-during-large.patch Description: v12-0001-Fix-the-logical-replication-timeout-during-large.patch
Re: Logical replication timeout problem
On Wed, Apr 6, 2022 at 11:28 AM Amit Kapila wrote: > > On Wed, Apr 6, 2022 at 11:09 AM wangw.f...@fujitsu.com > wrote: > > > > According to your suggestion, improve the patch to make it more generic. > > Attach the new patch. > > > > typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct > LogicalDecodingContext *lr, > XLogRecPtr Ptr, > TransactionId xid, > - bool skipped_xact > + bool skipped_xact, > + bool last_write > > In this approach, I don't think we need an additional parameter > last_write. Let's do the work related to keepalive without a > parameter, do you see any problem with that? > I think this patch doesn't take into account that we call OutputPluginUpdateProgress() from APIs like pgoutput_commit_txn(). I think we should always call the new function update_progress from those existing call sites and arrange the function such that when called from xact end APIs like pgoutput_commit_txn(), it always call OutputPluginUpdateProgress and make changes_count as 0. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Apr 6, 2022 at 11:09 AM wangw.f...@fujitsu.com wrote: > > On Fri, Apr 1, 2022 at 12:09 AM Amit Kapila wrote: > > On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira wrote: > > > > > > It seems I didn't make myself clear. The callbacks I'm referring to the > > > *_cb_wrapper functions. After every ctx->callbacks.foo_cb() call into a > > > *_cb_wrapper() function, we have something like: > > > > > > if (ctx->progress & PGOUTPUT_PROGRESS_FOO) > > > NewUpdateProgress(ctx, false); > > > > > > The NewUpdateProgress function would contain a logic similar to the > > > update_progress() from the proposed patch. (A different function name here > > just > > > to avoid confusion.) > > > > > > The output plugin is responsible to set ctx->progress with the callback > > > variables (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb()) > > that we would > > > like to run NewUpdateProgress. > > > > > > > This sounds like a conflicting approach to what we currently do. > > Currently, OutputPluginUpdateProgress() is called from the xact > > related pgoutput functions like pgoutput_commit_txn(), > > pgoutput_prepare_txn(), pgoutput_commit_prepared_txn(), etc. So, if we > > follow what you are saying then for some of the APIs like > > pgoutput_change/_message/_truncate, we need to set the parameter to > > invoke NewUpdateProgress() which will internally call > > OutputPluginUpdateProgress(), and for the remaining APIs, we will call > > in the corresponding pgoutput_* function. I feel if we want to make it > > more generic than the current patch, it is better to directly call > > what you are referring to here as NewUpdateProgress() in all remaining > > APIs like pgoutput_change/_truncate, etc. > Thanks for your comments. > > According to your suggestion, improve the patch to make it more generic. > Attach the new patch. 
> typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct LogicalDecodingContext *lr, XLogRecPtr Ptr, TransactionId xid, - bool skipped_xact + bool skipped_xact, + bool last_write In this approach, I don't think we need an additional parameter last_write. Let's do the work related to keepalive without a parameter, do you see any problem with that? Also, let's try to evaluate how it impacts lag functionality for large transactions? -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Fri, Apr 1, 2022 at 12:09 AM Amit Kapila wrote: > On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira wrote: > > > > On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote: > > > > This is exactly our initial analysis and we have tried a patch on > > these lines and it has a noticeable overhead. See [1]. Calling this > > for each change or each skipped change can bring noticeable overhead > > that is why we decided to call it after a certain threshold (100) of > > skipped changes. Now, surely as mentioned in my previous reply we can > > make it generic such that instead of calling this (update_progress > > function as in the patch) for skipped cases, we call it always. Will > > that make it better? > > > > That's what I have in mind but using a different approach. > > > > > The functions CreateInitDecodingContext and CreateDecodingContext > receives the > > > update_progress function as a parameter. These functions are called in 2 > > > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and > (b) > > > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses > > > WalSndUpdateProgress as a progress function. Case (b) does not have one > because > > > it is not required -- local decoding/communication. There is no custom > update > > > progress routine for each output plugin which leads me to the question: > > > couldn't we encapsulate the update progress call into the callback > > > functions? > > > > > > > Sorry, I don't get your point. What exactly do you mean by this? > > AFAIS, currently we call this output plugin API in pgoutput functions > > only, do you intend to get it invoked from a different place? > > > > It seems I didn't make myself clear. The callbacks I'm referring to the > > *_cb_wrapper functions. 
After every ctx->callbacks.foo_cb() call into a > > *_cb_wrapper() function, we have something like: > > > > if (ctx->progress & PGOUTPUT_PROGRESS_FOO) > > NewUpdateProgress(ctx, false); > > > > The NewUpdateProgress function would contain a logic similar to the > > update_progress() from the proposed patch. (A different function name here > just > > to avoid confusion.) > > > > The output plugin is responsible to set ctx->progress with the callback > > variables (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb()) > that we would > > like to run NewUpdateProgress. > > > > This sounds like a conflicting approach to what we currently do. > Currently, OutputPluginUpdateProgress() is called from the xact > related pgoutput functions like pgoutput_commit_txn(), > pgoutput_prepare_txn(), pgoutput_commit_prepared_txn(), etc. So, if we > follow what you are saying then for some of the APIs like > pgoutput_change/_message/_truncate, we need to set the parameter to > invoke NewUpdateProgress() which will internally call > OutputPluginUpdateProgress(), and for the remaining APIs, we will call > in the corresponding pgoutput_* function. I feel if we want to make it > more generic than the current patch, it is better to directly call > what you are referring to here as NewUpdateProgress() in all remaining > APIs like pgoutput_change/_truncate, etc. Thanks for your comments. According to your suggestion, improve the patch to make it more generic. Attach the new patch. Regards, Wang wei v11-0001-Fix-the-logical-replication-timeout-during-large.patch Description: v11-0001-Fix-the-logical-replication-timeout-during-large.patch
Re: Logical replication timeout problem
On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira wrote: > > On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote: > > This is exactly our initial analysis and we have tried a patch on > these lines and it has a noticeable overhead. See [1]. Calling this > for each change or each skipped change can bring noticeable overhead > that is why we decided to call it after a certain threshold (100) of > skipped changes. Now, surely as mentioned in my previous reply we can > make it generic such that instead of calling this (update_progress > function as in the patch) for skipped cases, we call it always. Will > that make it better? > > That's what I have in mind but using a different approach. > > > The functions CreateInitDecodingContext and CreateDecodingContext receives > > the > > update_progress function as a parameter. These functions are called in 2 > > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and (b) > > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses > > WalSndUpdateProgress as a progress function. Case (b) does not have one > > because > > it is not required -- local decoding/communication. There is no custom > > update > > progress routine for each output plugin which leads me to the question: > > couldn't we encapsulate the update progress call into the callback > > functions? > > > > Sorry, I don't get your point. What exactly do you mean by this? > AFAIS, currently we call this output plugin API in pgoutput functions > only, do you intend to get it invoked from a different place? > > It seems I didn't make myself clear. The callbacks I'm referring to the > *_cb_wrapper functions. After every ctx->callbacks.foo_cb() call into a > *_cb_wrapper() function, we have something like: > > if (ctx->progress & PGOUTPUT_PROGRESS_FOO) > NewUpdateProgress(ctx, false); > > The NewUpdateProgress function would contain a logic similar to the > update_progress() from the proposed patch. 
(A different function name here > just > to avoid confusion.) > > The output plugin is responsible to set ctx->progress with the callback > variables (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb()) that we > would > like to run NewUpdateProgress. > This sounds like a conflicting approach to what we currently do. Currently, OutputPluginUpdateProgress() is called from the xact related pgoutput functions like pgoutput_commit_txn(), pgoutput_prepare_txn(), pgoutput_commit_prepared_txn(), etc. So, if we follow what you are saying then for some of the APIs like pgoutput_change/_message/_truncate, we need to set the parameter to invoke NewUpdateProgress() which will internally call OutputPluginUpdateProgress(), and for the remaining APIs, we will call in the corresponding pgoutput_* function. I feel if we want to make it more generic than the current patch, it is better to directly call what you are referring to here as NewUpdateProgress() in all remaining APIs like pgoutput_change/_truncate, etc. -- With Regards, Amit Kapila.