RE: Logical replication timeout problem
On Thu, Mar 2, 2023 at 4:19 AM Gregory Stark (as CFM) wrote:
> On Wed, 8 Feb 2023 at 15:04, Andres Freund wrote:
> >
> > Attached is a current, quite rough, prototype. It addresses some of the points raised, but far from all. There's also several XXXs/FIXMEs in it. I changed the file-ending to .txt to avoid hijacking the CF entry.
>
> It looks like this patch has received quite a generous helping of feedback from Andres. I'm setting it to Waiting on Author.
>
> On the one hand it looks like there's a lot of work to do on this but on the other hand it sounds like this is a live problem in the field so if it can get done in time for release that would be great but if not then feel free to move it to the next commitfest (which means next release).

Hi,

Since this patch is an improvement to the architecture in HEAD, we started another new thread [1] on this topic to develop the related patch. It seems that we could modify the details of this CF entry to point to the new thread and change the status to 'Needs Review'.

[1] - https://www.postgresql.org/message-id/20230210210423.r26ndnfmuifie4f6%40awork3.anarazel.de

Regards,
Wang Wei
Re: Logical replication timeout problem
On Wed, 8 Feb 2023 at 15:04, Andres Freund wrote:
>
> Attached is a current, quite rough, prototype. It addresses some of the points raised, but far from all. There's also several XXXs/FIXMEs in it. I changed the file-ending to .txt to avoid hijacking the CF entry.

It looks like this patch has received quite a generous helping of feedback from Andres. I'm setting it to Waiting on Author.

On the one hand it looks like there's a lot of work to do on this but on the other hand it sounds like this is a live problem in the field so if it can get done in time for release that would be great but if not then feel free to move it to the next commitfest (which means next release).

--
Gregory Stark
As Commitfest Manager
Re: Logical replication timeout problem
Hi,

On 2023-02-08 10:30:37 -0800, Andres Freund wrote:
> On 2023-02-08 10:18:41 -0800, Andres Freund wrote:
> > I don't think the syncrep logic in WalSndUpdateProgress really works as-is - consider what happens if e.g. the origin filter filters out entire transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases we'll be lucky because we'll return quickly to XLogSendLogical(), but not reliably.
>
> Is it actually the right thing to check SyncRepRequested() in that logic? It's quite common to set up syncrep so that individual users or transactions opt into syncrep, but to leave the default disabled.
>
> I don't really see an alternative to making this depend solely on sync_standbys_defined.

Hacking on a rough prototype of how I think this should rather look, I had a few questions / remarks:

- We probably need to call UpdateProgress from a bunch of places in decode.c as well? Indicating that we're lagging by a lot, just because all transactions were in another database, seems decidedly suboptimal.

- Why should lag tracking only be updated at commit-like points? That seems like it adds odd discontinuities?

- The mix of skipped_xact and ctx->end_xact in WalSndUpdateProgress() seems somewhat odd. They have very overlapping meanings IMO.

- There are no UpdateProgress calls in pgoutput_stream_abort(), but ISTM there should be? It's legit progress.

- That's from 6912acc04f0: I find LagTrackerRead(), LagTrackerWrite() quite confusing, naming-wise. IIUC "reading" is about receiving confirmation messages, "writing" about the time the record was generated. ISTM that the current time is a quite poor approximation in XLogSendPhysical(), but pretty much meaningless in WalSndUpdateProgress()? Am I missing something?

- Aren't the wal_sender_timeout / 2 checks in WalSndUpdateProgress(), WalSndWriteData() missing wal_sender_timeout <= 0 checks?
- I don't really understand why f95d53eded added !end_xact to the if condition for ProcessPendingWrites(). Is the theory that we'll end up in an outer loop soon?

Attached is a current, quite rough, prototype. It addresses some of the points raised, but far from all. There's also several XXXs/FIXMEs in it. I changed the file-ending to .txt to avoid hijacking the CF entry.

Greetings,

Andres Freund

>From 1d89c84b1465b28ddef8c110500c3744477488df Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Wed, 8 Feb 2023 11:57:37 -0800
Subject: [PATCH v1] WIP: Initial sketch of progress update rework

---
 src/include/replication/logical.h           |   7 +-
 src/include/replication/output_plugin.h     |   1 -
 src/include/replication/reorderbuffer.h     |  12 --
 src/backend/replication/logical/logical.c   | 188 ++
 .../replication/logical/reorderbuffer.c     |  20 --
 src/backend/replication/pgoutput/pgoutput.c |  10 -
 src/backend/replication/walsender.c         |  14 +-
 7 files changed, 116 insertions(+), 136 deletions(-)

diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea05..472f2a5b84c 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -27,8 +27,8 @@ typedef LogicalOutputPluginWriterWrite LogicalOutputPluginWriterPrepareWrite;
 typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct LogicalDecodingContext *lr,
 							 XLogRecPtr Ptr,
 							 TransactionId xid,
-							 bool skipped_xact
-);
+							 bool did_write,
+							 bool finished_xact);

 typedef struct LogicalDecodingContext
@@ -105,10 +105,9 @@ typedef struct LogicalDecodingContext
 	 */
 	bool		accept_writes;
 	bool		prepared_write;
+	bool		did_write;
 	XLogRecPtr	write_location;
 	TransactionId	write_xid;
-
-	/* Are we processing the end LSN of a transaction? */
-	bool		end_xact;
 } LogicalDecodingContext;

diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 2d89d26586e..b9358e15444 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -245,6 +245,5 @@ typedef struct OutputPluginCallbacks
 /* Functions in replication/logical/logical.c */
 extern void OutputPluginPrepareWrite(struct LogicalDecodingContext *ctx, bool last_write);
 extern void OutputPluginWrite(struct LogicalDecodingContext *ctx, bool last_write
Re: Logical replication timeout problem
Hi,

On 2023-02-08 10:18:41 -0800, Andres Freund wrote:
> I don't think the syncrep logic in WalSndUpdateProgress really works as-is - consider what happens if e.g. the origin filter filters out entire transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases we'll be lucky because we'll return quickly to XLogSendLogical(), but not reliably.

Is it actually the right thing to check SyncRepRequested() in that logic? It's quite common to set up syncrep so that individual users or transactions opt into syncrep, but to leave the default disabled.

I don't really see an alternative to making this depend solely on sync_standbys_defined.

Greetings,

Andres Freund
Re: Logical replication timeout problem
Hi,

On 2023-02-08 13:36:02 +0530, Amit Kapila wrote:
> On Wed, Feb 8, 2023 at 10:57 AM Andres Freund wrote:
> >
> > On 2023-02-03 10:13:54 +0530, Amit Kapila wrote:
> > > I am planning to push this to HEAD sometime next week (by Wednesday). To backpatch this, we need to fix it in some non-standard way, like without introducing a callback which I am not sure is a good idea. If some other committers vote to get this in back branches with that or some different idea that can be backpatched then we can do that separately as well. I don't see this as a must-fix in back branches because we have a workaround (increase timeout) or users can use the streaming option (for >=14).
> >
> > I just saw the commit go in, and a quick scan over it makes me think neither this commit, nor f95d53eded, which unfortunately was already backpatched, is the right direction. The wrong direction likely started quite a bit earlier, with 024711bb544.
> >
> > It feels quite fundamentally wrong that basically every output plugin needs to call a special function in nearly every callback.
> >
> > In 024711bb544 there was just one call to OutputPluginUpdateProgress() in pgoutput.c. Quite tellingly, it just updated pgoutput, without touching test_decoding.
> >
> > Then a8fd13cab0b added two more calls. 63cf61cdeb7 yet another.
>
> I think the original commit 024711bb544 forgot to call it in test_decoding and the other commits followed the same and missed updating test_decoding.

I think that's a symptom of the wrong architecture having been chosen. This should *never* have been the task of output plugins.

> > I don't think:
> >
> > /*
> >  * Wait until there is no pending write. Also process replies from the other
> >  * side and check timeouts during that.
> >  */
> > static void
> > ProcessPendingWrites(void)
> >
> > Is really a good name. What are we processing?
>
> It is for sending the keep_alive message (if required). That is normally done when we skipped processing a transaction to ensure sync replication is not delayed.

But how is that "processing pending writes"? For me "processing" implies we're doing some analysis on them or such.

If we want to write data in WalSndUpdateProgress(), shouldn't we move the common code of WalSndWriteData() and WalSndUpdateProgress() into ProcessPendingWrites()?

> It has been discussed previously [1][2] to extend the WalSndUpdateProgress() interface. Basically, as explained by Craig [2], this has to be done from the plugin as it can do filtering or there could be other reasons why the output plugin skips all changes. We used the same interface for sending a keep-alive message when we processed a lot of (DDL) changes without sending anything to the plugin.
>
> [1] - https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de
> [2] - https://www.postgresql.org/message-id/CAMsr%2BYE3o8Dt890Q8wTooY2MpN0JvdHqUAHYL-LNhBryXOPaKg%40mail.gmail.com

I don't buy that this has to be done by the output plugin. The actual sending out of data happens via the LogicalDecodingContext callbacks, so we very well can know whether we recently did send out data or not. This really is a concern of the LogicalDecodingContext, it has pretty much nothing to do with output plugins.

We should remove all calls of OutputPluginUpdateProgress() from pgoutput, and add the necessary calls to LogicalDecodingContext->update_progress() to generic code.

Additionally, we should either rename WalSndUpdateProgress(), because it's now doing *far* more than "updating progress", or alternatively, split it into two functions.

I don't think the syncrep logic in WalSndUpdateProgress really works as-is - consider what happens if e.g. the origin filter filters out entire transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases we'll be lucky because we'll return quickly to XLogSendLogical(), but not reliably.

Greetings,

Andres Freund
Re: Logical replication timeout problem
On Wed, Feb 8, 2023 at 10:57 AM Andres Freund wrote:
>
> On 2023-02-03 10:13:54 +0530, Amit Kapila wrote:
> > I am planning to push this to HEAD sometime next week (by Wednesday). To backpatch this, we need to fix it in some non-standard way, like without introducing a callback which I am not sure is a good idea. If some other committers vote to get this in back branches with that or some different idea that can be backpatched then we can do that separately as well. I don't see this as a must-fix in back branches because we have a workaround (increase timeout) or users can use the streaming option (for >=14).
>
> I just saw the commit go in, and a quick scan over it makes me think neither this commit, nor f95d53eded, which unfortunately was already backpatched, is the right direction. The wrong direction likely started quite a bit earlier, with 024711bb544.
>
> It feels quite fundamentally wrong that basically every output plugin needs to call a special function in nearly every callback.
>
> In 024711bb544 there was just one call to OutputPluginUpdateProgress() in pgoutput.c. Quite tellingly, it just updated pgoutput, without touching test_decoding.
>
> Then a8fd13cab0b added two more calls. 63cf61cdeb7 yet another.

I think the original commit 024711bb544 forgot to call it in test_decoding and the other commits followed the same and missed updating test_decoding.

> This makes no sense. There's lots of output plugins out there. There's an increasing number of callbacks. This isn't a maintainable path forward.
>
> If we want to call something to maintain state, it has to be happening from central infrastructure.
>
> It feels quite odd architecturally that WalSndUpdateProgress() ends up flushing out writes - that's far far from obvious.
>
> I don't think:
>
> /*
>  * Wait until there is no pending write. Also process replies from the other
>  * side and check timeouts during that.
>  */
> static void
> ProcessPendingWrites(void)
>
> Is really a good name. What are we processing?

It is for sending the keep_alive message (if required). That is normally done when we skipped processing a transaction to ensure sync replication is not delayed. It has been discussed previously [1][2] to extend the WalSndUpdateProgress() interface. Basically, as explained by Craig [2], this has to be done from the plugin as it can do filtering or there could be other reasons why the output plugin skips all changes. We used the same interface for sending a keep-alive message when we processed a lot of (DDL) changes without sending anything to the plugin.

[1] - https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de
[2] - https://www.postgresql.org/message-id/CAMsr%2BYE3o8Dt890Q8wTooY2MpN0JvdHqUAHYL-LNhBryXOPaKg%40mail.gmail.com

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
Hi,

On 2023-02-03 10:13:54 +0530, Amit Kapila wrote:
> I am planning to push this to HEAD sometime next week (by Wednesday). To backpatch this, we need to fix it in some non-standard way, like without introducing a callback which I am not sure is a good idea. If some other committers vote to get this in back branches with that or some different idea that can be backpatched then we can do that separately as well. I don't see this as a must-fix in back branches because we have a workaround (increase timeout) or users can use the streaming option (for >=14).

I just saw the commit go in, and a quick scan over it makes me think neither this commit, nor f95d53eded, which unfortunately was already backpatched, is the right direction. The wrong direction likely started quite a bit earlier, with 024711bb544.

It feels quite fundamentally wrong that basically every output plugin needs to call a special function in nearly every callback.

In 024711bb544 there was just one call to OutputPluginUpdateProgress() in pgoutput.c. Quite tellingly, it just updated pgoutput, without touching test_decoding.

Then a8fd13cab0b added two more calls. 63cf61cdeb7 yet another.

This makes no sense. There's lots of output plugins out there. There's an increasing number of callbacks. This isn't a maintainable path forward.

If we want to call something to maintain state, it has to be happening from central infrastructure.

It feels quite odd architecturally that WalSndUpdateProgress() ends up flushing out writes - that's far far from obvious.

I don't think:

/*
 * Wait until there is no pending write. Also process replies from the other
 * side and check timeouts during that.
 */
static void
ProcessPendingWrites(void)

Is really a good name. What are we processing? What are we actually waiting for - because we don't actually wait for the data to be sent out or anything, just that they're in a network buffer.

Greetings,

Andres Freund
Re: Logical replication timeout problem
On Wed, Feb 1, 2023 at 10:04 AM Amit Kapila wrote:
>
> On Tue, Jan 31, 2023 at 8:24 PM Ashutosh Bapat wrote:
> >
> > On Tue, Jan 31, 2023 at 5:12 PM Amit Kapila wrote:
> > >
> > > On Tue, Jan 31, 2023 at 5:03 PM Ashutosh Bapat wrote:
> > > >
> > > > On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila wrote:
> > > > >
> > > > > Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?
> > > >
> > > > The callback and the implementation is all in core. What's the risk you see in backpatching it?
> > >
> > > Because we are changing the exposed structure and which can break existing extensions using it.
> >
> > Is that because we are adding the new member in the middle of the structure?
>
> Not only that but this changes the size of the structure and we want to avoid that as well in stable branches. See email [1] (you can't change the struct size either ...). As per my understanding, our usual practice is to not change the exposed structure's size/definition in stable branches.

I am planning to push this to HEAD sometime next week (by Wednesday). To backpatch this, we need to fix it in some non-standard way, like without introducing a callback which I am not sure is a good idea. If some other committers vote to get this in back branches with that or some different idea that can be backpatched then we can do that separately as well. I don't see this as a must-fix in back branches because we have a workaround (increase timeout) or users can use the streaming option (for >=14).

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
On Tue, Jan 31, 2023 at 8:24 PM Ashutosh Bapat wrote:
>
> On Tue, Jan 31, 2023 at 5:12 PM Amit Kapila wrote:
> >
> > On Tue, Jan 31, 2023 at 5:03 PM Ashutosh Bapat wrote:
> > >
> > > On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila wrote:
> > > >
> > > > Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?
> > >
> > > The callback and the implementation is all in core. What's the risk you see in backpatching it?
> >
> > Because we are changing the exposed structure and which can break existing extensions using it.
>
> Is that because we are adding the new member in the middle of the structure?

Not only that but this changes the size of the structure and we want to avoid that as well in stable branches. See email [1] (you can't change the struct size either ...). As per my understanding, our usual practice is to not change the exposed structure's size/definition in stable branches.

[1] - https://www.postgresql.org/message-id/2358496.1649168259%40sss.pgh.pa.us

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
On Wed, Feb 1, 2023 at 4:43 AM Peter Smith wrote:
>
> Here are my review comments for v13-1.
>
> ==
> Commit message
>
> 1.
> The DDLs like Refresh Materialized views that generate lots of temporary data due to rewrite rules may not be processed by output plugins (for example pgoutput). So, we won't send keep-alive messages for a long time while processing such commands and that can lead the subscriber side to timeout.
>
> ~
>
> SUGGESTION (minor rearranged way to say the same thing)
>
> For DDLs that generate lots of temporary data due to rewrite rules (e.g. REFRESH MATERIALIZED VIEW) the output plugins (e.g. pgoutput) may not be processed for a long time. Since we don't send keep-alive messages while processing such commands that can lead the subscriber side to timeout.

Hmm, this makes it less clear and in fact changed the meaning.

> ~~~
>
> 2.
> The commit message says what the problem is, but it doesn't seem to describe what this patch does to fix the problem.

I thought it was apparent and the code comments made it clear.

> 4.
> +#define CHANGES_THRESHOLD 100
> +
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + rb->update_progress_txn(rb, txn, change->lsn);
> + changes_count = 0;
> + }
>
> I was wondering if it would have been simpler to write this code like below.
>
> Also, by doing it this way the 'changes_count' variable name makes more sense IMO, otherwise (for current code) maybe it should be called something like 'changes_since_last_keepalive'
>
> SUGGESTION
> if (++changes_count % CHANGES_THRESHOLD == 0)
>     rb->update_progress_txn(rb, txn, change->lsn);

I find the current code in the patch clear and easy to understand.

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
Here are my review comments for v13-1.

==
Commit message

1.
The DDLs like Refresh Materialized views that generate lots of temporary data due to rewrite rules may not be processed by output plugins (for example pgoutput). So, we won't send keep-alive messages for a long time while processing such commands and that can lead the subscriber side to timeout.

~

SUGGESTION (minor rearranged way to say the same thing)

For DDLs that generate lots of temporary data due to rewrite rules (e.g. REFRESH MATERIALIZED VIEW) the output plugins (e.g. pgoutput) may not be processed for a long time. Since we don't send keep-alive messages while processing such commands that can lead the subscriber side to timeout.

~~~

2.
The commit message says what the problem is, but it doesn't seem to describe what this patch does to fix the problem.

==
src/backend/replication/logical/reorderbuffer.c

3.
+ /*
+ * It is possible that the data is not sent to downstream for a
+ * long time either because the output plugin filtered it or there
+ * is a DDL that generates a lot of data that is not processed by
+ * the plugin. So, in such cases, the downstream can timeout. To
+ * avoid that we try to send a keepalive message if required.
+ * Trying to send a keepalive message after every change has some
+ * overhead, but testing showed there is no noticeable overhead if
+ * we do it after every ~100 changes.
+ */

3a.
"data is not sent to downstream" --> "data is not sent downstream" (?)

~

3b.
"So, in such cases," --> "In such cases,"

~~~

4.
+#define CHANGES_THRESHOLD 100
+
+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ rb->update_progress_txn(rb, txn, change->lsn);
+ changes_count = 0;
+ }

I was wondering if it would have been simpler to write this code like below.

Also, by doing it this way the 'changes_count' variable name makes more sense IMO, otherwise (for current code) maybe it should be called something like 'changes_since_last_keepalive'

SUGGESTION
if (++changes_count % CHANGES_THRESHOLD == 0)
    rb->update_progress_txn(rb, txn, change->lsn);

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Logical replication timeout problem
On Tue, Jan 31, 2023 at 5:12 PM Amit Kapila wrote:
>
> On Tue, Jan 31, 2023 at 5:03 PM Ashutosh Bapat wrote:
> >
> > On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila wrote:
> >
> > > Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?
> >
> > The callback and the implementation is all in core. What's the risk you see in backpatching it?
>
> Because we are changing the exposed structure and which can break existing extensions using it.

Is that because we are adding the new member in the middle of the structure? Shouldn't extensions provide new libraries with each maintenance release of PG?

> > Customers can adjust the timeouts, but only after the receiver has timed out a few times. Replication remains broken till they notice it and adjust timeouts. By that time WAL has piled up. It also takes a few attempts to increase timeouts since the time taken by a transaction to decode cannot be estimated beforehand. All that makes it worth back-patching if it's possible. We had a customer who piled up GBs of WAL before realising that this is the problem. Their system almost came to a halt due to that.
>
> Which version are they using? If they are at >=14, using "streaming = on" for a subscription should also avoid this problem.

13.

--
Best Wishes,
Ashutosh Bapat
Re: Logical replication timeout problem
On Tue, Jan 31, 2023 at 5:03 PM Ashutosh Bapat wrote:
>
> On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila wrote:
>
> > Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?
>
> The callback and the implementation is all in core. What's the risk you see in backpatching it?

Because we are changing the exposed structure, which can break existing extensions using it.

> Customers can adjust the timeouts, but only after the receiver has timed out a few times. Replication remains broken till they notice it and adjust timeouts. By that time WAL has piled up. It also takes a few attempts to increase timeouts since the time taken by a transaction to decode cannot be estimated beforehand. All that makes it worth back-patching if it's possible. We had a customer who piled up GBs of WAL before realising that this is the problem. Their system almost came to a halt due to that.

Which version are they using? If they are at >=14, using "streaming = on" for a subscription should also avoid this problem.

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila wrote:
> Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?

The callback and the implementation is all in core. What's the risk you see in backpatching it?

Customers can adjust the timeouts, but only after the receiver has timed out a few times. Replication remains broken till they notice it and adjust timeouts. By that time WAL has piled up. It also takes a few attempts to increase timeouts since the time taken by a transaction to decode cannot be estimated beforehand. All that makes it worth back-patching if it's possible. We had a customer who piled up GBs of WAL before realising that this is the problem. Their system almost came to a halt due to that.

--
Best Wishes,
Ashutosh Bapat
Re: Logical replication timeout problem
On Tue, Jan 31, 2023 at 2:53 PM wangw.f...@fujitsu.com wrote:
>
> On Mon, Jan 30, 2023 at 17:50 PM I wrote:
> > Attach the new patch.
>
> When invoking the function ReorderBufferProcessTXN, the threshold-related counter "changes_count" may have some random value from the previous transaction's processing. To fix this, I moved the definition of the counter "changes_count" outside the while-loop and did not use the keyword "static".
>
> Attach the new patch.

Thanks, the patch looks good to me. I have slightly adjusted one of the comments and ran pgindent. See attached. As mentioned in the commit message, we shouldn't backpatch this as this requires a new callback and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. What do you think?

--
With Regards,
Amit Kapila.

v13-0001-Fix-the-logical-replication-timeout-during-large.patch
Description: Binary data
RE: Logical replication timeout problem
On Mon, Jan 30, 2023 at 17:50 PM I wrote:
> Attach the new patch.

When invoking the function ReorderBufferProcessTXN, the threshold-related counter "changes_count" may have some random value from the previous transaction's processing. To fix this, I moved the definition of the counter "changes_count" outside the while-loop and did not use the keyword "static".

Attach the new patch.

Regards,
Wang Wei

v11-0001-Fix-the-logical-replication-timeout-during-proce.patch
Description: v11-0001-Fix-the-logical-replication-timeout-during-proce.patch
RE: Logical replication timeout problem
On Mon, Jan 30, 2023 at 14:55 PM Amit Kapila wrote:
> On Mon, Jan 30, 2023 at 10:36 AM wangw.f...@fujitsu.com wrote:
> >
> > On Mon, Jan 30, 2023 11:37 AM Shi, Yu/侍 雨 wrote:
> > > On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com wrote:
> >
> > Yes, I think you are right.
> > Fixed this problem.
>
> + /*
> + * Trying to send keepalive message after every change has some
> + * overhead, but testing showed there is no noticeable overhead if
> + * we do it after every ~100 changes.
> + */
> +#define CHANGES_THRESHOLD 100
> +
> + if (++changes_count < CHANGES_THRESHOLD)
> + return;
> ...
> + changes_count = 0;
>
> I think it is better to have this threshold-related code in that caller as we have in the previous version. Also, let's modify the comment as follows:
> "It is possible that the data is not sent to downstream for a long time either because the output plugin filtered it or there is a DDL that generates a lot of data that is not processed by the plugin. So, in such cases, the downstream can timeout. To avoid that we try to send a keepalive message if required. Trying to send a keepalive message after every change has some overhead, but testing showed there is no noticeable overhead if we do it after every ~100 changes."

Changed as suggested. I also removed the comment atop the function update_progress_txn_cb_wrapper to be consistent with the nearby *_cb_wrapper functions.

Attach the new patch.

Regards,
Wang Wei

v10-0001-Fix-the-logical-replication-timeout-during-proce.patch
Description: v10-0001-Fix-the-logical-replication-timeout-during-proce.patch
Re: Logical replication timeout problem
On Mon, Jan 30, 2023 at 10:36 AM wangw.f...@fujitsu.com wrote:
>
> On Mon, Jan 30, 2023 11:37 AM Shi, Yu/侍 雨 wrote:
> > On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com wrote:
>
> Yes, I think you are right.
> Fixed this problem.

+ /*
+ * Trying to send keepalive message after every change has some
+ * overhead, but testing showed there is no noticeable overhead if
+ * we do it after every ~100 changes.
+ */
+#define CHANGES_THRESHOLD 100
+
+ if (++changes_count < CHANGES_THRESHOLD)
+ return;
...
+ changes_count = 0;

I think it is better to have this threshold-related code in that caller as we have in the previous version. Also, let's modify the comment as follows:
"It is possible that the data is not sent to downstream for a long time either because the output plugin filtered it or there is a DDL that generates a lot of data that is not processed by the plugin. So, in such cases, the downstream can timeout. To avoid that we try to send a keepalive message if required. Trying to send a keepalive message after every change has some overhead, but testing showed there is no noticeable overhead if we do it after every ~100 changes."

--
With Regards,
Amit Kapila.
RE: Logical replication timeout problem
On Mon, Jan 30, 2023 11:37 AM Shi, Yu/侍 雨 wrote:
> On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com wrote:
> >
> > I tested a mixed transaction of transactional and non-transactional messages on the current HEAD and reproduced the timeout problem. I think this result is OK. Because when decoding a transaction, non-transactional changes are processed directly and the function WalSndKeepaliveIfNecessary is called, while transactional changes are cached and processed after decoding. After decoding, only transactional changes will be processed (in the function ReorderBufferProcessTXN), so the timeout problem will still be reproduced.
> >
> > After applying the v8 patch, the test mentioned above didn't reproduce the timeout problem (Attach this test script 'test_with_nontransactional.sh').
> >
> > Attach the new patch.
>
> Thanks for updating the patch. Here is a comment.

Thanks for your comment.

> In update_progress_txn_cb_wrapper(), it looks like we need to reset changes_count to 0 after calling OutputPluginUpdateProgress(), otherwise OutputPluginUpdateProgress() will always be called after 100 changes.

Yes, I think you are right.
Fixed this problem.

Attach the new patch.

Regards,
Wang Wei

v9-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description: v9-0001-Fix-the-logical-replication-timeout-during-proces.patch
RE: Logical replication timeout problem
On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com wrote: > > I tested a mix transaction of transactional and non-transactional messages on > the current HEAD and reproduced the timeout problem. I think this result is > OK. > Because when decoding a transaction, non-transactional changes are processed > directly and the function WalSndKeepaliveIfNecessary is called, while > transactional changes are cached and processed after decoding. After decoding, > only transactional changes will be processed (in the function > ReorderBufferProcessTXN), so the timeout problem will still be reproduced. > > After applying the v8 patch, the test mentioned above didn't reproduce the > timeout problem (Attach this test script 'test_with_nontransactional.sh'). > > Attach the new patch. > Thanks for updating the patch. Here is a comment. In update_progress_txn_cb_wrapper(), it looks like we need to reset changes_count to 0 after calling OutputPluginUpdateProgress(), otherwise OutputPluginUpdateProgress() will always be called after 100 changes. Regards, Shi yu
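The point about resetting changes_count can be demonstrated with a small stand-alone model (purely illustrative; the function names are hypothetical, not PostgreSQL internals): without the reset, once the counter reaches the threshold, the progress call fires on every subsequent change instead of on every 100th.

```c
#include <assert.h>

#define CHANGES_THRESHOLD 100

static int progress_calls = 0;	/* stand-in for OutputPluginUpdateProgress() */
static int changes_count = 0;

/* Buggy variant: the counter is never reset, so once it reaches the
 * threshold every later change triggers a progress update. */
static void
change_buggy(void)
{
	if (++changes_count >= CHANGES_THRESHOLD)
		progress_calls++;
}

/* Fixed variant: resetting the counter restores the once-per-100-changes
 * behaviour. */
static void
change_fixed(void)
{
	if (++changes_count >= CHANGES_THRESHOLD)
	{
		progress_calls++;
		changes_count = 0;
	}
}

/* Process n changes through the given per-change callback. */
int
run(void (*cb)(void), int n)
{
	progress_calls = 0;
	changes_count = 0;
	for (int i = 0; i < n; i++)
		cb();
	return progress_calls;
}
```

For 300 changes the buggy variant fires 201 times (every change from the 100th on), while the fixed variant fires 3 times.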
RE: Logical replication timeout problem
On Fri, Jan 27, 2023 at 19:55 PM Amit Kapila wrote: > On Fri, Jan 27, 2023 at 5:18 PM houzj.f...@fujitsu.com > wrote: > > > > On Wednesday, January 25, 2023 7:26 PM Amit Kapila > > > > > > > On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com > > > wrote: > > > > > > > > Attach the new patch. > > > > > > > > > > I think the patch missed to handle the case of non-transactional messages > which > > > was previously getting handled. I have tried to address that in the > > > attached. > Is > > > there a reason that shouldn't be handled? > > > > Thanks for updating the patch! > > > > I thought about the non-transactional message. I think it seems fine if we > > don’t handle it for timeout because such message is decoded via: > > > > WalSndLoop > > -XLogSendLogical > > --LogicalDecodingProcessRecord > > ---logicalmsg_decode > > ReorderBufferQueueMessage > > -rb->message() -- //maybe send the message or do nothing here. > > > > After invoking rb->message(), we will directly return to the main > > loop(WalSndLoop) where we will get a chance to call > > WalSndKeepaliveIfNecessary() to avoid the timeout. > > > > Valid point. But this means the previous handling of non-transactional > messages was also redundant. Thanks for the analysis, I think it makes sense. So I removed the handling of non-transactional messages. > > This is a bit different from transactional changes, because for > > transactional > changes, we > > will buffer them and then send every buffered change one by one(via > > ReorderBufferProcessTXN) without going back to the WalSndLoop, so we > don't get > > a chance to send keepalive message if necessary, which is more likely to > > cause > the > > timeout problem. > > > > I will also test the non-transactional message for timeout in case I missed > something. > > > > Okay, thanks. Please see if we can test a mix of transactional and > non-transactional messages as well. 
I tested a mixed transaction of transactional and non-transactional messages on the current HEAD and reproduced the timeout problem. I think this result is OK, because when decoding a transaction, non-transactional changes are processed directly and the function WalSndKeepaliveIfNecessary is called, while transactional changes are cached and processed after decoding. After decoding, only transactional changes will be processed (in the function ReorderBufferProcessTXN), so the timeout problem will still be reproduced.

After applying the v8 patch, the test mentioned above didn't reproduce the timeout problem (the test script 'test_with_nontransactional.sh' is attached).

Attached is the new patch.

Regards,
Wang Wei

v8-0001-Fix-the-logical-replication-timeout-during-proces.patch
test_with_nontransactional.sh
Re: Logical replication timeout problem
On Fri, Jan 27, 2023 at 5:18 PM houzj.f...@fujitsu.com wrote: > > On Wednesday, January 25, 2023 7:26 PM Amit Kapila > > > > On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com > > wrote: > > > > > > Attach the new patch. > > > > > > > I think the patch missed to handle the case of non-transactional messages > > which > > was previously getting handled. I have tried to address that in the > > attached. Is > > there a reason that shouldn't be handled? > > Thanks for updating the patch! > > I thought about the non-transactional message. I think it seems fine if we > don’t handle it for timeout because such message is decoded via: > > WalSndLoop > -XLogSendLogical > --LogicalDecodingProcessRecord > ---logicalmsg_decode > ReorderBufferQueueMessage > -rb->message() -- //maybe send the message or do nothing here. > > After invoking rb->message(), we will directly return to the main > loop(WalSndLoop) where we will get a chance to call > WalSndKeepaliveIfNecessary() to avoid the timeout. > Valid point. But this means the previous handling of non-transactional messages was also redundant. > This is a bit different from transactional changes, because for transactional > changes, we > will buffer them and then send every buffered change one by one(via > ReorderBufferProcessTXN) without going back to the WalSndLoop, so we don't get > a chance to send keepalive message if necessary, which is more likely to > cause the > timeout problem. > > I will also test the non-transactional message for timeout in case I missed > something. > Okay, thanks. Please see if we can test a mix of transactional and non-transactional messages as well. -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Wednesday, January 25, 2023 7:26 PM Amit Kapila wrote:
> On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com wrote:
> >
> > Attach the new patch.
> >
>
> I think the patch missed handling the case of non-transactional messages
> which was previously getting handled. I have tried to address that in the
> attached. Is there a reason that shouldn't be handled?

Thanks for updating the patch!

I thought about the non-transactional message. I think it seems fine if we don't handle it for the timeout, because such a message is decoded via:

WalSndLoop
-XLogSendLogical
--LogicalDecodingProcessRecord
---logicalmsg_decode
----ReorderBufferQueueMessage
-----rb->message() -- //maybe send the message or do nothing here.

After invoking rb->message(), we will directly return to the main loop (WalSndLoop), where we will get a chance to call WalSndKeepaliveIfNecessary() to avoid the timeout.

This is a bit different from transactional changes, because for transactional changes, we will buffer them and then send every buffered change one by one (via ReorderBufferProcessTXN) without going back to WalSndLoop, so we don't get a chance to send a keepalive message if necessary, which is more likely to cause the timeout problem.

I will also test the non-transactional message for timeout in case I missed something.

Best Regards,
Hou zj
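The difference between the two code paths described above can be modeled with a toy timing simulation (purely illustrative; TIMEOUT, the one-tick cost per change, and all function names here are assumptions, not PostgreSQL internals): when control returns to the main loop between messages there is a keepalive opportunity per message, whereas processing a large buffered transaction inside one callback offers none until the end.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timing model: each change costs one "tick", and the standby
 * times out if more than TIMEOUT ticks pass without a keepalive. */
#define TIMEOUT 10

static int ticks_since_keepalive;
static bool timed_out;

static void
process_one_change(void)
{
	if (++ticks_since_keepalive > TIMEOUT)
		timed_out = true;
}

/* Models the chance WalSndLoop gets to call WalSndKeepaliveIfNecessary(). */
static void
keepalive_opportunity(void)
{
	ticks_since_keepalive = 0;
}

/* Non-transactional path: control returns to the main loop after every
 * message, so there is a keepalive opportunity per message. */
bool
run_nontransactional(int nchanges)
{
	ticks_since_keepalive = 0;
	timed_out = false;
	for (int i = 0; i < nchanges; i++)
	{
		process_one_change();
		keepalive_opportunity();
	}
	return timed_out;
}

/* Transactional path: all buffered changes are processed in one callback
 * (like ReorderBufferProcessTXN) with no opportunity until the end. */
bool
run_transactional(int nchanges)
{
	ticks_since_keepalive = 0;
	timed_out = false;
	for (int i = 0; i < nchanges; i++)
		process_one_change();
	keepalive_opportunity();
	return timed_out;
}
```

Under this model, 50 changes never time out on the non-transactional path but do on the buffered transactional path, which is why the patch throttles keepalives inside the transaction-processing loop rather than relying on returning to WalSndLoop.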
Re: Logical replication timeout problem
On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com wrote:
>
> Attach the new patch.
>

I think the patch missed handling the case of non-transactional messages, which was previously getting handled. I have tried to address that in the attached. Is there a reason that shouldn't be handled? Apart from that, I changed a few comments.

If my understanding is correct, then we need to change the callback name update_progress_txn as well, because now it needs to handle both transactional and non-transactional changes. How about update_progress_write? We accordingly need to change the comments for the callback.

Additionally, I think we should have a test case to show we don't time out because of not processing non-transactional messages. See pgoutput_message for cases where it doesn't process the message.

--
With Regards,
Amit Kapila.

v7-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
On Tue, Jan 24, 2023 at 1:45 PM wangw.f...@fujitsu.com wrote: > > On Tues, Jan 24, 2023 at 8:28 AM Peter Smith wrote: > > Hi Hou-san, Here are my review comments for v5-0001. > > Thanks for your comments. ... > > Changed as suggested. > > Attach the new patch. Thanks! Patch v6 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
RE: Logical replication timeout problem
On Tues, Jan 24, 2023 at 8:28 AM Peter Smith wrote: > Hi Hou-san, Here are my review comments for v5-0001. Thanks for your comments. > == > src/backend/replication/logical/reorderbuffer.c > > 1. > @@ -2446,6 +2452,23 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, > ReorderBufferTXN *txn, > elog(ERROR, "tuplecid value in changequeue"); > break; > } > + > + /* > + * Sending keepalive messages after every change has some overhead, but > + * testing showed there is no noticeable overhead if keepalive is only > + * sent after every ~100 changes. > + */ > +#define CHANGES_THRESHOLD 100 > + > + /* > + * Try to send a keepalive message after every CHANGES_THRESHOLD > + * changes. > + */ > + if (++changes_count >= CHANGES_THRESHOLD) > + { > + rb->update_progress_txn(rb, txn, change); > + changes_count = 0; > + } > > I noticed you put the #define adjacent to the only usage of it, > instead of with the other variable declaration like it was before. > Probably it is better how you have done it, but: > > 1a. > The comment indentation is incorrect. > > ~ > > 1b. > Since the #define is adjacent to its only usage IMO now the 2nd > comment is redundant. So the code can just say > >/* > * Sending keepalive messages after every change has some > overhead, but > * testing showed there is no noticeable overhead if > keepalive is only > * sent after every ~100 changes. > */ > #define CHANGES_THRESHOLD 100 > if (++changes_count >= CHANGES_THRESHOLD) > { > rb->update_progress_txn(rb, txn, change); > changes_count = 0; > } Changed as suggested. Attach the new patch. Regards, Wang Wei v6-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: v6-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
Hi Hou-san, Here are my review comments for v5-0001. == src/backend/replication/logical/reorderbuffer.c 1. @@ -2446,6 +2452,23 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, elog(ERROR, "tuplecid value in changequeue"); break; } + + /* + * Sending keepalive messages after every change has some overhead, but + * testing showed there is no noticeable overhead if keepalive is only + * sent after every ~100 changes. + */ +#define CHANGES_THRESHOLD 100 + + /* + * Try to send a keepalive message after every CHANGES_THRESHOLD + * changes. + */ + if (++changes_count >= CHANGES_THRESHOLD) + { + rb->update_progress_txn(rb, txn, change); + changes_count = 0; + } I noticed you put the #define adjacent to the only usage of it, instead of with the other variable declaration like it was before. Probably it is better how you have done it, but: 1a. The comment indentation is incorrect. ~ 1b. Since the #define is adjacent to its only usage IMO now the 2nd comment is redundant. So the code can just say /* * Sending keepalive messages after every change has some overhead, but * testing showed there is no noticeable overhead if keepalive is only * sent after every ~100 changes. */ #define CHANGES_THRESHOLD 100 if (++changes_count >= CHANGES_THRESHOLD) { rb->update_progress_txn(rb, txn, change); changes_count = 0; } -- Kind Regards, Peter Smith. Fujitsu Australia
RE: Logical replication timeout problem
On Monday, January 23, 2023 8:51 AM Peter Smith wrote: > > Here are my review comments for patch v4-0001 > == > Commit message > > 2. > > The problem is when there is a DDL in a transaction that generates lots of > temporary data due to rewrite rules, these temporary data will not be > processed > by the pgoutput plugin. The previous commit (f95d53e) only fixed timeouts > caused by filtering out changes in pgoutput. Therefore, the previous fix for > DML > had no impact on this case. > > ~ > > IMO this still some rewording to say up-front what the the actual problem -- > i.e. > an avoidable timeout occuring. > > SUGGESTION (or something like this...) > > When there is a DDL in a transaction that generates lots of temporary data due > to rewrite rules, this temporary data will not be processed by the pgoutput > plugin. This means it is possible for a timeout to occur if a sufficiently > long time > elapses since the last pgoutput message. A previous commit (f95d53e) fixed a > similar scenario in this area, but that only fixed timeouts for DML going > through > pgoutput, so it did not address this DDL timeout case. Thanks, I changed the commit message as suggested. > == > src/backend/replication/logical/logical.c > > 3. update_progress_txn_cb_wrapper > > +/* > + * Update progress callback while processing a transaction. > + * > + * Try to update progress and send a keepalive message during sending > +data of a > + * transaction (and its subtransactions) to the output plugin. > + * > + * For a large transaction, if we don't send any change to the > +downstream for a > + * long time (exceeds the wal_receiver_timeout of standby) then it can > timeout. > + * This can happen when all or most of the changes are either not > +published or > + * got filtered out. > + */ > +static void > +update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN > *txn, > +ReorderBufferChange *change) > > Simplify the "Try to..." paragraph. 
> And the other part should also mention DDL.
>
> SUGGESTION
>
> Try to send a keepalive message during transaction processing.
>
> This is done because if we don't send any change to the downstream for a long
> time (exceeds the wal_receiver_timeout of standby), then it can timeout. This
> can happen for large DDL, or for large transactions when all or most of the
> changes are either not published or got filtered out.

Changed.

> ==
> .../replication/logical/reorderbuffer.c
>
> 4. ReorderBufferProcessTXN
>
> @@ -2105,6 +2105,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn,
>
>  PG_TRY();
>  {
> + /*
> +  * Static variable used to accumulate the number of changes while
> +  * processing txn.
> +  */
> + static int changes_count = 0;
> +
> + /*
> +  * Sending keepalive messages after every change has some overhead, but
> +  * testing showed there is no noticeable overhead if keepalive is only
> +  * sent after every ~100 changes.
> +  */
> +#define CHANGES_THRESHOLD 100
> +
>
> IMO these can be relocated to be declared/defined inside the "while"
> loop -- i.e. closer to where they are being used.

Moved into the while loop.

Attached is the new version patch, which addresses the above comments. Also attached is a simple script that uses "refresh matview" to reproduce this timeout problem, in case someone wants to try to reproduce it.

Best regards,
Hou zj

test.sh
v5-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
On Mon, Jan 23, 2023 at 6:21 AM Peter Smith wrote:
>
> 1.
>
> It makes no real difference, but I was wondering about:
> "update txn progress" versus "update progress txn"
>

Yeah, I think we can go either way, but I still prefer "update progress txn" as that is closer to the LogicalOutputPluginWriterUpdateProgress callback name.

>
> 5.
>
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + rb->update_progress_txn(rb, txn, change);
> + changes_count = 0;
> + }
>
> When there is no update_progress function this code is still incurring
> some small additional overhead for incrementing and testing the
> THRESHOLD every time, and also needlessly calling the wrapper every
> 100x. This overhead could be avoided with a simpler up-front check
> like shown below. OTOH, maybe the overhead is insignificant enough
> that just leaving the current code is neater?
>

As far as built-in logical replication is concerned, it will be defined, and I don't know if the overhead will be significant enough in this case. Also, one can say that for the cases where it is defined, we are adding this check multiple times (it is already checked inside OutputPluginUpdateProgress). So, I would prefer neat code here.

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
Here are my review comments for patch v4-0001

==
General

1.

It makes no real difference, but I was wondering about:
"update txn progress" versus "update progress txn"

I thought that the first way sounds more natural. YMMV.

If you change this then there is impact for the typedef, function names, comments, member names:

ReorderBufferUpdateTxnProgressCB --> ReorderBufferUpdateProgressTxnCB
"/* update progress txn callback */" --> "/* update txn progress callback */"
update_progress_txn_cb_wrapper --> update_txn_progress_cb_wrapper
updated_progress_txn --> update_txn_progress

==
Commit message

2.

The problem is when there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, these temporary data will not be processed by the pgoutput plugin. The previous commit (f95d53e) only fixed timeouts caused by filtering out changes in pgoutput. Therefore, the previous fix for DML had no impact on this case.

~

IMO this still needs some rewording to say up-front what the actual problem is -- i.e. an avoidable timeout occurring.

SUGGESTION (or something like this...)

When there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, this temporary data will not be processed by the pgoutput plugin. This means it is possible for a timeout to occur if a sufficiently long time elapses since the last pgoutput message. A previous commit (f95d53e) fixed a similar scenario in this area, but that only fixed timeouts for DML going through pgoutput, so it did not address this DDL timeout case.

==
src/backend/replication/logical/logical.c

3. update_progress_txn_cb_wrapper

+/*
+ * Update progress callback while processing a transaction.
+ *
+ * Try to update progress and send a keepalive message during sending data of a
+ * transaction (and its subtransactions) to the output plugin.
+ *
+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time (exceeds the wal_receiver_timeout of standby) then it can timeout.
+ * This can happen when all or most of the changes are either not published or
+ * got filtered out.
+ */
+static void
+update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ReorderBufferChange *change)

Simplify the "Try to..." paragraph. And the other part should also mention DDL.

SUGGESTION

Try to send a keepalive message during transaction processing.

This is done because if we don't send any change to the downstream for a long time (exceeds the wal_receiver_timeout of standby), then it can timeout. This can happen for large DDL, or for large transactions when all or most of the changes are either not published or got filtered out.

==
.../replication/logical/reorderbuffer.c

4. ReorderBufferProcessTXN

@@ -2105,6 +2105,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,

 PG_TRY();
 {
+ /*
+  * Static variable used to accumulate the number of changes while
+  * processing txn.
+  */
+ static int changes_count = 0;
+
+ /*
+  * Sending keepalive messages after every change has some overhead, but
+  * testing showed there is no noticeable overhead if keepalive is only
+  * sent after every ~100 changes.
+  */
+#define CHANGES_THRESHOLD 100
+

IMO these can be relocated to be declared/defined inside the "while" loop -- i.e. closer to where they are being used.

~~~

5.

+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ rb->update_progress_txn(rb, txn, change);
+ changes_count = 0;
+ }

When there is no update_progress function this code is still incurring some small additional overhead for incrementing and testing the THRESHOLD every time, and also needlessly calling the wrapper every 100x. This overhead could be avoided with a simpler up-front check like shown below. OTOH, maybe the overhead is insignificant enough that just leaving the current code is neater?
LogicalDecodingContext *ctx = rb->private_data;
...
if (ctx->update_progress_txn && (++changes_count >= CHANGES_THRESHOLD))
{
	rb->update_progress_txn(rb, txn, change);
	changes_count = 0;
}

--
Kind Regards,
Peter Smith.
Fujitsu Australia
RE: Logical replication timeout problem
On Fri, Jan 20, 2023 at 10:10 AM Peter Smith wrote: > Here are some review comments for patch v3-0001. Thanks for your comments. > == > Commit message > > 1. > The problem is when there is a DDL in a transaction that generates lots of > temporary data due to rewrite rules, these temporary data will not be > processed > by the pgoutput - plugin. Therefore, the previous fix (f95d53e) for DML had no > impact on this case. > > ~ > > 1a. > IMO this comment needs to give a bit of background about the original > problem here, rather than just starting with "The problem is" which is > describing the flaws of the previous fix. Added some related message. > ~ > > 1b. > "pgoutput - plugin" -> "pgoutput plugin" ?? Changed. > ~~~ > > 2. > > To fix this, we introduced a new ReorderBuffer callback - > 'ReorderBufferUpdateProgressCB'. This callback is called to try to update the > process after each change has been processed during sending data of a > transaction (and its subtransactions) to the output plugin. > > IIUC it's not really "after each change" - shouldn't this comment > mention something about the CHANGES_THRESHOLD 100? Changed. > ~~~ > > 4. update_progress_cb_wrapper > > +/* > + * Update progress callback > + * > + * Try to update progress and send a keepalive message if too many changes > were > + * processed when processing txn. > + * > + * For a large transaction, if we don't send any change to the downstream > for a > + * long time (exceeds the wal_receiver_timeout of standby) then it can > timeout. > + * This can happen when all or most of the changes are either not published > or > + * got filtered out. > + */ > > SUGGESTION (instead of the "Try to update" sentence) > Send a keepalive message whenever more than > changes are encountered while processing a transaction. Since it's possible that keep-alive messages won't be sent even if the threshold is reached (see function WalSndKeepaliveIfNecessary), I thought it might be better to use "try to". 
And rewrote the comments here because the threshold logic is moved to the function ReorderBufferProcessTXN. > ~~~ > > 5. > > +static void > +update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, > +ReorderBufferChange *change) > +{ > + LogicalDecodingContext *ctx = cache->private_data; > + LogicalErrorCallbackState state; > + ErrorContextCallback errcallback; > + static int changes_count = 0; /* Static variable used to accumulate > + * the number of changes while > + * processing txn. */ > + > > IMO this may be more readable if the static 'changes_count' local var > was declared first and separated from the other vars by a blank line. Changed. > ~~~ > > 6. > > + /* > + * We don't want to try sending a keepalive message after processing each > + * change as that can have overhead. Tests revealed that there is no > + * noticeable overhead in doing it after continuously processing 100 or so > + * changes. > + */ > +#define CHANGES_THRESHOLD 100 > > 6a. > I think it might be better to define this right at the top of the > function adjacent to the 'changes_count' variable (e.g. a bit like the > original HEAD code looked) Changed. > ~ > > 6b. > SUGGESTION (for the comment) > Sending keepalive messages after every change has some overhead, but > testing showed there is no noticeable overhead if keepalive is only > sent after every ~100 changes. Changed. > ~~~ > > 7. > > + > + /* > + * After continuously processing CHANGES_THRESHOLD changes, we > + * try to send a keepalive message if required. > + */ > + if (++changes_count >= CHANGES_THRESHOLD) > + { > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); > + changes_count = 0; > + } > + > > 7a. > SUGGESTION (for comment) > Send a keepalive message after every CHANGES_THRESHOLD changes. Changed. Regards, Wang Wei
RE: Logical replication timeout problem
On Fri, Jan 20, 2023 at 12:35 PM Amit Kapila wrote: > On Fri, Jan 20, 2023 at 7:40 AM Peter Smith wrote: > > > > Here are some review comments for patch v3-0001. > > > > == > > src/backend/replication/logical/logical.c > > > > 3. forward declaration > > > > +/* update progress callback */ > > +static void update_progress_cb_wrapper(ReorderBuffer *cache, > > +ReorderBufferTXN *txn, > > +ReorderBufferChange *change); > > > > I felt this function wrapper name was a bit misleading... AFAIK every > > other wrapper really does just wrap their respective functions. But > > this one seems a bit different because it calls the wrapped function > > ONLY if some threshold is exceeded. IMO maybe this function could have > > some name that conveys this better: > > > > e.g. update_progress_cb_wrapper_with_threshold > > > > I am wondering whether it would be better to move the threshold logic > to the caller. Previously this logic was inside the function because > it was being invoked from multiple places but now that won't be the > case. Also, then your concern about the name would also be addressed. Agree. Moved the threshold logic to the function ReorderBufferProcessTXN. > > > > ~ > > > > 7b. > > Would it be neater to just call OutputPluginUpdateProgress here instead? > > > > e.g. > > BEFORE > > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); > > AFTER > > OutputPluginUpdateProgress(ctx, false); > > > > We already check whether ctx->update_progress is defined or not which > is the only extra job done by OutputPluginUpdateProgress but probably > we can consolidate the checks and directly invoke > OutputPluginUpdateProgress. Changed. Invoke the function OutputPluginUpdateProgress directly in the new callback. Regards, Wang Wei
RE: Logical replication timeout problem
On Thu, Jan 19, 2023 at 19:37 PM Amit Kapila wrote: > On Thu, Jan 19, 2023 at 4:13 PM Ashutosh Bapat > wrote: > > > > On Wed, Jan 18, 2023 at 6:00 PM Amit Kapila > wrote: > > > > > + */ > > > + ReorderBufferUpdateProgressCB update_progress; > > > > > > Are you suggesting changing the name of the above variable? If so, how > > > about apply_progress, progress, or updateprogress? If you don't like > > > any of these then feel free to suggest something else. If we change > > > the variable name then accordingly, we need to update > > > ReorderBufferUpdateProgressCB as well. > > > > > > > I would liked to have all the callback names renamed with prefix > > "rbcb_xxx" so that they have very less chances of conflicting with > > similar names in the code base. But it's probably late to do that :). > > > > How are update_txn_progress since the CB is supposed to be used only > > within a transaction? or update_progress_txn? > > > > Personally, I would prefer 'apply_progress' as it would be similar to > a few other callbacks like apply_change, apply_truncate, or as is > proposed by patch update_progress again because it is similar to > existing callbacks like commit_prepared. If you and others don't like > any of those then we can go for 'update_progress_txn' as well. Anybody > else has an opinion on this? I think 'update_progress_txn' might be better. Because I think this name seems to make it easier to know that this callback is used to update process when processing txn. So, I rename it to 'update_progress_txn'. I have addressed all the comments and here is the new version patch. Regards, Wang Wei v4-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: v4-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
On Fri, Jan 20, 2023 at 3:35 PM Amit Kapila wrote: > > On Fri, Jan 20, 2023 at 7:40 AM Peter Smith wrote: > > > > Here are some review comments for patch v3-0001. > > > > == > > src/backend/replication/logical/logical.c > > > > 3. forward declaration > > > > +/* update progress callback */ > > +static void update_progress_cb_wrapper(ReorderBuffer *cache, > > +ReorderBufferTXN *txn, > > +ReorderBufferChange *change); > > > > I felt this function wrapper name was a bit misleading... AFAIK every > > other wrapper really does just wrap their respective functions. But > > this one seems a bit different because it calls the wrapped function > > ONLY if some threshold is exceeded. IMO maybe this function could have > > some name that conveys this better: > > > > e.g. update_progress_cb_wrapper_with_threshold > > > > I am wondering whether it would be better to move the threshold logic > to the caller. Previously this logic was inside the function because > it was being invoked from multiple places but now that won't be the > case. Also, then your concern about the name would also be addressed. > > > > > ~ > > > > 7b. > > Would it be neater to just call OutputPluginUpdateProgress here instead? > > > > e.g. > > BEFORE > > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); > > AFTER > > OutputPluginUpdateProgress(ctx, false); > > > > We already check whether ctx->update_progress is defined or not which > is the only extra job done by OutputPluginUpdateProgress but probably > we can consolidate the checks and directly invoke > OutputPluginUpdateProgress. > Yes, I saw that, but I thought it was better to keep the early exit from update_progress_cb_wrapper, so incurring just one additional boolean check for every 100 changes was not anything to worry about. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Logical replication timeout problem
On Fri, Jan 20, 2023 at 7:40 AM Peter Smith wrote: > > Here are some review comments for patch v3-0001. > > == > src/backend/replication/logical/logical.c > > 3. forward declaration > > +/* update progress callback */ > +static void update_progress_cb_wrapper(ReorderBuffer *cache, > +ReorderBufferTXN *txn, > +ReorderBufferChange *change); > > I felt this function wrapper name was a bit misleading... AFAIK every > other wrapper really does just wrap their respective functions. But > this one seems a bit different because it calls the wrapped function > ONLY if some threshold is exceeded. IMO maybe this function could have > some name that conveys this better: > > e.g. update_progress_cb_wrapper_with_threshold > I am wondering whether it would be better to move the threshold logic to the caller. Previously this logic was inside the function because it was being invoked from multiple places but now that won't be the case. Also, then your concern about the name would also be addressed. > > ~ > > 7b. > Would it be neater to just call OutputPluginUpdateProgress here instead? > > e.g. > BEFORE > ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); > AFTER > OutputPluginUpdateProgress(ctx, false); > We already check whether ctx->update_progress is defined or not which is the only extra job done by OutputPluginUpdateProgress but probably we can consolidate the checks and directly invoke OutputPluginUpdateProgress. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
Here are some review comments for patch v3-0001. == Commit message 1. The problem is when there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, these temporary data will not be processed by the pgoutput - plugin. Therefore, the previous fix (f95d53e) for DML had no impact on this case. ~ 1a. IMO this comment needs to give a bit of background about the original problem here, rather than just starting with "The problem is" which is describing the flaws of the previous fix. ~ 1b. "pgoutput - plugin" -> "pgoutput plugin" ?? ~~~ 2. To fix this, we introduced a new ReorderBuffer callback - 'ReorderBufferUpdateProgressCB'. This callback is called to try to update the process after each change has been processed during sending data of a transaction (and its subtransactions) to the output plugin. IIUC it's not really "after each change" - shouldn't this comment mention something about the CHANGES_THRESHOLD 100? == src/backend/replication/logical/logical.c 3. forward declaration +/* update progress callback */ +static void update_progress_cb_wrapper(ReorderBuffer *cache, +ReorderBufferTXN *txn, +ReorderBufferChange *change); I felt this function wrapper name was a bit misleading... AFAIK every other wrapper really does just wrap their respective functions. But this one seems a bit different because it calls the wrapped function ONLY if some threshold is exceeded. IMO maybe this function could have some name that conveys this better: e.g. update_progress_cb_wrapper_with_threshold ~~~ 4. update_progress_cb_wrapper +/* + * Update progress callback + * + * Try to update progress and send a keepalive message if too many changes were + * processed when processing txn. + * + * For a large transaction, if we don't send any change to the downstream for a + * long time (exceeds the wal_receiver_timeout of standby) then it can timeout. + * This can happen when all or most of the changes are either not published or + * got filtered out. 
+ */ SUGGESTION (instead of the "Try to update" sentence) Send a keepalive message whenever more than CHANGES_THRESHOLD changes are encountered while processing a transaction. ~~~ 5. +static void +update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, +ReorderBufferChange *change) +{ + LogicalDecodingContext *ctx = cache->private_data; + LogicalErrorCallbackState state; + ErrorContextCallback errcallback; + static int changes_count = 0; /* Static variable used to accumulate + * the number of changes while + * processing txn. */ + IMO this may be more readable if the static 'changes_count' local var was declared first and separated from the other vars by a blank line. ~~~ 6. + /* + * We don't want to try sending a keepalive message after processing each + * change as that can have overhead. Tests revealed that there is no + * noticeable overhead in doing it after continuously processing 100 or so + * changes. + */ +#define CHANGES_THRESHOLD 100 6a. I think it might be better to define this right at the top of the function adjacent to the 'changes_count' variable (e.g. a bit like the original HEAD code looked) ~ 6b. SUGGESTION (for the comment) Sending keepalive messages after every change has some overhead, but testing showed there is no noticeable overhead if keepalive is only sent after every ~100 changes. ~~~ 7. + + /* + * After continuously processing CHANGES_THRESHOLD changes, we + * try to send a keepalive message if required. + */ + if (++changes_count >= CHANGES_THRESHOLD) + { + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); + changes_count = 0; + } + 7a. SUGGESTION (for comment) Send a keepalive message after every CHANGES_THRESHOLD changes. ~ 7b. Would it be neater to just call OutputPluginUpdateProgress here instead? e.g. BEFORE ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); AFTER OutputPluginUpdateProgress(ctx, false); -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical replication timeout problem
On Thu, Jan 19, 2023 at 4:13 PM Ashutosh Bapat wrote: > > On Wed, Jan 18, 2023 at 6:00 PM Amit Kapila wrote: > > > + */ > > + ReorderBufferUpdateProgressCB update_progress; > > > > Are you suggesting changing the name of the above variable? If so, how > > about apply_progress, progress, or updateprogress? If you don't like > > any of these then feel free to suggest something else. If we change > > the variable name then accordingly, we need to update > > ReorderBufferUpdateProgressCB as well. > > > > I would liked to have all the callback names renamed with prefix > "rbcb_xxx" so that they have very less chances of conflicting with > similar names in the code base. But it's probably late to do that :). > > How are update_txn_progress since the CB is supposed to be used only > within a transaction? or update_progress_txn? > Personally, I would prefer 'apply_progress' as it would be similar to a few other callbacks like apply_change, apply_truncate, or as is proposed by patch update_progress again because it is similar to existing callbacks like commit_prepared. If you and others don't like any of those then we can go for 'update_progress_txn' as well. Anybody else has an opinion on this? -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Jan 18, 2023 at 6:00 PM Amit Kapila wrote: > + */ > + ReorderBufferUpdateProgressCB update_progress; > > Are you suggesting changing the name of the above variable? If so, how > about apply_progress, progress, or updateprogress? If you don't like > any of these then feel free to suggest something else. If we change > the variable name then accordingly, we need to update > ReorderBufferUpdateProgressCB as well. > I would have liked all the callback names to be renamed with the prefix "rbcb_xxx" so that they have much less chance of conflicting with similar names in the code base. But it's probably too late to do that :). How about update_txn_progress, since the CB is supposed to be used only within a transaction? Or update_progress_txn? update_progress_cb_wrapper needs a name change as well. -- Best Wishes, Ashutosh Bapat
Re: Logical replication timeout problem
On Wed, Jan 18, 2023 at 5:37 PM Ashutosh Bapat wrote: > > On Wed, Jan 18, 2023 at 1:49 PM wangw.f...@fujitsu.com > wrote: > > > > On Wed, Jan 18, 2023 at 13:29 PM Amit Kapila > > wrote: > > > On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat > > > wrote: > > > > > > > > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila > > > > wrote: > > > > > > > > > > > > > > > > I am a bit worried about the indirections that the wrappers and > > > > > > hooks > > > > > > create. Output plugins call OutputPluginUpdateProgress() in > > > > > > callbacks > > > > > > but I don't see why ReorderBufferProcessTXN() needs a callback to > > > > > > call OutputPluginUpdateProgress. > > > > > > > > > > > > > > > > Yeah, I think we can do it as we are doing the previous approach but > > > > > we need an additional wrapper (update_progress_cb_wrapper()) as the > > > > > current patch has so that we can add error context information. This > > > > > is similar to why we have a wrapper for all other callbacks like > > > > > change_cb_wrapper. > > > > > > > > > > > > > Ultimately OutputPluginUpdateProgress() will be called - which in turn > > > > will call ctx->update_progress. > > > > > > > > > > No, update_progress_cb_wrapper() should directly call > > > ctx->update_progress(). The key reason to have a > > > update_progress_cb_wrapper() is that it allows us to add error context > > > information (see the usage of output_plugin_error_callback). > > > > I think it makes sense. This also avoids the need for every output plugin to > > implement the callback. So I tried to improve the patch based on this > > approach. > > > > And I tried to add some comments for this new callback to distinguish it > > from > > ctx->update_progress. > > Comments don't help when using cscope or some such code browsing tool. > Better to use a different variable name. > + /* + * Callback to be called when updating progress during sending data of a + * transaction (and its subtransactions) to the output plugin. 
+ */ + ReorderBufferUpdateProgressCB update_progress; Are you suggesting changing the name of the above variable? If so, how about apply_progress, progress, or updateprogress? If you don't like any of these then feel free to suggest something else. If we change the variable name then accordingly, we need to update ReorderBufferUpdateProgressCB as well. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Jan 18, 2023 at 1:49 PM wangw.f...@fujitsu.com wrote: > > On Wed, Jan 18, 2023 at 13:29 PM Amit Kapila wrote: > > On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat > > wrote: > > > > > > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila > > > wrote: > > > > > > > > > > > > > I am a bit worried about the indirections that the wrappers and hooks > > > > > create. Output plugins call OutputPluginUpdateProgress() in callbacks > > > > > but I don't see why ReorderBufferProcessTXN() needs a callback to > > > > > call OutputPluginUpdateProgress. > > > > > > > > > > > > > Yeah, I think we can do it as we are doing the previous approach but > > > > we need an additional wrapper (update_progress_cb_wrapper()) as the > > > > current patch has so that we can add error context information. This > > > > is similar to why we have a wrapper for all other callbacks like > > > > change_cb_wrapper. > > > > > > > > > > Ultimately OutputPluginUpdateProgress() will be called - which in turn > > > will call ctx->update_progress. > > > > > > > No, update_progress_cb_wrapper() should directly call > > ctx->update_progress(). The key reason to have a > > update_progress_cb_wrapper() is that it allows us to add error context > > information (see the usage of output_plugin_error_callback). > > I think it makes sense. This also avoids the need for every output plugin to > implement the callback. So I tried to improve the patch based on this > approach. > > And I tried to add some comments for this new callback to distinguish it from > ctx->update_progress. Comments don't help when using cscope or some such code browsing tool. Better to use a different variable name. -- Best Wishes, Ashutosh Bapat
RE: Logical replication timeout problem
On Wed, Jan 18, 2023 at 13:29 PM Amit Kapila wrote: > On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat > wrote: > > > > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila wrote: > > > > > > > > > > I am a bit worried about the indirections that the wrappers and hooks > > > > create. Output plugins call OutputPluginUpdateProgress() in callbacks > > > > but I don't see why ReorderBufferProcessTXN() needs a callback to > > > > call OutputPluginUpdateProgress. > > > > > > > > > > Yeah, I think we can do it as we are doing the previous approach but > > > we need an additional wrapper (update_progress_cb_wrapper()) as the > > > current patch has so that we can add error context information. This > > > is similar to why we have a wrapper for all other callbacks like > > > change_cb_wrapper. > > > > > > > Ultimately OutputPluginUpdateProgress() will be called - which in turn > > will call ctx->update_progress. > > > > No, update_progress_cb_wrapper() should directly call > ctx->update_progress(). The key reason to have a > update_progress_cb_wrapper() is that it allows us to add error context > information (see the usage of output_plugin_error_callback). I think it makes sense. This also avoids the need for every output plugin to implement the callback. So I tried to improve the patch based on this approach. And I tried to add some comments for this new callback to distinguish it from ctx->update_progress. Attach the new patch. Regards, Wang Wei v3-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: v3-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
On Tue, Jan 17, 2023 at 6:41 PM Ashutosh Bapat wrote: > > On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila wrote: > > > > > > > I am a bit worried about the indirections that the wrappers and hooks > > > create. Output plugins call OutputPluginUpdateProgress() in callbacks > > > but I don't see why ReorderBufferProcessTXN() needs a callback to > > > call OutputPluginUpdateProgress. > > > > > > > Yeah, I think we can do it as we are doing the previous approach but > > we need an additional wrapper (update_progress_cb_wrapper()) as the > > current patch has so that we can add error context information. This > > is similar to why we have a wrapper for all other callbacks like > > change_cb_wrapper. > > > > Ultimately OutputPluginUpdateProgress() will be called - which in turn > will call ctx->update_progress. > No, update_progress_cb_wrapper() should directly call ctx->update_progress(). The key reason to have a update_progress_cb_wrapper() is that it allows us to add error context information (see the usage of output_plugin_error_callback). -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Tue, Jan 17, 2023 at 3:34 PM Amit Kapila wrote: > > > > I am a bit worried about the indirections that the wrappers and hooks > > create. Output plugins call OutputPluginUpdateProgress() in callbacks > > but I don't see why ReorderBufferProcessTXN() needs a callback to > > call OutputPluginUpdateProgress. > > > > Yeah, I think we can do it as we are doing the previous approach but > we need an additional wrapper (update_progress_cb_wrapper()) as the > current patch has so that we can add error context information. This > is similar to why we have a wrapper for all other callbacks like > change_cb_wrapper. > Ultimately OutputPluginUpdateProgress() will be called - which in turn will call ctx->update_progress. I don't see wrappers around OutputPluginWrite or OutputPluginPrepareWrite. But I see that those two are always called from the output plugin, so indirectly they are called through a wrapper. I also see that update_progress_cb_wrapper() is similar, as far as the wrapper is concerned, to ReorderBufferUpdateProgress() in the earlier patch. ReorderBufferUpdateProgress() looks more readable than the wrapper. If we want to keep the wrapper, at least we should use a different variable name. update_progress is also there in LogicalDecodingContext and will be indirectly called from ReorderBuffer::update_progress. Somebody might think that there's some recursion involved there. That would be quite confusing. -- Best Wishes, Ashutosh Bapat
Re: Logical replication timeout problem
On Mon, Jan 16, 2023 at 10:06 PM Ashutosh Bapat wrote: > > On Wed, Jan 11, 2023 at 4:11 PM wangw.f...@fujitsu.com > wrote: > > > > On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila wrote: > > > > > > > Thanks for your comments. > > > > > One more thing, I think it would be better to expose a new callback > > > API via reorder buffer as suggested previously [2] similar to other > > > reorder buffer APIs instead of directly using reorderbuffer API to > > > invoke plugin API. > > > > Yes, I agree. I think it would be better to add a new callback API on the > > HEAD. > > So, I improved the fix approach: > > Introduce a new optional callback to update the process. This callback > > function > > is invoked at the end inside the main loop of the function > > ReorderBufferProcessTXN() for each change. In this way, I think it seems > > that > > similar timeout problems could be avoided. > > I am a bit worried about the indirections that the wrappers and hooks > create. Output plugins call OutputPluginUpdateProgress() in callbacks > but I don't see why ReorderBufferProcessTXN() needs a callback to > call OutputPluginUpdateProgress. > Yeah, I think we can do it as in the previous approach, but we need an additional wrapper (update_progress_cb_wrapper()), as the current patch has, so that we can add error context information. This is similar to why we have a wrapper for all the other callbacks like change_cb_wrapper. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Jan 11, 2023 at 4:11 PM wangw.f...@fujitsu.com wrote: > > On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila wrote: > > > > Thanks for your comments. > > > One more thing, I think it would be better to expose a new callback > > API via reorder buffer as suggested previously [2] similar to other > > reorder buffer APIs instead of directly using reorderbuffer API to > > invoke plugin API. > > Yes, I agree. I think it would be better to add a new callback API on the > HEAD. > So, I improved the fix approach: > Introduce a new optional callback to update the process. This callback > function > is invoked at the end inside the main loop of the function > ReorderBufferProcessTXN() for each change. In this way, I think it seems that > similar timeout problems could be avoided. I am a bit worried about the indirections that the wrappers and hooks create. Output plugins call OutputPluginUpdateProgress() in callbacks but I don't see why ReorderBufferProcessTXN() needs a callback to call OutputPluginUpdateProgress. I don't think output plugins are going to do anything special with that callback than just call OutputPluginUpdateProgress. Every output plugin will need to implement it and if they do not they will face the timeout problem. That would be unnecessary. Instead ReorderBufferUpdateProgress() in your first patch was more direct and readable. That way the fix works for any output plugin. In fact, I am wondering whether we could have a call in ReorderBufferProcessTxn() at the end of transaction (commit/prepare/commit prepared/abort prepared) instead of the corresponding output plugin callbacks calling OutputPluginUpdateProgress(). -- Best Wishes, Ashutosh Bapat
Re: Logical replication timeout problem
On Mon, Jan 9, 2023 at 4:08 PM wangw.f...@fujitsu.com wrote: > > On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat > wrote: > > I'm not sure if we need to add global variables or member variables for a > cumulative count that is only used here. How would you feel if I add some > comments when declaring this static variable? I see WalSndUpdateProgress::sendTime is static already. So this seems fine. A comment will help sure. > > > + > > +if (!ctx->update_progress) > > +return; > > + > > +Assert(!ctx->fast_forward); > > + > > +/* set output state */ > > +ctx->accept_writes = false; > > +ctx->write_xid = txn->xid; > > +ctx->write_location = change->lsn; > > +ctx->end_xact = false; > > > > This patch reverts many of the changes of the previous commit which tried to > > fix this issue i.e. 8df2374. end_xact was introduced by the same commit > > but > > without much explanation of that in the commit message. Its only user, > > WalSndUpdateProgress(), is probably making a wrong assumption as well. > > > > * We don't have a mechanism to get the ack for any LSN other than end > > * xact LSN from the downstream. So, we track lag only for end of > > * transaction LSN. > > > > IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr > > which is > > sent downstream through a keep alive message. Downstream may > > acknowledge this > > LSN. So we do get ack for any LSN, not just commit LSN. > > > > So I propose removing end_xact as well. > > We didn't track the lag during a transaction because it could make the > calculations of lag functionality inaccurate. If we track every lsn, it could > fail to record important lsn information because of > WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress). > Please see details in [1] and [2]. LagTrackerRead() interpolates to reduce the inaccuracy. I don't understand why we need to track the end LSN only. But I don't think that affects this fix. So I am fine if we want to leave end_xact there. 
-- Best Wishes, Ashutosh Bapat
RE: Logical replication timeout problem
On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila wrote: > Thanks for your comments. > One more thing, I think it would be better to expose a new callback > API via reorder buffer as suggested previously [2] similar to other > reorder buffer APIs instead of directly using reorderbuffer API to > invoke plugin API. Yes, I agree. I think it would be better to add a new callback API on HEAD. So, I improved the fix approach: Introduce a new optional callback to update the progress. This callback function is invoked at the end of the main loop of the function ReorderBufferProcessTXN() for each change. In this way, I think similar timeout problems could be avoided. BTW, I did a performance test for this patch. When running the SQL that reproduces the problem (refresh the materialized view in sync logical replication mode), the running time of the new function pgoutput_update_progress is less than 0.1% of the total time. I think this result looks OK. Attached is the new patch. Regards, Wang Wei v2-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: v2-0001-Fix-the-logical-replication-timeout-during-proces.patch
RE: Logical replication timeout problem
On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat wrote: > Hi Wang, > Thanks for working on this. One of our customer faced a similar > situation when running BDR with PostgreSQL. > > I tested your patch and it solves the problem. > > Please find some review comments below Thanks for your testing and comments. > +/* > + * Helper function for ReorderBufferProcessTXN for updating progress. > + */ > +static inline void > +ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn, > +ReorderBufferChange *change) > +{ > +LogicalDecodingContext *ctx = rb->private_data; > +static intchanges_count = 0; > > It's not easy to know that a variable is static when reading the code which > uses it. So it's easy to interpret code wrong. I would probably track it > through logical decoding context itself OR through a global variable like > other > places where we track the last timestamps. But there's more below on this. I'm not sure if we need to add global variables or member variables for a cumulative count that is only used here. How would you feel if I add some comments when declaring this static variable? > + > +if (!ctx->update_progress) > +return; > + > +Assert(!ctx->fast_forward); > + > +/* set output state */ > +ctx->accept_writes = false; > +ctx->write_xid = txn->xid; > +ctx->write_location = change->lsn; > +ctx->end_xact = false; > > This patch reverts many of the changes of the previous commit which tried to > fix this issue i.e. 8df2374. end_xact was introduced by the same commit > but > without much explanation of that in the commit message. Its only user, > WalSndUpdateProgress(), is probably making a wrong assumption as well. > > * We don't have a mechanism to get the ack for any LSN other than end > * xact LSN from the downstream. So, we track lag only for end of > * transaction LSN. > > IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr which > is > sent downstream through a keep alive message. Downstream may > acknowledge this > LSN. 
So we do get ack for any LSN, not just commit LSN. > > So I propose removing end_xact as well. We didn't track the lag during a transaction because it could make the calculations of lag functionality inaccurate. If we track every lsn, it could fail to record important lsn information because of WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress). Please see details in [1] and [2]. Regards, Wang Wei [1] - https://www.postgresql.org/message-id/OS3PR01MB62755D216245199554DDC8DB9EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com [2] - https://www.postgresql.org/message-id/OS3PR01MB627514AE0B3040D8F55A68B99EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com
Re: Logical replication timeout problem
On Fri, Jan 6, 2023 at 12:35 PM Ashutosh Bapat wrote: > > + > +/* > + * We don't want to try sending a keepalive message after processing each > + * change as that can have overhead. Tests revealed that there is no > + * noticeable overhead in doing it after continuously processing 100 or > so > + * changes. > + */ > +#define CHANGES_THRESHOLD 100 > > I think a time based threashold makes more sense. What if the timeout was > nearing and those 100 changes just took little more time causing a timeout? We > already have a time based threashold in WalSndKeepaliveIfNecessary(). And that > function is invoked after reading every WAL record in WalSndLoop(). So it does > not look like it's an expensive function. If it is expensive we might want to > worry about WalSndLoop as well. Does it make more sense to remove this > threashold? > We have previously tried this for every change [1] and it brings noticeable overhead. In fact, even doing it for every 10 changes also had some overhead which is why we reached this threshold number. I don't think it can lead to timeout due to skipping changes but sure if we see any such report we can further fine-tune this setting or will try to make it time-based but for now I feel it would be safe to use this threshold. > + > +/* > + * After continuously processing CHANGES_THRESHOLD changes, we > + * try to send a keepalive message if required. > + */ > +if (++changes_count >= CHANGES_THRESHOLD) > +{ > +ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, > false); > +changes_count = 0; > +} > +} > + > > On the other thread, I mentioned that we don't have a TAP test for it. > I agree with > Amit's opinion there that it's hard to create a test which will timeout > everywhere. I think what we need is a way to control the time required for > decoding a transaction. > > A rough idea is to induce a small sleep after decoding every change. 
The > amount > of sleep * number of changes will help us estimate and control the amount of > time taken to decode a transaction. Then we create a transaction which will > take longer than the timeout threashold to decode. But that's a > significant code. I > don't think PostgreSQL has a facility to induce a delay at a particular place > in the code. > Yeah, I don't know how to induce such a delay while decoding changes. One more thing, I think it would be better to expose a new callback API via reorder buffer as suggested previously [2] similar to other reorder buffer APIs instead of directly using reorderbuffer API to invoke plugin API. [1] - https://www.postgresql.org/message-id/OS3PR01MB6275DFFDAC7A59FA148931529E209%40OS3PR01MB6275.jpnprd01.prod.outlook.com [2] - https://www.postgresql.org/message-id/CAA4eK1%2BfQjndoBOFUn9Wy0hhm3MLyUWEpcT9O7iuCELktfdBiQ%40mail.gmail.com -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
Hi Wang, Thanks for working on this. One of our customers faced a similar situation when running BDR with PostgreSQL. I tested your patch and it solves the problem. Please find some review comments below On Tue, Nov 8, 2022 at 8:34 AM wangw.f...@fujitsu.com wrote: > > > Attach the patch. > +/* + * Helper function for ReorderBufferProcessTXN for updating progress. + */ +static inline void +ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn, +ReorderBufferChange *change) +{ +LogicalDecodingContext *ctx = rb->private_data; +static int changes_count = 0; It's not easy to know that a variable is static when reading the code which uses it, so it's easy to misread the code. I would probably track it through the logical decoding context itself OR through a global variable like other places where we track the last timestamps. But there's more below on this. + +if (!ctx->update_progress) +return; + +Assert(!ctx->fast_forward); + +/* set output state */ +ctx->accept_writes = false; +ctx->write_xid = txn->xid; +ctx->write_location = change->lsn; +ctx->end_xact = false; This patch reverts many of the changes of the previous commit which tried to fix this issue, i.e. 8df2374. end_xact was introduced by the same commit but without much explanation of it in the commit message. Its only user, WalSndUpdateProgress(), is probably making a wrong assumption as well. * We don't have a mechanism to get the ack for any LSN other than end * xact LSN from the downstream. So, we track lag only for end of * transaction LSN. IIUC, the WAL sender tracks the LSN of the last WAL record read in sentPtr, which is sent downstream through a keepalive message. The downstream may acknowledge this LSN. So we do get an ack for any LSN, not just the commit LSN. So I propose removing end_xact as well. + +/* + * We don't want to try sending a keepalive message after processing each + * change as that can have overhead. 
Tests revealed that there is no + * noticeable overhead in doing it after continuously processing 100 or so + * changes. + */ +#define CHANGES_THRESHOLD 100 I think a time-based threshold makes more sense. What if the timeout was nearing and those 100 changes just took a little more time, causing a timeout? We already have a time-based threshold in WalSndKeepaliveIfNecessary(). And that function is invoked after reading every WAL record in WalSndLoop(). So it does not look like it's an expensive function. If it is expensive we might want to worry about WalSndLoop as well. Does it make more sense to remove this threshold? + +/* + * After continuously processing CHANGES_THRESHOLD changes, we + * try to send a keepalive message if required. + */ +if (++changes_count >= CHANGES_THRESHOLD) +{ +ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false); +changes_count = 0; +} +} + On the other thread, I mentioned that we don't have a TAP test for it. I agree with Amit's opinion there that it's hard to create a test which will time out everywhere. I think what we need is a way to control the time required for decoding a transaction. A rough idea is to induce a small sleep after decoding every change. The amount of sleep * number of changes will help us estimate and control the amount of time taken to decode a transaction. Then we create a transaction which will take longer than the timeout threshold to decode. But that's a significant amount of code. I don't think PostgreSQL has a facility to induce a delay at a particular place in the code. -- Best Wishes, Ashutosh Bapat
RE: Logical replication timeout problem
On Fri, Nov 4, 2022 at 18:13 PM Fabrice Chapuis wrote: > Hello Wang, > > I tested the draft patch in my lab for Postgres 14.4, the refresh of the > materialized view ran without generating the timeout on the worker. > Do you plan to propose this patch at the next commit fest. Thanks for your confirmation! I will add this thread to the commit fest soon. The following is the problem analysis and fix approach: I think the problem is that when a DDL in a transaction generates lots of temporary data due to rewrite rules, that temporary data will not be processed by the pgoutput plugin. Therefore, the previous fix (f95d53e) for DML had no impact on this case. To fix this, I think we need to try to send the keepalive messages after each change is processed by the walsender, not in the pgoutput plugin. Attached is the patch. Regards, Wang wei v1-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: v1-0001-Fix-the-logical-replication-timeout-during-proces.patch
Re: Logical replication timeout problem
Hello Wang, I tested the draft patch in my lab for Postgres 14.4, the refresh of the materialized view ran without generating the timeout on the worker. Do you plan to propose this patch at the next commit fest. Regards, Fabrice On Wed, Oct 19, 2022 at 10:15 AM wangw.f...@fujitsu.com < wangw.f...@fujitsu.com> wrote: > On Tue, Oct 18, 2022 at 22:35 PM Fabrice Chapuis > wrote: > > Hello Amit, > > > > In version 14.4 the timeout problem for logical replication happens > again despite > > the patch provided for this issue in this version. When bulky > materialized views > > are reloaded it broke logical replication. It is possible to solve this > problem by > > using your new "streaming" option. > > Have you ever had this issue reported to you? > > > > Regards > > > > Fabrice > > > > 2022-10-10 17:19:02 CEST [538424]: [17-1] > > user=postgres,db=dbxxxa00,client=[local] CONTEXT: SQL statement "REFRESH > > MATERIALIZED VIEW sxxxa00.table_base" > > PL/pgSQL function refresh_materialized_view(text) line 5 at > EXECUTE > > 2022-10-10 17:19:02 CEST [538424]: [18-1] > > user=postgres,db=dbxxxa00,client=[local] STATEMENT: select > > refresh_materialized_view('sxxxa00.table_base'); > > 2022-10-10 17:19:02 CEST [538424]: [19-1] > > user=postgres,db=dbxxxa00,client=[local] LOG: duration: 264815.652 > > ms statement: select refresh_materialized_view('sxxxa00.table_base'); > > 2022-10-10 17:19:27 CEST [559156]: [1-1] user=,db=,client= LOG: > automatic > > vacuum of table "dbxxxa00.sxxxa00.table_base": index scans: 0 > > pages: 0 removed, 296589 remain, 0 skipped due to pins, 0 > skipped frozen > > tuples: 0 removed, 48472622 remain, 0 are dead but not yet > removable, > > oldest xmin: 1501528 > > index scan not needed: 0 pages from table (0.00% of total) had 0 > dead item > > identifiers removed > > I/O timings: read: 1.494 ms, write: 0.000 ms > > avg read rate: 0.028 MB/s, avg write rate: 107.952 MB/s > > buffer usage: 593301 hits, 77 misses, 294605 dirtied > > WAL usage: 296644 
records, 46119 full page images, 173652718 > bytes > > system usage: CPU: user: 17.26 s, system: 0.29 s, elapsed: 21.32 > s > > 2022-10-10 17:19:28 CEST [559156]: [2-1] user=,db=,client= LOG: > automatic > > analyze of table "dbxxxa00.sxxxa00.table_base" > > I/O timings: read: 0.043 ms, write: 0.000 ms > > avg read rate: 0.026 MB/s, avg write rate: 0.026 MB/s > > buffer usage: 30308 hits, 2 misses, 2 dirtied > > system usage: CPU: user: 0.54 s, system: 0.00 s, elapsed: 0.59 s > > 2022-10-10 17:19:34 CEST [3898111]: [6840-1] user=,db=,client= LOG: > checkpoint > > complete: wrote 1194 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 > recycled; > > write=269.551 s, sync=0.002 s, total=269.560 s; sync files=251, > longest=0.00 > > 1 s, average=0.001 s; distance=583790 kB, estimate=583790 kB > > 2022-10-10 17:20:02 CEST [716163]: [2-1] user=,db=,client= ERROR: > terminating > > logical replication worker due to timeout > > 2022-10-10 17:20:02 CEST [3897921]: [13-1] user=,db=,client= LOG: > background > > worker "logical replication worker" (PID 716163) exited with exit code 1 > > 2022-10-10 17:20:02 CEST [561346]: [1-1] user=,db=,client= LOG: logical > > replication apply worker for subscription "subxxx_sxxxa00" has started > > Thanks for reporting! > > There is one thing I want to confirm: > Is the statement `select refresh_materialized_view('sxxxa00.table_base');` > executed on the publisher-side? > > If so, I think the reason for this timeout problem could be that during DDL > (`REFRESH MATERIALIZED VIEW`), lots of temporary data is generated due to > rewrite. Since these temporary data will not be processed by the pgoutput > plugin, our previous fix for DML had no impact on this case. > I think setting "streaming" option to "on" could work around this problem. > > I tried to write a draft patch (see attachment) on REL_14_4 to fix this. > I tried it locally and it seems to work. 
> Could you please confirm whether this problem is fixed after applying this > draft patch? > > If this draft patch works, I will improve it and try to fix this problem. > > Regards, > Wang wei >
RE: Logical replication timeout problem
On Thu, Oct 20, 2022 at 13:47 PM Fabrice Chapuis wrote: > Yes, the refresh of the MV is on the publisher side. > Thanks for your draft patch, I'll try it. > I'll get back to you as soon as possible Thanks a lot. > One question: why is the refresh of the MV a DDL and not a DML? Since in the source, the type of the command `REFRESH MATERIALIZED VIEW` is `CMD_UTILITY`, I think this command is DDL (see CmdType in file nodes.h). BTW, after searching for DML in the PostgreSQL docs, I found the relevant description at the link below: https://www.postgresql.org/docs/devel/logical-replication-publication.html Regards, Wang wei
Re: Logical replication timeout problem
Yes, the refresh of the MV is on the publisher side. Thanks for your draft patch, I'll try it. I'll get back to you as soon as possible. One question: why is the refresh of the MV a DDL and not a DML? Regards Fabrice On Wed, 19 Oct 2022, 10:15 wangw.f...@fujitsu.com wrote: > On Tue, Oct 18, 2022 at 22:35 PM Fabrice Chapuis > wrote: > > Hello Amit, > > > > In version 14.4 the timeout problem for logical replication happens > again despite > > the patch provided for this issue in this version. When bulky > materialized views > > are reloaded it broke logical replication. It is possible to solve this > problem by > > using your new "streaming" option. > > Have you ever had this issue reported to you? > > > > Regards > > > > Fabrice > > > > 2022-10-10 17:19:02 CEST [538424]: [17-1] > > user=postgres,db=dbxxxa00,client=[local] CONTEXT: SQL statement "REFRESH > > MATERIALIZED VIEW sxxxa00.table_base" > > PL/pgSQL function refresh_materialized_view(text) line 5 at > EXECUTE > > 2022-10-10 17:19:02 CEST [538424]: [18-1] > > user=postgres,db=dbxxxa00,client=[local] STATEMENT: select > > refresh_materialized_view('sxxxa00.table_base'); > > 2022-10-10 17:19:02 CEST [538424]: [19-1] > > user=postgres,db=dbxxxa00,client=[local] LOG: duration: 264815.652 > > ms statement: select refresh_materialized_view('sxxxa00.table_base'); > > 2022-10-10 17:19:27 CEST [559156]: [1-1] user=,db=,client= LOG: > automatic > > vacuum of table "dbxxxa00.sxxxa00.table_base": index scans: 0 > > pages: 0 removed, 296589 remain, 0 skipped due to pins, 0 > skipped frozen > > tuples: 0 removed, 48472622 remain, 0 are dead but not yet > removable, > > oldest xmin: 1501528 > > index scan not needed: 0 pages from table (0.00% of total) had 0 > dead item > > identifiers removed > > I/O timings: read: 1.494 ms, write: 0.000 ms > > avg read rate: 0.028 MB/s, avg write rate: 107.952 MB/s > > buffer usage: 593301 hits, 77 misses, 294605 dirtied > > WAL usage: 296644 records, 46119 full page images, 173652718 > bytes > > system
usage: CPU: user: 17.26 s, system: 0.29 s, elapsed: 21.32 > s > > 2022-10-10 17:19:28 CEST [559156]: [2-1] user=,db=,client= LOG: > automatic > > analyze of table "dbxxxa00.sxxxa00.table_base" > > I/O timings: read: 0.043 ms, write: 0.000 ms > > avg read rate: 0.026 MB/s, avg write rate: 0.026 MB/s > > buffer usage: 30308 hits, 2 misses, 2 dirtied > > system usage: CPU: user: 0.54 s, system: 0.00 s, elapsed: 0.59 s > > 2022-10-10 17:19:34 CEST [3898111]: [6840-1] user=,db=,client= LOG: > checkpoint > > complete: wrote 1194 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 > recycled; > > write=269.551 s, sync=0.002 s, total=269.560 s; sync files=251, > longest=0.00 > > 1 s, average=0.001 s; distance=583790 kB, estimate=583790 kB > > 2022-10-10 17:20:02 CEST [716163]: [2-1] user=,db=,client= ERROR: > terminating > > logical replication worker due to timeout > > 2022-10-10 17:20:02 CEST [3897921]: [13-1] user=,db=,client= LOG: > background > > worker "logical replication worker" (PID 716163) exited with exit code 1 > > 2022-10-10 17:20:02 CEST [561346]: [1-1] user=,db=,client= LOG: logical > > replication apply worker for subscription "subxxx_sxxxa00" has started > > Thanks for reporting! > > There is one thing I want to confirm: > Is the statement `select refresh_materialized_view('sxxxa00.table_base');` > executed on the publisher-side? > > If so, I think the reason for this timeout problem could be that during DDL > (`REFRESH MATERIALIZED VIEW`), lots of temporary data is generated due to > rewrite. Since these temporary data will not be processed by the pgoutput > plugin, our previous fix for DML had no impact on this case. > I think setting "streaming" option to "on" could work around this problem. > > I tried to write a draft patch (see attachment) on REL_14_4 to fix this. > I tried it locally and it seems to work. > Could you please confirm whether this problem is fixed after applying this > draft patch? 
> > If this draft patch works, I will improve it and try to fix this problem. > > Regards, > Wang wei >
RE: Logical replication timeout problem
On Tue, Oct 18, 2022 at 22:35 PM Fabrice Chapuis wrote: > Hello Amit, > > In version 14.4 the timeout problem for logical replication happens again > despite > the patch provided for this issue in this version. When bulky materialized > views > are reloaded it broke logical replication. It is possible to solve this > problem by > using your new "streaming" option. > Have you ever had this issue reported to you? > > Regards > > Fabrice > > 2022-10-10 17:19:02 CEST [538424]: [17-1] > user=postgres,db=dbxxxa00,client=[local] CONTEXT: SQL statement "REFRESH > MATERIALIZED VIEW sxxxa00.table_base" > PL/pgSQL function refresh_materialized_view(text) line 5 at EXECUTE > 2022-10-10 17:19:02 CEST [538424]: [18-1] > user=postgres,db=dbxxxa00,client=[local] STATEMENT: select > refresh_materialized_view('sxxxa00.table_base'); > 2022-10-10 17:19:02 CEST [538424]: [19-1] > user=postgres,db=dbxxxa00,client=[local] LOG: duration: 264815.652 > ms statement: select refresh_materialized_view('sxxxa00.table_base'); > 2022-10-10 17:19:27 CEST [559156]: [1-1] user=,db=,client= LOG: automatic > vacuum of table "dbxxxa00.sxxxa00.table_base": index scans: 0 > pages: 0 removed, 296589 remain, 0 skipped due to pins, 0 skipped > frozen > tuples: 0 removed, 48472622 remain, 0 are dead but not yet removable, > oldest xmin: 1501528 > index scan not needed: 0 pages from table (0.00% of total) had 0 dead > item > identifiers removed > I/O timings: read: 1.494 ms, write: 0.000 ms > avg read rate: 0.028 MB/s, avg write rate: 107.952 MB/s > buffer usage: 593301 hits, 77 misses, 294605 dirtied > WAL usage: 296644 records, 46119 full page images, 173652718 bytes > system usage: CPU: user: 17.26 s, system: 0.29 s, elapsed: 21.32 s > 2022-10-10 17:19:28 CEST [559156]: [2-1] user=,db=,client= LOG: automatic > analyze of table "dbxxxa00.sxxxa00.table_base" > I/O timings: read: 0.043 ms, write: 0.000 ms > avg read rate: 0.026 MB/s, avg write rate: 0.026 MB/s > buffer usage: 30308 hits, 2 misses, 2 dirtied > 
system usage: CPU: user: 0.54 s, system: 0.00 s, elapsed: 0.59 s > 2022-10-10 17:19:34 CEST [3898111]: [6840-1] user=,db=,client= LOG: > checkpoint > complete: wrote 1194 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 > recycled; > write=269.551 s, sync=0.002 s, total=269.560 s; sync files=251, longest=0.00 > 1 s, average=0.001 s; distance=583790 kB, estimate=583790 kB > 2022-10-10 17:20:02 CEST [716163]: [2-1] user=,db=,client= ERROR: terminating > logical replication worker due to timeout > 2022-10-10 17:20:02 CEST [3897921]: [13-1] user=,db=,client= LOG: background > worker "logical replication worker" (PID 716163) exited with exit code 1 > 2022-10-10 17:20:02 CEST [561346]: [1-1] user=,db=,client= LOG: logical > replication apply worker for subscription "subxxx_sxxxa00" has started Thanks for reporting! There is one thing I want to confirm: Is the statement `select refresh_materialized_view('sxxxa00.table_base');` executed on the publisher-side? If so, I think the reason for this timeout problem could be that during DDL (`REFRESH MATERIALIZED VIEW`), lots of temporary data is generated due to rewrite. Since these temporary data will not be processed by the pgoutput plugin, our previous fix for DML had no impact on this case. I think setting "streaming" option to "on" could work around this problem. I tried to write a draft patch (see attachment) on REL_14_4 to fix this. I tried it locally and it seems to work. Could you please confirm whether this problem is fixed after applying this draft patch? If this draft patch works, I will improve it and try to fix this problem. Regards, Wang wei 0001-draft-for-REL_14_4.patch Description: 0001-draft-for-REL_14_4.patch
Re: Logical replication timeout problem
Hello Amit, In version 14.4 the timeout problem for logical replication happens again despite the patch provided for this issue in this version. When bulky materialized views are reloaded, it breaks logical replication. It is possible to solve this problem by using your new "streaming" option. Have you ever had this issue reported to you? Regards Fabrice 2022-10-10 17:19:02 CEST [538424]: [17-1] user=postgres,db=dbxxxa00,client=[local] CONTEXT: SQL statement "REFRESH MATERIALIZED VIEW sxxxa00.table_base" PL/pgSQL function refresh_materialized_view(text) line 5 at EXECUTE 2022-10-10 17:19:02 CEST [538424]: [18-1] user=postgres,db=dbxxxa00,client=[local] STATEMENT: select refresh_materialized_view('sxxxa00.table_base'); 2022-10-10 17:19:02 CEST [538424]: [19-1] user=postgres,db=dbxxxa00,client=[local] LOG: duration: 264815.652 ms statement: select refresh_materialized_view('sxxxa00.table_base'); 2022-10-10 17:19:27 CEST [559156]: [1-1] user=,db=,client= LOG: automatic vacuum of table "dbxxxa00.sxxxa00.table_base": index scans: 0 pages: 0 removed, 296589 remain, 0 skipped due to pins, 0 skipped frozen tuples: 0 removed, 48472622 remain, 0 are dead but not yet removable, oldest xmin: 1501528 index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed I/O timings: read: 1.494 ms, write: 0.000 ms avg read rate: 0.028 MB/s, avg write rate: 107.952 MB/s buffer usage: 593301 hits, 77 misses, 294605 dirtied WAL usage: 296644 records, 46119 full page images, 173652718 bytes system usage: CPU: user: 17.26 s, system: 0.29 s, elapsed: 21.32 s 2022-10-10 17:19:28 CEST [559156]: [2-1] user=,db=,client= LOG: automatic analyze of table "dbxxxa00.sxxxa00.table_base" I/O timings: read: 0.043 ms, write: 0.000 ms avg read rate: 0.026 MB/s, avg write rate: 0.026 MB/s buffer usage: 30308 hits, 2 misses, 2 dirtied system usage: CPU: user: 0.54 s, system: 0.00 s, elapsed: 0.59 s 2022-10-10 17:19:34 CEST [3898111]: [6840-1] user=,db=,client= LOG: checkpoint
complete: wrote 1194 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=269.551 s, sync=0.002 s, total=269.560 s; sync files=251, longest=0.00 1 s, average=0.001 s; distance=583790 kB, estimate=583790 kB 2022-10-10 17:20:02 CEST [716163]: [2-1] user=,db=,client= ERROR: terminating logical replication worker due to timeout 2022-10-10 17:20:02 CEST [3897921]: [13-1] user=,db=,client= LOG: background worker "logical replication worker" (PID 716163) exited with exit code 1 2022-10-10 17:20:02 CEST [561346]: [1-1] user=,db=,client= LOG: logical replication apply worker for subscription "subxxx_sxxxa00" has started On Fri, Apr 1, 2022 at 6:09 AM Amit Kapila wrote: > On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira wrote: > > > > On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote: > > > > This is exactly our initial analysis and we have tried a patch on > > these lines and it has a noticeable overhead. See [1]. Calling this > > for each change or each skipped change can bring noticeable overhead > > that is why we decided to call it after a certain threshold (100) of > > skipped changes. Now, surely as mentioned in my previous reply we can > > make it generic such that instead of calling this (update_progress > > function as in the patch) for skipped cases, we call it always. Will > > that make it better? > > > > That's what I have in mind but using a different approach. > > > > > The functions CreateInitDecodingContext and CreateDecodingContext > receives the > > > update_progress function as a parameter. These functions are called in > 2 > > > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) > and (b) > > > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses > > > WalSndUpdateProgress as a progress function. Case (b) does not have > one because > > > it is not required -- local decoding/communication. 
There is no custom > update > > > progress routine for each output plugin which leads me to the question: > > > couldn't we encapsulate the update progress call into the callback > functions? > > > > > > > Sorry, I don't get your point. What exactly do you mean by this? > > AFAIS, currently we call this output plugin API in pgoutput functions > > only, do you intend to get it invoked from a different place? > > > > It seems I didn't make myself clear. The callbacks I'm referring to the > > *_cb_wrapper functions. After every ctx->callbacks.foo_cb() call into a > > *_cb_wrapper() function, we have something like: > > > > if (ctx->progress & PGOUTPUT_PROGRESS_FOO) > > NewUpdateProgress(ctx, false); > > > > The NewUpdateProgress function would contain a logic similar to the > > update_progress() from the proposed patch. (A different function name > here just > > to avoid confusion.) > > > > The output plugin is responsible to set ctx->progress with the callback > > variables (for example,
Re: Logical replication timeout problem
On Mon, May 9, 2022 at 2:17 PM Masahiko Sawada wrote: > > The patches look good to me too. > Pushed. -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Mon, May 9, 2022 at 11:23 AM Amit Kapila wrote: > On Mon, May 9, 2022 at 7:01 PM Euler Taveira wrote: > > > > On Mon, May 9, 2022, at 3:47 AM, Amit Kapila wrote: > > > > Thanks. The patch LGTM. I'll push and back-patch this after the > > current minor release is done unless there are more comments related > > to this work. > > .. > > Does this issue deserve a test? A small wal_receiver_timeout. Although, I'm > not > > sure how stable the test will be. > > > > Yes, the main part is how to write a stable test because estimating > how many changes are enough for the configured wal_receiver_timeout to > pass on all the buildfarm machines is tricky. Also, I am not sure how > important it is to test this behavior because based on this theory we > should have tests for all kinds of timeouts. Yes, I think we cannot guarantee the stability of this test. In addition, if we set wal_receiver_timeout too small, it may cause timeouts unrelated to this bug. And if we set a bigger wal_receiver_timeout and use a larger transaction in order to minimize the impact of machine performance, I think this might take some time and might risk making the buildfarm slow. Regards, Wang wei
Re: Logical replication timeout problem
On Mon, May 9, 2022 at 7:01 PM Euler Taveira wrote: > > On Mon, May 9, 2022, at 3:47 AM, Amit Kapila wrote: > > Thanks. The patch LGTM. I'll push and back-patch this after the > current minor release is done unless there are more comments related > to this work. > > Looks sane to me. (I only tested the HEAD version) > > + bool end_xact = ctx->end_xact; > > Do you really need a new variable here? It has the same name and the new one > isn't changed during the execution. > I think both ways should be okay. I thought the proposed way is okay because it is used in more than one place and is probably slightly easier to follow by having a separate variable. > Does this issue deserve a test? A small wal_receiver_timeout. Although, I'm > not > sure how stable the test will be. > Yes, the main part is how to write a stable test because estimating how many changes are enough for the configured wal_receiver_timeout to pass on all the buildfarm machines is tricky. Also, I am not sure how important it is to test this behavior because based on this theory we should have tests for all kinds of timeouts. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Mon, May 9, 2022, at 3:47 AM, Amit Kapila wrote: > Thanks. The patch LGTM. I'll push and back-patch this after the > current minor release is done unless there are more comments related > to this work. Looks sane to me. (I only tested the HEAD version) + bool end_xact = ctx->end_xact; Do you really need a new variable here? It has the same name and the new one isn't changed during the execution. Does this issue deserve a test? A small wal_receiver_timeout. Although, I'm not sure how stable the test will be. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Mon, May 9, 2022 at 3:47 PM Amit Kapila wrote: > > On Fri, May 6, 2022 at 12:42 PM wangw.f...@fujitsu.com > wrote: > > > > On Fri, May 6, 2022 at 9:54 AM Masahiko Sawada > > wrote: > > > On Wed, May 4, 2022 at 7:18 PM Amit Kapila > > > wrote: > > > > > > > > On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada > > > wrote: > > > > > > > > > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila > > > wrote: > > > > > > > > > > > > > > > > > > So, shall we go back to the previous approach of using a separate > > > > > > function update_replication_progress? > > > > > > > > > > Ok, agreed. > > > > > > > > > > > > > Attached, please find the updated patch accordingly. Currently, I have > > > > prepared it for HEAD, if you don't see any problem with this, we can > > > > prepare the back-branch patches based on this. > > > > > > Thank you for updating the patch. Looks good to me. > > Thanks for your review. > > > > Improve the back-branch patches according to the discussion. > > > The patches look good to me too. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Fri, May 6, 2022 at 12:42 PM wangw.f...@fujitsu.com wrote: > > On Fri, May 6, 2022 at 9:54 AM Masahiko Sawada wrote: > > On Wed, May 4, 2022 at 7:18 PM Amit Kapila wrote: > > > > > > On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada > > wrote: > > > > > > > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila > > wrote: > > > > > > > > > > > > > > > So, shall we go back to the previous approach of using a separate > > > > > function update_replication_progress? > > > > > > > > Ok, agreed. > > > > > > > > > > Attached, please find the updated patch accordingly. Currently, I have > > > prepared it for HEAD, if you don't see any problem with this, we can > > > prepare the back-branch patches based on this. > > > > Thank you for updating the patch. Looks good to me. > Thanks for your review. > > Improve the back-branch patches according to the discussion. > Thanks. The patch LGTM. I'll push and back-patch this after the current minor release is done unless there are more comments related to this work. -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Fri, May 6, 2022 at 9:54 AM Masahiko Sawada wrote: > On Wed, May 4, 2022 at 7:18 PM Amit Kapila wrote: > > > > On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada > wrote: > > > > > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila > wrote: > > > > > > > > > > > > So, shall we go back to the previous approach of using a separate > > > > function update_replication_progress? > > > > > > Ok, agreed. > > > > > > > Attached, please find the updated patch accordingly. Currently, I have > > prepared it for HEAD, if you don't see any problem with this, we can > > prepare the back-branch patches based on this. > > Thank you for updating the patch. Looks good to me. Thanks for your review. Improve the back-branch patches according to the discussion. Move the CHANGES_THRESHOLD logic from function OutputPluginUpdateProgress to the new function update_replication_progress. In addition, improve the formatting of all patches with pgindent. Attach the patches. Regards, Wang wei HEAD_v21-0001-Fix-the-logical-replication-timeout-during-large.patch Description: HEAD_v21-0001-Fix-the-logical-replication-timeout-during-large.patch REL14_v4-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL14_v4-0001-Fix-the-logical-replication-timeout-during-large-.patch REL13_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL13_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL12_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL12_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL11_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL11_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL10_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL10_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch
Re: Logical replication timeout problem
On Wed, May 4, 2022 at 7:18 PM Amit Kapila wrote: > > On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada wrote: > > > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila wrote: > > > > > > > > > So, shall we go back to the previous approach of using a separate > > > function update_replication_progress? > > > > Ok, agreed. > > > > Attached, please find the updated patch accordingly. Currently, I have > prepared it for HEAD, if you don't see any problem with this, we can > prepare the back-branch patches based on this. Thank you for updating the patch. Looks good to me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada wrote: > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila wrote: > > > > > > So, shall we go back to the previous approach of using a separate > > function update_replication_progress? > > Ok, agreed. > Attached, please find the updated patch accordingly. Currently, I have prepared it for HEAD, if you don't see any problem with this, we can prepare the back-branch patches based on this. -- With Regards, Amit Kapila. v20-0001-Fix-the-logical-replication-timeout-during-large.patch Description: Binary data
Re: Logical replication timeout problem
On Mon, May 2, 2022 at 11:32 AM Amit Kapila wrote: > > On Mon, May 2, 2022 at 7:33 AM Masahiko Sawada wrote: > > On Thu, Apr 28, 2022 at 7:01 PM houzj.f...@fujitsu.com > > wrote: > > > > > > Hi Sawada-san, Wang > > > > > > I was looking at the patch and noticed that we moved some logic from > > > update_replication_progress() to OutputPluginUpdateProgress() like > > > your suggestion. > > > > > > I have a question about this change. In the patch we added some > > > restriction in this function OutputPluginUpdateProgress() like below: > > > > > > + /* > > > + * If we are at the end of transaction LSN, update progress tracking. > > > + * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we > > > + * try to send a keepalive message if required. > > > + */ > > > + if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD) > > > + { > > > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, > > > + skipped_xact); > > > + changes_count = 0; > > > + } > > > > > > After the patch, we won't be able to always invoke the update_progress() > > > if the > > > caller is in the middle of a transaction (e.g. end_xact = false). The > > > behavior of the > > > public function OutputPluginUpdateProgress() is changed. Is it ok to > > > change this in back branches? > > > > > > Because OutputPluginUpdateProgress() is a public function for the > > > extension > > > developer, I am just a little concerned about the behavior change here. > > > > Good point. > > > > As you pointed out, it would not be good if we change the behavior of > > OutputPluginUpdateProgress() in back branches. Also, after more > > thought, it is not a good idea even for HEAD since there might be > > background workers that use logical decoding and the timeout checking > > might not be relevant at all with them. > > > > So, shall we go back to the previous approach of using a separate > > function update_replication_progress? Ok, agreed.
> > > BTW, I think you're concerned about the plugins that call > > OutputPluginUpdateProgress() at the middle of the transaction (i.e., > > end_xact = false). We have the following change that makes the > > walsender not update the progress at the middle of the transaction. Do > > you think it is okay since it's not common usage to call > > OutputPluginUpdateProgress() at the middle of the transaction by the > > plugin that is used by the walsender? > > > > We have done that purposefully as otherwise, the lag tracker shows > incorrect information. See email [1]. The reason is that we always get > ack from subscribers for transaction end. Also, prior to this patch we > never call the lag tracker recording apart from the transaction end, > so as a bug fix we shouldn't try to change it. Makes sense. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Mon, May 2, 2022 at 7:33 AM Masahiko Sawada wrote: > On Thu, Apr 28, 2022 at 7:01 PM houzj.f...@fujitsu.com > wrote: > > > > Hi Sawada-san, Wang > > > > I was looking at the patch and noticed that we moved some logic from > > update_replication_progress() to OutputPluginUpdateProgress() like > > your suggestion. > > > > I have a question about this change. In the patch we added some > > restriction in this function OutputPluginUpdateProgress() like below: > > > > + /* > > + * If we are at the end of transaction LSN, update progress tracking. > > + * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we > > + * try to send a keepalive message if required. > > + */ > > + if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD) > > + { > > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, > > + skipped_xact); > > + changes_count = 0; > > + } > > > > After the patch, we won't be able to always invoke the update_progress() if > > the > > caller is in the middle of a transaction (e.g. end_xact = false). The > > behavior of the > > public function OutputPluginUpdateProgress() is changed. Is it ok to > > change this in back branches? > > > > Because OutputPluginUpdateProgress() is a public function for the extension > > developer, I am just a little concerned about the behavior change here. > > Good point. > > As you pointed out, it would not be good if we change the behavior of > OutputPluginUpdateProgress() in back branches. Also, after more > thought, it is not a good idea even for HEAD since there might be > background workers that use logical decoding and the timeout checking > might not be relevant at all with them. > So, shall we go back to the previous approach of using a separate function update_replication_progress? > BTW, I think you're concerned about the plugins that call > OutputPluginUpdateProgress() at the middle of the transaction (i.e., > end_xact = false).
We have the following change that makes the > walsender not update the progress at the middle of the transaction. Do > you think it is okay since it's not common usage to call > OutputPluginUpdateProgress() at the middle of the transaction by the > plugin that is used by the walsender? > We have done that purposefully as otherwise, the lag tracker shows incorrect information. See email [1]. The reason is that we always get ack from subscribers for transaction end. Also, prior to this patch we never call the lag tracker recording apart from the transaction end, so as a bug fix we shouldn't try to change it. [1] - https://www.postgresql.org/message-id/OS3PR01MB62755D216245199554DDC8DB9EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Thu, Apr 28, 2022 at 7:01 PM houzj.f...@fujitsu.com wrote: > > On Wednesday, April 20, 2022 3:21 PM Masahiko Sawada > wrote: > > > > BTW the changes in > > REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch, > > adding end_xact to LogicalDecodingContext, seems good to me and it > > might be better than the approach of v17 patch from plugin developers’ > > perspective? This is because they won’t need to pass true/false to > > end_xact of OutputPluginUpdateProgress(). Furthermore, if we do what > > we do in update_replication_progress() in > > OutputPluginUpdateProgress(), what plugins need to do will be just to > > call OutputPluginUpdate() in every callback and they don't need to > > have the CHANGES_THRESHOLD logic. What do you think? > > Hi Sawada-san, Wang > > I was looking at the patch and noticed that we moved some logic from > update_replication_progress() to OutputPluginUpdateProgress() like > your suggestion. > > I have a question about this change. In the patch we added some > restriction in this function OutputPluginUpdateProgress() like below: > > + /* > + * If we are at the end of transaction LSN, update progress tracking. > + * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we > + * try to send a keepalive message if required. > + */ > + if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD) > + { > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, > + skipped_xact); > + changes_count = 0; > + } > > After the patch, we won't be able to always invoke the update_progress() if > the > caller is in the middle of a transaction (e.g. end_xact = false). The behavior > of the > public function OutputPluginUpdateProgress() is changed. Is it > ok to > change this in back branches? > > Because OutputPluginUpdateProgress() is a public function for the extension > developer, I am just a little concerned about the behavior change here. Good point.
As you pointed out, it would not be good if we change the behavior of OutputPluginUpdateProgress() in back branches. Also, after more thought, it is not a good idea even for HEAD since there might be background workers that use logical decoding and the timeout checking might not be relevant at all with them. BTW, I think you're concerned about the plugins that call OutputPluginUpdateProgress() at the middle of the transaction (i.e., end_xact = false). We have the following change that makes the walsender not update the progress at the middle of the transaction. Do you think it is okay since it's not common usage to call OutputPluginUpdateProgress() at the middle of the transaction by the plugin that is used by the walsender? #define WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS 1000 - if (!TimestampDifferenceExceeds(sendTime, now, + if (end_xact && TimestampDifferenceExceeds(sendTime, now, WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS)) - return; + { + LagTrackerWrite(lsn, now); + sendTime = now; + } Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Logical replication timeout problem
On Thu, Apr 28, 2022 at 6:26 PM Amit Kapila wrote: > On Thu, Apr 21, 2022 at 3:21 PM wangw.f...@fujitsu.com > wrote: > > > > I think it is better to keep the new variable 'end_xact' at the end of > the struct where it belongs for HEAD. In back branches, we can keep it > at the place as you have. Apart from that, I have made some cosmetic > changes and changed a few comments in the attached. Let's use this to > prepare patches for back-branches. Thanks for your review and improvement. I improved the back-branch patches according to your modifications. Attach the back-branch patches for REL10~REL14. (Also attach the patch for HEAD, I did not make any changes to this patch.) BTW, I noticed that Hou-san shared some points. After our discussion, I will update the patches if required. Regards, Wang wei HEAD_v19-0001-Fix-the-logical-replication-timeout-during-large.patch Description: HEAD_v19-0001-Fix-the-logical-replication-timeout-during-large.patch REL14_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL14_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL13_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL13_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch REL12-REL11_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL12-REL11_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch REL10_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL10_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch
Re: Logical replication timeout problem
On Thu, Apr 21, 2022 at 3:21 PM wangw.f...@fujitsu.com wrote: > I think it is better to keep the new variable 'end_xact' at the end of the struct where it belongs for HEAD. In back branches, we can keep it at the place as you have. Apart from that, I have made some cosmetic changes and changed a few comments in the attached. Let's use this to prepare patches for back-branches. -- With Regards, Amit Kapila. v19-0001-Fix-the-logical-replication-timeout-during-large.patch
RE: Logical replication timeout problem
On Wednesday, April 20, 2022 3:21 PM Masahiko Sawada wrote: > > BTW the changes in > REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch, > adding end_xact to LogicalDecodingContext, seems good to me and it > might be better than the approach of v17 patch from plugin developers’ > perspective? This is because they won’t need to pass true/false to > end_xact of OutputPluginUpdateProgress(). Furthermore, if we do what > we do in update_replication_progress() in > OutputPluginUpdateProgress(), what plugins need to do will be just to > call OutputPluginUpdate() in every callback and they don't need to > have the CHANGES_THRESHOLD logic. What do you think? Hi Sawada-san, Wang I was looking at the patch and noticed that we moved some logic from update_replication_progress() to OutputPluginUpdateProgress(), as you suggested. I have a question about this change. In the patch we added a restriction in OutputPluginUpdateProgress() like below:

+	/*
+	 * If we are at the end of transaction LSN, update progress tracking.
+	 * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we
+	 * try to send a keepalive message if required.
+	 */
+	if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD)
+	{
+		ctx->update_progress(ctx, ctx->write_location, ctx->write_xid,
+							 skipped_xact);
+		changes_count = 0;
+	}

After the patch, we won't always invoke update_progress() if the caller is in the middle of a transaction (e.g., end_xact = false). The behavior of the public function OutputPluginUpdateProgress() is changed. Is it OK to change this in back branches? Because OutputPluginUpdateProgress() is a public function for extension developers, I am a little concerned about the behavior change here. Besides, the check of 'end_xact' and the 'CHANGES_THRESHOLD' seem specific to pgoutput. I am not sure whether plugin authors also need this logic (they might want to change the strategy), so I'd like to confirm it with you. Best regards, Hou zj
RE: Logical replication timeout problem
On Wed, Apr 21, 2022 at 10:15 AM I wrote: > The comments by Sawada-San sound reasonable to me. > After doing check, I found that padding in HEAD is the same as in REL14. > So I change the approach of patch for HEAD just like the patch for REL14. Also attached are the back-branch patches for REL10~REL13. (The REL12 and REL11 patches are the same, so only one patch is posted for these two branches.) The patch for HEAD: HEAD_v18-0001-Fix-the-logical-replication-timeout-during-large.patch The patch for REL14: REL14_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch The patch for REL13: REL13_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch The patch for REL12 and REL11: REL12-REL11_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch The patch for REL10: REL10_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch BTW, after checking, I found that the padding in REL11~REL13 is similar to HEAD and REL14 (7 bytes of padding after fast_forward). But in REL10 the padding is different; there are three padding regions after the following member variables: - 4 bytes after options - 6 bytes after prepared_write - 4 bytes after write_xid So, in the patches for branches REL11~HEAD, I add the new variable after fast_forward; in the patch for branch REL10, I add it after prepared_write. For each version, the size of the structure does not change after applying the patch. 
Regards, Wang wei HEAD_v18-0001-Fix-the-logical-replication-timeout-during-large.patch REL14_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch REL13_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch REL12-REL11_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch REL10_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch
Re: Logical replication timeout problem
On Thu, Apr 21, 2022 at 11:19 AM Amit Kapila wrote: > > On Wed, Apr 20, 2022 at 6:22 PM Masahiko Sawada wrote: > > > > On Wed, Apr 20, 2022 at 7:12 PM Amit Kapila wrote: > > > > > > > > I think it would > > > be then better to have it in the same place in HEAD as well? > > > > As far as I can see in the v17 patch, which is for HEAD, we don't add > > a variable to LogicalDecodingContext, but did you refer to another > > patch? > > > > No, I thought it is better to follow the same approach in HEAD as > well. Do you see any problem with it? No, that makes sense to me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Wed, Apr 20, 2022 at 6:22 PM Masahiko Sawada wrote: > > On Wed, Apr 20, 2022 at 7:12 PM Amit Kapila wrote: > > > > > I think it would > > be then better to have it in the same place in HEAD as well? > > As far as I can see in the v17 patch, which is for HEAD, we don't add > a variable to LogicalDecodingContext, but did you refer to another > patch? > No, I thought it is better to follow the same approach in HEAD as well. Do you see any problem with it? -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Wed, Apr 20, 2022 at 6:13 PM Amit Kapila wrote: > On Wed, Apr 20, 2022 at 2:38 PM Amit Kapila wrote: > > > > On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada > wrote: > > > > > > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com > > > wrote: > > > > ``` > > > > > > I'm concerned that this 4-byte padding at the end of the struct could > > > depend on platforms (there might be no padding in 32-bit platforms?). > > > > > > > Good point, but ... > > > > > It seems to me that it's better to put it after fast_forward where the > > > new field should fall within the padding space. > > > > > > > Can we add the variable in between the existing variables in the > > structure in the back branches? > > > > I think it should be fine if it falls in the padding space. We have > done similar changes recently in back-branches [1]. I think it would > be then better to have it in the same place in HEAD as well? > > [1] - > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=10520f4346876aad4941797c2255a21bdac74739 Thanks for your comments. The comments by Sawada-San sound reasonable to me. After checking, I found that the padding in HEAD is the same as in REL14, so I changed the approach of the patch for HEAD to match the patch for REL14. On Wed, Apr 20, 2022 at 3:21 PM Masahiko Sawada wrote: > I'm concerned that this 4-byte padding at the end of the struct could > depend on platforms (there might be no padding in 32-bit platforms?). > It seems to me that it's better to put it after fast_forward where the > new field should fall within the padding space. Fixed. Added the new variable after fast_forward. > BTW the changes in > REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch, > adding end_xact to LogicalDecodingContext, seems good to me and it > might be better than the approach of v17 patch from plugin developers’ > perspective? This is because they won’t need to pass true/false to > end_xact of OutputPluginUpdateProgress(). > Furthermore, if we do what > we do in update_replication_progress() in > OutputPluginUpdateProgress(), what plugins need to do will be just to > call OutputPluginUpdate() in every callback and they don't need to > have the CHANGES_THRESHOLD logic. What do you think? Changed the approach of the patch for HEAD. (The size of the structure does not change.) Also moved the logic of the function update_replication_progress into OutputPluginUpdateProgress. Attached are the patches. [suggestions by Sawada-San] 1. Changed the position of the new variable in the structure. 2. Changed the approach of the patch for HEAD. 3. Deleted the new function update_replication_progress. Regards, Wang wei HEAD_v18-0001-Fix-the-logical-replication-timeout-during-large.patch REL14_v2-0001-Fix-the-logical-replication-timeout-during-large-.patch
Re: Logical replication timeout problem
On Wed, Apr 20, 2022 at 7:12 PM Amit Kapila wrote: > > On Wed, Apr 20, 2022 at 2:38 PM Amit Kapila wrote: > > > > On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada > > wrote: > > > > > > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com > > > wrote: > > > > ``` > > > > > > I'm concerned that this 4-byte padding at the end of the struct could > > > depend on platforms (there might be no padding in 32-bit platforms?). > > > > > > > Good point, but ... > > > > > It seems to me that it's better to put it after fast_forward where the > > > new field should fall within the padding space. > > > > > > > Can we add the variable in between the existing variables in the > > structure in the back branches? > > > > I think it should be fine if it falls in the padding space. We have > done similar changes recently in back-branches [1]. Yes. > I think it would > be then better to have it in the same place in HEAD as well? As far as I can see in the v17 patch, which is for HEAD, we don't add a variable to LogicalDecodingContext, but did you refer to another patch? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Wed, Apr 20, 2022 at 2:38 PM Amit Kapila wrote: > > On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada > wrote: > > > > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com > > wrote: > > > ``` > > > > I'm concerned that this 4-byte padding at the end of the struct could > > depend on platforms (there might be no padding in 32-bit platforms?). > > > > Good point, but ... > > > It seems to me that it's better to put it after fast_forward where the > > new field should fall within the padding space. > > > > Can we add the variable in between the existing variables in the > structure in the back branches? > I think it should be fine if it falls in the padding space. We have done similar changes recently in back-branches [1]. I think it would be then better to have it in the same place in HEAD as well? [1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=10520f4346876aad4941797c2255a21bdac74739 -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Apr 20, 2022 at 12:51 PM Masahiko Sawada wrote: > > On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com > wrote: > > ``` > > I'm concerned that this 4-byte padding at the end of the struct could > depend on platforms (there might be no padding in 32-bit platforms?). > Good point, but ... > It seems to me that it's better to put it after fast_forward where the > new field should fall within the padding space. > Can we add the variable in between the existing variables in the structure in the back branches? Normally, we add at the end to avoid any breakage of existing apps. See commit 56e366f675 and discussion at [1]. That is related to enum but I think we follow the same for structures. [1] - https://www.postgresql.org/message-id/7dab0929-a966-0c0a-4726-878fced2fe00%40enterprisedb.com -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Apr 20, 2022 at 11:46 AM wangw.f...@fujitsu.com wrote: > > On Mon, Apr 18, 2022 at 00:36 PM Amit Kapila wrote: > > On Mon, Apr 18, 2022 at 9:29 AM Amit Kapila wrote: > > > > > > On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira wrote: > > > > > > > > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote: > > > > > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > > > personally still prefer the approach implemented in v10 [1] > > > > especially due to the latest finding by Wang-San that we can't > > > > update the lag-tracker apart from when it is invoked at the transaction > > > > end. > > > > However, I am fine if we like this approach more. > > > > > > > > It seems v15 is simpler and less error prone than v10. v10 has a mix > > > > of > > > > OutputPluginUpdateProgress() and the new function update_progress(). > > > > The v10 also calls update_progress() for every change action in > > > > pgoutput_change(). It is not a good approach for maintainability -- > > > > new changes like sequences need extra calls. > > > > > > > > > > Okay, let's use the v15 approach as Sawada-San also seems to have a > > > preference for that. > > > > > > > However, as you mentioned there should handle the track lag case. > > > > > > > > Both patches change the OutputPluginUpdateProgress() so it cannot be > > > > backpatched. Are you planning to backpatch it? If so, the boolean > > > > variable (last_write or end_xacts depending of which version you are > > > > considering) could be added to LogicalDecodingContext. > > > > > > > > > > If we add it to LogicalDecodingContext then I think we have to always > > > reset the variable after its use which will make it look ugly and > > > error-prone. I was not thinking to backpatch it because of the API > > > change but I guess if we want to backpatch then we can add it to > > > LogicalDecodingContext for back-branches. I am not sure if that will > > > look committable but surely we can try. 
> > > > > > > Even, if we want to add the variable in the struct in back-branches, we > > need to > > ensure not to change the size of the struct as it is exposed, see email [1] > > for a > > similar mistake we made in another case. > > > > [1] - https://www.postgresql.org/message- > > id/2358496.1649168259%40sss.pgh.pa.us > Thanks for your comments. > > I did some checks about adding the new variable in LogicalDecodingContext. > I found that because of padding, if we add the new variable at the end of > structure, it dose not make the structure size change. I verified this in > REL_10~REL_14. > > So as suggested by Euler-San and Amit-San, I wrote the patch for REL_14. > Attach > this patch. To prevent patch confusion, the patch for HEAD is also attached. > The patch for REL_14: > REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch > The patch for HEAD: > v17-0001-Fix-the-logical-replication-timeout-during-large.patch > > The following is the details of checks. > On gcc/Linux/x86-64, in REL_14, by looking at the size of each member variable > in the structure LogicalDecodingContext, I found that there are three parts > padding behind the following member variables: > - 7 bytes after fast_forward > - 4 bytes after prepared_write > - 4 bytes after write_xid > > If we add the new variable at the end of structure (bool takes one byte), it > means we will only consume one byte of padding after member write_xid. And > then, at the end of the struct, 3 padding are still required. For easy > understanding, please refer to the following simple calculation. > (In REL14, the size of structure LogicalDecodingContext is 304 bytes.) > Before adding new variable (In REL14): > 8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4 = 289 (if padding is not > considered) > +7 +4 +4 = +15 (the padding) > So, the size of structure LogicalDecodingContext is 289+15=304. 
> After adding new variable (In REL14 with patch): > 8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4+1 = 290 (if padding is not > considered) > +7 +4+3 = +14 (the padding) > So, the size of structure LogicalDecodingContext is 290+14=304. > > BTW, the size of structure LogicalDecodingContext in REL_10~REL_13 is 184, > 200, > 200,200 respectively. And I found that at the end of the structure > LogicalDecodingContext, there are always the following members: > ``` > XLogRecPtr write_location; --> 8 > TransactionId write_xid; --> 4 > --> There are 4 padding after write_xid. > ``` I'm concerned that this 4-byte padding at the end of the struct could depend on platforms (there might be no padding in 32-bit platforms?). It seems to me that it's better to put it after fast_forward where the new field should fall within the padding space. BTW the changes in REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch, adding end_xact to LogicalDecodingContext, seems good to me and it might be better than the approach of v17 patch from plugin developers’ perspective? This is because they won’t need to pass true/false to end_xact of OutputPluginUpdateProgress(). Furthermore, if we do what we do in update_replication_progress() in OutputPluginUpdateProgress(), what plugins need to do will be just to call OutputPluginUpdate() in every callback and they don't need to have the CHANGES_THRESHOLD logic. What do you think? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Logical replication timeout problem
On Mon, Apr 18, 2022 at 00:36 PM Amit Kapila wrote: > On Mon, Apr 18, 2022 at 9:29 AM Amit Kapila wrote: > > > > On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira wrote: > > > > > > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote: > > > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > > personally still prefer the approach implemented in v10 [1] > > > especially due to the latest finding by Wang-San that we can't > > > update the lag-tracker apart from when it is invoked at the transaction > > > end. > > > However, I am fine if we like this approach more. > > > > > > It seems v15 is simpler and less error prone than v10. v10 has a mix > > > of > > > OutputPluginUpdateProgress() and the new function update_progress(). > > > The v10 also calls update_progress() for every change action in > > > pgoutput_change(). It is not a good approach for maintainability -- > > > new changes like sequences need extra calls. > > > > > > > Okay, let's use the v15 approach as Sawada-San also seems to have a > > preference for that. > > > > > However, as you mentioned there should handle the track lag case. > > > > > > Both patches change the OutputPluginUpdateProgress() so it cannot be > > > backpatched. Are you planning to backpatch it? If so, the boolean > > > variable (last_write or end_xacts depending of which version you are > > > considering) could be added to LogicalDecodingContext. > > > > > > > If we add it to LogicalDecodingContext then I think we have to always > > reset the variable after its use which will make it look ugly and > > error-prone. I was not thinking to backpatch it because of the API > > change but I guess if we want to backpatch then we can add it to > > LogicalDecodingContext for back-branches. I am not sure if that will > > look committable but surely we can try. 
> > > > Even, if we want to add the variable in the struct in back-branches, we need > to > ensure not to change the size of the struct as it is exposed, see email [1] > for a > similar mistake we made in another case. > > [1] - https://www.postgresql.org/message-id/2358496.1649168259%40sss.pgh.pa.us Thanks for your comments. I did some checks on adding the new variable in LogicalDecodingContext. I found that, because of padding, adding the new variable at the end of the structure does not change the structure size. I verified this in REL_10~REL_14. So, as suggested by Euler-San and Amit-San, I wrote the patch for REL_14 and attach it. To prevent patch confusion, the patch for HEAD is also attached. The patch for REL_14: REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch The patch for HEAD: v17-0001-Fix-the-logical-replication-timeout-during-large.patch The details of the checks follow. On gcc/Linux/x86-64, in REL_14, by looking at the size of each member variable in the structure LogicalDecodingContext, I found that there are three padding regions after the following member variables: - 7 bytes after fast_forward - 4 bytes after prepared_write - 4 bytes after write_xid If we add the new variable at the end of the structure (a bool takes one byte), we consume only one byte of the padding after write_xid, and 3 bytes of padding are still required at the end of the struct. For easy understanding, please refer to the following simple calculation. (In REL14, the size of the structure LogicalDecodingContext is 304 bytes.) Before adding the new variable (In REL14): 8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4 = 289 (if padding is not considered) +7 +4 +4 = +15 (the padding) So, the size of the structure LogicalDecodingContext is 289+15=304. 
After adding the new variable (In REL14 with patch): 8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4+1 = 290 (if padding is not considered) +7 +4+3 = +14 (the padding) So, the size of the structure LogicalDecodingContext is 290+14=304. BTW, the size of the structure LogicalDecodingContext in REL_10~REL_13 is 184, 200, 200, 200 respectively. And I found that at the end of the structure LogicalDecodingContext there are always the following members:
```
XLogRecPtr    write_location;  --> 8
TransactionId write_xid;       --> 4
--> There are 4 bytes of padding after write_xid.
```
This means that at the end of the structure LogicalDecodingContext there are 4 bytes of padding. So, if we add a new bool variable (which takes one byte) at the end of the structure, then in the current REL_10~REL_14, because of padding, the size of the structure LogicalDecodingContext will not change. Regards, Wang wei REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch v17-0001-Fix-the-logical-replication-timeout-during-large.patch
RE: Logical replication timeout problem
On Mon, Apr 19, 2022 at 9:32 AM Masahiko Sawada wrote: > Thank you for updating the patch. Thanks for your comments. > + * For a large transaction, if we don't send any change to the > + downstream for a > + * long time(exceeds the wal_receiver_timeout of standby) then it can > timeout. > + * This can happen when all or most of the changes are either not > + published or > + * got filtered out. > > + */ > + if(end_xact || ++changes_count >= CHANGES_THRESHOLD) { > > We need a whitespace before '(' at above two places. The rest looks good to > me. Fixed these. Attached is the new patch. 1. Fixed wrong formatting. [suggestion by Sawada-San] Regards, Wang wei v17-0001-Fix-the-logical-replication-timeout-during-large.patch
Re: Logical replication timeout problem
On Mon, Apr 18, 2022 at 3:16 PM wangw.f...@fujitsu.com wrote: > > On Mon, Apr 18, 2022 at 00:35 PM Masahiko Sawada > wrote: > > On Mon, Apr 18, 2022 at 1:01 PM Amit Kapila wrote: > > > > > > On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada > > wrote: > > > > > > > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila > > wrote: > > > > > > > > > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > > > > > wrote: > > > > > > > > > > > > So I skip tracking lag during a transaction just like the current > > > > > > HEAD. > > > > > > Attach the new patch. > > > > > > > > > > > > > > > > Thanks, please find the updated patch where I have slightly > > > > > modified the comments. > > > > > > > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > > > > personally still prefer the approach implemented in v10 [1] > > > > > especially due to the latest finding by Wang-San that we can't > > > > > update the lag-tracker apart from when it is invoked at the > > > > > transaction end. > > > > > However, I am fine if we like this approach more. > > > > > > > > Thank you for updating the patch. > > > > > > > > The current patch looks much better than v10 which requires to call > > > > to > > > > update_progress() every path. > > > > > > > > Regarding v15 patch, I'm concerned a bit that the new function name, > > > > update_progress(), is too generic. How about > > > > update_replation_progress() or something more specific name? > > > > > > > > > > Do you intend to say update_replication_progress()? The word > > > 'replation' doesn't make sense to me. I am fine with this suggestion. > > > > Yeah, that was a typo. I meant update_replication_progress(). > Thanks for your comments. > > > > > Regarding v15 patch, I'm concerned a bit that the new function name, > > > > update_progress(), is too generic. How about > > > > update_replation_progress() or something more specific name? > Improve as suggested. 
Change the name from update_progress to > update_replication_progress. > > > > > --- > > > > +if (end_xact) > > > > +{ > > > > +/* Update progress tracking at xact end. */ > > > > +OutputPluginUpdateProgress(ctx, skipped_xact, > > > > end_xact); > > > > +changes_count = 0; > > > > +return; > > > > +} > > > > + > > > > +/* > > > > + * After continuously processing CHANGES_THRESHOLD changes, > > > > we try to send > > > > + * a keepalive message if required. > > > > + * > > > > + * We don't want to try sending a keepalive message after > > > > processing each > > > > + * change as that can have overhead. Testing reveals that > > > > there is no > > > > + * noticeable overhead in doing it after continuously > > > > processing 100 or so > > > > + * changes. > > > > + */ > > > > +#define CHANGES_THRESHOLD 100 > > > > +if (++changes_count >= CHANGES_THRESHOLD) > > > > +{ > > > > +OutputPluginUpdateProgress(ctx, skipped_xact, > > > > end_xact); > > > > +changes_count = 0; > > > > +} > > > > > > > > Can we merge two if branches since we do the same things? Or did you > > > > separate them for better readability? > Improve as suggested. Merge two if-branches. > > Attach the new patch. > 1. Rename the new function(update_progress) to update_replication_progress. > [suggestion by Sawada-San] > 2. Merge two if-branches in new function update_replication_progress. > [suggestion by Sawada-San.] > 3. Improve comments to make them clear. [suggestions by Euler-San.] Thank you for updating the patch. + * For a large transaction, if we don't send any change to the downstream for a + * long time(exceeds the wal_receiver_timeout of standby) then it can timeout. + * This can happen when all or most of the changes are either not published or + * got filtered out. + */ + if(end_xact || ++changes_count >= CHANGES_THRESHOLD) + { We need a whitespace before '(' at above two places. The rest looks good to me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Logical replication timeout problem
On Thu, Apr 14, 2022 at 8:21 PM Euler Taveira wrote: Thanks for your comments. > + * For a large transaction, if we don't send any change to the downstream > for a > + * long time then it can timeout. This can happen when all or most of the > + * changes are either not published or got filtered out. > > We should probable mention that "long time" is wal_receiver_timeout on > subscriber. Improved as suggested. Added "(exceeds the wal_receiver_timeout of standby)" to explain what "long time" means. > +* change as that can have overhead. Testing reveals that there is no > +* noticeable overhead in doing it after continuously processing 100 or so > +* changes. > > Tests revealed that ... Improved as suggested. > +* We don't have a mechanism to get the ack for any LSN other than end > xact > +* lsn from the downstream. So, we track lag only for end xact lsn's. > > s/lsn/LSN/ and s/lsn's/LSNs/ > > I would say "end of transaction LSN". Improved as suggested. > + * If too many changes are processed then try to send a keepalive message to > + * receiver to avoid timeouts. > > In logical replication, if too many changes are processed then try to send a > keepalive message. It might avoid a timeout in the subscriber. Improved as suggested. Kindly have a look at the new patch shared in [1]. [1] - https://www.postgresql.org/message-id/OS3PR01MB627561344A2C7ECF68E41D7E9EF39%40OS3PR01MB6275.jpnprd01.prod.outlook.com Regards, Wang wei
RE: Logical replication timeout problem
On Mon, Apr 18, 2022 at 00:35 PM Masahiko Sawada wrote: > On Mon, Apr 18, 2022 at 1:01 PM Amit Kapila wrote: > > > > On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada > wrote: > > > > > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila > wrote: > > > > > > > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > > > > wrote: > > > > > > > > > > So I skip tracking lag during a transaction just like the current > > > > > HEAD. > > > > > Attach the new patch. > > > > > > > > > > > > > Thanks, please find the updated patch where I have slightly > > > > modified the comments. > > > > > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > > > personally still prefer the approach implemented in v10 [1] > > > > especially due to the latest finding by Wang-San that we can't > > > > update the lag-tracker apart from when it is invoked at the transaction > > > > end. > > > > However, I am fine if we like this approach more. > > > > > > Thank you for updating the patch. > > > > > > The current patch looks much better than v10 which requires to call > > > to > > > update_progress() every path. > > > > > > Regarding v15 patch, I'm concerned a bit that the new function name, > > > update_progress(), is too generic. How about > > > update_replation_progress() or something more specific name? > > > > > > > Do you intend to say update_replication_progress()? The word > > 'replation' doesn't make sense to me. I am fine with this suggestion. > > Yeah, that was a typo. I meant update_replication_progress(). Thanks for your comments. > > > Regarding v15 patch, I'm concerned a bit that the new function name, > > > update_progress(), is too generic. How about > > > update_replation_progress() or something more specific name? Improve as suggested. Change the name from update_progress to update_replication_progress. > > > --- > > > +if (end_xact) > > > +{ > > > +/* Update progress tracking at xact end. 
*/ > > > +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); > > > +changes_count = 0; > > > +return; > > > +} > > > + > > > +/* > > > + * After continuously processing CHANGES_THRESHOLD changes, > > > we try to send > > > + * a keepalive message if required. > > > + * > > > + * We don't want to try sending a keepalive message after > > > processing each > > > + * change as that can have overhead. Testing reveals that there > > > is no > > > + * noticeable overhead in doing it after continuously > > > processing 100 or so > > > + * changes. > > > + */ > > > +#define CHANGES_THRESHOLD 100 > > > +if (++changes_count >= CHANGES_THRESHOLD) > > > +{ > > > +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); > > > +changes_count = 0; > > > +} > > > > > > Can we merge two if branches since we do the same things? Or did you > > > separate them for better readability? Improved as suggested. Merged the two if-branches. Attached is the new patch. 1. Renamed the new function (update_progress) to update_replication_progress. [suggestion by Sawada-San] 2. Merged the two if-branches in the new function update_replication_progress. [suggestion by Sawada-San] 3. Improved comments to make them clear. [suggestions by Euler-San] Regards, Wang wei v16-0001-Fix-the-logical-replication-timeout-during-large.patch
Re: Logical replication timeout problem
On Mon, Apr 18, 2022 at 9:29 AM Amit Kapila wrote: > > On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira wrote: > > > > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote: > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > personally still prefer the approach implemented in v10 [1] especially > > due to the latest finding by Wang-San that we can't update the > > lag-tracker apart from when it is invoked at the transaction end. > > However, I am fine if we like this approach more. > > > > It seems v15 is simpler and less error prone than v10. v10 has a mix of > > OutputPluginUpdateProgress() and the new function update_progress(). The v10 > > also calls update_progress() for every change action in pgoutput_change(). > > It > > is not a good approach for maintainability -- new changes like sequences > > need > > extra calls. > > > > Okay, let's use the v15 approach as Sawada-San also seems to have a > preference for that. > > > However, as you mentioned there should handle the track lag case. > > > > Both patches change the OutputPluginUpdateProgress() so it cannot be > > backpatched. Are you planning to backpatch it? If so, the boolean variable > > (last_write or end_xacts depending of which version you are considering) > > could > > be added to LogicalDecodingContext. > > > > If we add it to LogicalDecodingContext then I think we have to always > reset the variable after its use which will make it look ugly and > error-prone. I was not thinking to backpatch it because of the API > change but I guess if we want to backpatch then we can add it to > LogicalDecodingContext for back-branches. I am not sure if that will > look committable but surely we can try. > Even, if we want to add the variable in the struct in back-branches, we need to ensure not to change the size of the struct as it is exposed, see email [1] for a similar mistake we made in another case. 
[1] - https://www.postgresql.org/message-id/2358496.1649168259%40sss.pgh.pa.us -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Mon, Apr 18, 2022 at 1:01 PM Amit Kapila wrote: > > On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada wrote: > > > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila wrote: > > > > > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > > > wrote: > > > > > > > > So I skip tracking lag during a transaction just like the current HEAD. > > > > Attach the new patch. > > > > > > > > > > Thanks, please find the updated patch where I have slightly modified > > > the comments. > > > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > > personally still prefer the approach implemented in v10 [1] especially > > > due to the latest finding by Wang-San that we can't update the > > > lag-tracker apart from when it is invoked at the transaction end. > > > However, I am fine if we like this approach more. > > > > Thank you for updating the patch. > > > > The current patch looks much better than v10 which requires to call to > > update_progress() every path. > > > > Regarding v15 patch, I'm concerned a bit that the new function name, > > update_progress(), is too generic. How about > > update_replation_progress() or something more specific name? > > > > Do you intend to say update_replication_progress()? The word > 'replation' doesn't make sense to me. I am fine with this suggestion. Yeah, that was a typo. I meant update_replication_progress(). Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada wrote: > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila wrote: > > > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > > wrote: > > > > > > So I skip tracking lag during a transaction just like the current HEAD. > > > Attach the new patch. > > > > > > > Thanks, please find the updated patch where I have slightly modified > > the comments. > > > > Sawada-San, Euler, do you have any opinion on this approach? I > > personally still prefer the approach implemented in v10 [1] especially > > due to the latest finding by Wang-San that we can't update the > > lag-tracker apart from when it is invoked at the transaction end. > > However, I am fine if we like this approach more. > > Thank you for updating the patch. > > The current patch looks much better than v10 which requires to call to > update_progress() every path. > > Regarding v15 patch, I'm concerned a bit that the new function name, > update_progress(), is too generic. How about > update_replation_progress() or something more specific name? > Do you intend to say update_replication_progress()? The word 'replation' doesn't make sense to me. I am fine with this suggestion. > > --- > +if (end_xact) > +{ > +/* Update progress tracking at xact end. */ > +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); > +changes_count = 0; > +return; > +} > + > +/* > + * After continuously processing CHANGES_THRESHOLD changes, > we try to send > + * a keepalive message if required. > + * > + * We don't want to try sending a keepalive message after > processing each > + * change as that can have overhead. Testing reveals that there is no > + * noticeable overhead in doing it after continuously > processing 100 or so > + * changes. > + */ > +#define CHANGES_THRESHOLD 100 > +if (++changes_count >= CHANGES_THRESHOLD) > +{ > +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); > +changes_count = 0; > +} > > Can we merge two if branches since we do the same things? 
Or did you > separate them for better readability? > I think it is fine to merge the two checks. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira wrote: > > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote: > > Sawada-San, Euler, do you have any opinion on this approach? I > personally still prefer the approach implemented in v10 [1] especially > due to the latest finding by Wang-San that we can't update the > lag-tracker apart from when it is invoked at the transaction end. > However, I am fine if we like this approach more. > > It seems v15 is simpler and less error prone than v10. v10 has a mix of > OutputPluginUpdateProgress() and the new function update_progress(). The v10 > also calls update_progress() for every change action in pgoutput_change(). It > is not a good approach for maintainability -- new changes like sequences need > extra calls. > Okay, let's use the v15 approach as Sawada-San also seems to have a preference for that. > However, as you mentioned there should handle the track lag case. > > Both patches change the OutputPluginUpdateProgress() so it cannot be > backpatched. Are you planning to backpatch it? If so, the boolean variable > (last_write or end_xacts depending of which version you are considering) could > be added to LogicalDecodingContext. > If we add it to LogicalDecodingContext then I think we have to always reset the variable after its use which will make it look ugly and error-prone. I was not thinking to backpatch it because of the API change but I guess if we want to backpatch then we can add it to LogicalDecodingContext for back-branches. I am not sure if that will look committable but surely we can try. > (You should probably consider this approach > for skipped_xact too) > As mentioned, I think it will be more error-prone and we already have other xact related parameters in that and similar APIs. So, I am not sure why you want to prefer that? > > Does this same issue occur for long transactions? I mean keep a long > transaction open and execute thousands of transactions. 
> No, this problem won't happen for such cases because we will only try to send it at the commit time. Note that this problem happens only when we don't send anything to the subscriber till a timeout happens. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote: > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > wrote: > > > > So I skip tracking lag during a transaction just like the current HEAD. > > Attach the new patch. > > > > Thanks, please find the updated patch where I have slightly modified > the comments. > > Sawada-San, Euler, do you have any opinion on this approach? I > personally still prefer the approach implemented in v10 [1] especially > due to the latest finding by Wang-San that we can't update the > lag-tracker apart from when it is invoked at the transaction end. > However, I am fine if we like this approach more. It seems v15 is simpler and less error prone than v10. v10 has a mix of OutputPluginUpdateProgress() and the new function update_progress(). The v10 also calls update_progress() for every change action in pgoutput_change(). It is not a good approach for maintainability -- new changes like sequences need extra calls. However, as you mentioned, the lag-tracking case should still be handled. Both patches change the OutputPluginUpdateProgress() so it cannot be backpatched. Are you planning to backpatch it? If so, the boolean variable (last_write or end_xacts depending on which version you are considering) could be added to LogicalDecodingContext. (You should probably consider this approach for skipped_xact too) + * For a large transaction, if we don't send any change to the downstream for a + * long time then it can timeout. This can happen when all or most of the + * changes are either not published or got filtered out. We should probably mention that "long time" is wal_receiver_timeout on the subscriber. +* change as that can have overhead. Testing reveals that there is no +* noticeable overhead in doing it after continuously processing 100 or so +* changes. Tests revealed that ... +* We don't have a mechanism to get the ack for any LSN other than end xact +* lsn from the downstream. So, we track lag only for end xact lsn's. 
s/lsn/LSN/ and s/lsn's/LSNs/ I would say "end of transaction LSN". + * If too many changes are processed then try to send a keepalive message to + * receiver to avoid timeouts. In logical replication, if too many changes are processed then try to send a keepalive message. It might avoid a timeout in the subscriber. Does this same issue occur for long transactions? I mean keep a long transaction open and execute thousands of transactions. BEGIN; INSERT INTO foo (a) VALUES(1); -- wait a few hours while executing 10^x transactions INSERT INTO foo (a) VALUES(2); COMMIT; -- Euler Taveira EDB https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila wrote: > > On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com > wrote: > > > > So I skip tracking lag during a transaction just like the current HEAD. > > Attach the new patch. > > > > Thanks, please find the updated patch where I have slightly modified > the comments. > > Sawada-San, Euler, do you have any opinion on this approach? I > personally still prefer the approach implemented in v10 [1] especially > due to the latest finding by Wang-San that we can't update the > lag-tracker apart from when it is invoked at the transaction end. > However, I am fine if we like this approach more. Thank you for updating the patch. The current patch looks much better than v10 which requires to call to update_progress() every path. Regarding v15 patch, I'm concerned a bit that the new function name, update_progress(), is too generic. How about update_replation_progress() or something more specific name? --- +if (end_xact) +{ +/* Update progress tracking at xact end. */ +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); +changes_count = 0; +return; +} + +/* + * After continuously processing CHANGES_THRESHOLD changes, we try to send + * a keepalive message if required. + * + * We don't want to try sending a keepalive message after processing each + * change as that can have overhead. Testing reveals that there is no + * noticeable overhead in doing it after continuously processing 100 or so + * changes. + */ +#define CHANGES_THRESHOLD 100 +if (++changes_count >= CHANGES_THRESHOLD) +{ +OutputPluginUpdateProgress(ctx, skipped_xact, end_xact); +changes_count = 0; +} Can we merge two if branches since we do the same things? Or did you separate them for better readability? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Logical replication timeout problem
On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com wrote: > > So I skip tracking lag during a transaction just like the current HEAD. > Attach the new patch. > Thanks, please find the updated patch where I have slightly modified the comments. Sawada-San, Euler, do you have any opinion on this approach? I personally still prefer the approach implemented in v10 [1] especially due to the latest finding by Wang-San that we can't update the lag-tracker apart from when it is invoked at the transaction end. However, I am fine if we like this approach more. [1] - https://www.postgresql.org/message-id/OS3PR01MB6275E0C2B4D9E488AD7CBA209E1F9%40OS3PR01MB6275.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila. v15-0001-Fix-the-logical-replication-timeout-during-large.patch Description: Binary data
RE: Logical replication timeout problem
On Mon, Apr 11, 2022 at 2:39 PM I wrote: > Attach the new patch. Also, sharing test results and details. To check that the LSN information used for the calculation is what we expected, I got some information by adding logs in the function LagTrackerRead. Summary of test results: - In current HEAD and current HEAD with the v14 patch, we could find the information for the same LSN as received from the subscriber side in lag_tracker. - In current HEAD with the v13 patch, we could hardly ever find the information for the same LSN in lag_tracker. Attached are the details: [The log by HEAD] the lsn we received from subscriber | the lsn whose time we used to calculate in lag_tracker 382826584 | 382826584 743884840 | 743884840 1104943232 | 1104943232 1468949424 | 1468949424 1469521216 | 1469521216 [The log by HEAD with v14 patch] the lsn we received from subscriber | the lsn whose time we used to calculate in lag_tracker 382826584 | 382826584 743890672 | 743890672 1105074264 | 1105074264 1469127040 | 1469127040 1830591240 | 1830591240 [The log by HEAD with v13 patch] the lsn we received from subscriber | the lsn whose time we used to calculate in lag_tracker 382826584 | 359848728 743884840 | 713808560 1105010640 | 1073978544 1468517536 | 1447850160 1469516328 | 1469516328 Regards, Wang wei
RE: Logical replication timeout problem
On Wed, Apr 7, 2022 at 1:34 PM Amit Kapila wrote: > On Wed, Apr 6, 2022 at 6:30 PM wangw.f...@fujitsu.com > wrote: > > > > On Wed, Apr 6, 2022 at 1:58 AM Amit Kapila wrote: > > On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila wrote: > > > Also, let's try to evaluate how it impacts lag functionality for large > transactions? > > I think this patch will not affect lag functionality. It will updates the > > lag > > field of view pg_stat_replication more frequently. > > IIUC, when invoking function WalSndUpdateProgress, it will store the lsn of > > change and invoking time in lag_tracker. Then when invoking function > > ProcessStandbyReplyMessage, it will calculate the lag field according to the > > message from subscriber and the information in lag_tracker. This patch does > > not modify this logic, but only increases the frequency of invoking. > > Please let me know if I understand wrong. > > > > No, your understanding seems correct to me. But what I want to check > is if calling the progress function more often has any impact on > lag-related fields in pg_stat_replication? I think you need to check > the impact of large transaction replay. Thanks for the explanation. After doing some checks, I found that the v13 patch makes the lag calculations inaccurate. In short, the v13 patch lets us try to track lag more frequently and try to send a keepalive message to subscribers. But in order to prevent flooding the lag tracker, we cannot track lag more than once within WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress). This means we may lose information that needs to be tracked. For example, suppose there is a large transaction with LSNs from lsn1 to lsn3. In HEAD, when we calculate the lag time for lsn3, the lag time of lsn3 is (now - lsn3.time). But with the v13 patch, when we calculate the lag time for lsn3, there may be no information for lsn3 but only information for lsn2 in lag_tracker, so the lag time of lsn3 becomes (now - lsn2.time). 
(see function LagTrackerRead) Therefore, if we lose information that needs to be tracked, the lag time becomes large and inaccurate. So I skip tracking lag during a transaction, just like the current HEAD. Attached is the new patch. Regards, Wang wei v14-0001-Fix-the-logical-replication-timeout-during-large.patch Description: v14-0001-Fix-the-logical-replication-timeout-during-large.patch
RE: Logical replication timeout problem
On Wed, Apr 7, 2022 at 1:34 PM Amit Kapila wrote: > Thanks for your comments. > One comment: > +static void > +update_progress(LogicalDecodingContext *ctx, bool skipped_xact, bool > end_xact) > +{ > + static int changes_count = 0; > + > + if (end_xact) > + { > + /* Update progress tracking at xact end. */ > + OutputPluginUpdateProgress(ctx, skipped_xact); > + changes_count = 0; > + } > + /* > + * After continuously processing CHANGES_THRESHOLD changes, update > progress > + * which will also try to send a keepalive message if required. > > I think you can simply return after making changes_count = 0. There > should be an empty line before starting the next comment. Improved as suggested. BTW, there was a conflict with current HEAD when applying v12 because of commit 2c7ea57, so the patch is also rebased. Attached is the new patch. 1. Make some improvements to the new function update_progress. [suggestion by Amit-San] 2. Rebase the patch because of commit 2c7ea57 in current HEAD. Regards, Wang wei v13-0001-Fix-the-logical-replication-timeout-during-large.patch Description: v13-0001-Fix-the-logical-replication-timeout-during-large.patch
Re: Logical replication timeout problem
On Wed, Apr 6, 2022 at 6:30 PM wangw.f...@fujitsu.com wrote: > > On Wed, Apr 6, 2022 at 1:59 AM Amit Kapila wrote: > On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila wrote: > > > Thanks for your comments. > > > typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct > > LogicalDecodingContext *lr, > > XLogRecPtr Ptr, > > TransactionId xid, > > - bool skipped_xact > > + bool skipped_xact, > > + bool last_write > > > > In this approach, I don't think we need an additional parameter last_write. > > Let's > > do the work related to keepalive without a parameter, do you see any problem > > with that? > I agree with you. Modify this point. > > > I think this patch doesn't take into account that we call > > OutputPluginUpdateProgress() from APIs like pgoutput_commit_txn(). I > > think we should always call the new function update_progress from > > those existing call sites and arrange the function such that when > > called from xact end APIs like pgoutput_commit_txn(), it always call > > OutputPluginUpdateProgress and make changes_count as 0. > Improve it. > Add two new input to function update_progress.(skipped_xact and end_xact). > Modify the function invoke from OutputPluginUpdateProgress to update_progress. > > > Also, let's try to evaluate how it impacts lag functionality for large > > transactions? > I think this patch will not affect lag functionality. It will updates the lag > field of view pg_stat_replication more frequently. > IIUC, when invoking function WalSndUpdateProgress, it will store the lsn of > change and invoking time in lag_tracker. Then when invoking function > ProcessStandbyReplyMessage, it will calculate the lag field according to the > message from subscriber and the information in lag_tracker. This patch does > not modify this logic, but only increases the frequency of invoking. > Please let me know if I understand wrong. > No, your understanding seems correct to me. 
But what I want to check is if calling the progress function more often has any impact on lag-related fields in pg_stat_replication? I think you need to check the impact of large transaction replay. One comment: +static void +update_progress(LogicalDecodingContext *ctx, bool skipped_xact, bool end_xact) +{ + static int changes_count = 0; + + if (end_xact) + { + /* Update progress tracking at xact end. */ + OutputPluginUpdateProgress(ctx, skipped_xact); + changes_count = 0; + } + /* + * After continuously processing CHANGES_THRESHOLD changes, update progress + * which will also try to send a keepalive message if required. I think you can simply return after making changes_count = 0. There should be an empty line before starting the next comment. -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Wed, Apr 6, 2022 at 1:59 AM Amit Kapila wrote: On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila wrote: > Thanks for your comments. > typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct > LogicalDecodingContext *lr, > XLogRecPtr Ptr, > TransactionId xid, > - bool skipped_xact > + bool skipped_xact, > + bool last_write > > In this approach, I don't think we need an additional parameter last_write. > Let's > do the work related to keepalive without a parameter, do you see any problem > with that? I agree with you. This point is modified. > I think this patch doesn't take into account that we call > OutputPluginUpdateProgress() from APIs like pgoutput_commit_txn(). I > think we should always call the new function update_progress from > those existing call sites and arrange the function such that when > called from xact end APIs like pgoutput_commit_txn(), it always call > OutputPluginUpdateProgress and make changes_count as 0. Improved: two new inputs (skipped_xact and end_xact) are added to the function update_progress, and the calls to OutputPluginUpdateProgress are changed to calls to update_progress. > Also, let's try to evaluate how it impacts lag functionality for large > transactions? I think this patch will not affect lag functionality. It will update the lag field of the pg_stat_replication view more frequently. IIUC, when invoking function WalSndUpdateProgress, it will store the LSN of the change and the invoking time in lag_tracker. Then when invoking function ProcessStandbyReplyMessage, it will calculate the lag field according to the message from the subscriber and the information in lag_tracker. This patch does not modify this logic, but only increases the frequency of invoking it. Please let me know if my understanding is wrong. Attached is the new patch. 1. Remove the new function input parameter in this patch (parameter last_write of WalSndUpdateProgress). [suggestion by Amit-San] 2. Also invoke function update_progress in other xact end APIs like pgoutput_commit_txn. 
[suggestion by Amit-San] Regards, Wang wei v12-0001-Fix-the-logical-replication-timeout-during-large.patch Description: v12-0001-Fix-the-logical-replication-timeout-during-large.patch
Re: Logical replication timeout problem
On Wed, Apr 6, 2022 at 11:28 AM Amit Kapila wrote: > > On Wed, Apr 6, 2022 at 11:09 AM wangw.f...@fujitsu.com > wrote: > > > > According to your suggestion, improve the patch to make it more generic. > > Attach the new patch. > > > > typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct > LogicalDecodingContext *lr, > XLogRecPtr Ptr, > TransactionId xid, > - bool skipped_xact > + bool skipped_xact, > + bool last_write > > In this approach, I don't think we need an additional parameter > last_write. Let's do the work related to keepalive without a > parameter, do you see any problem with that? > I think this patch doesn't take into account that we call OutputPluginUpdateProgress() from APIs like pgoutput_commit_txn(). I think we should always call the new function update_progress from those existing call sites and arrange the function such that when called from xact end APIs like pgoutput_commit_txn(), it always call OutputPluginUpdateProgress and make changes_count as 0. -- With Regards, Amit Kapila.
Re: Logical replication timeout problem
On Wed, Apr 6, 2022 at 11:09 AM wangw.f...@fujitsu.com wrote: > > On Fri, Apr 1, 2022 at 12:09 AM Amit Kapila wrote: > > On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira wrote: > > > > > > It seems I didn't make myself clear. The callbacks I'm referring to the > > > *_cb_wrapper functions. After every ctx->callbacks.foo_cb() call into a > > > *_cb_wrapper() function, we have something like: > > > > > > if (ctx->progress & PGOUTPUT_PROGRESS_FOO) > > > NewUpdateProgress(ctx, false); > > > > > > The NewUpdateProgress function would contain a logic similar to the > > > update_progress() from the proposed patch. (A different function name here > > just > > > to avoid confusion.) > > > > > > The output plugin is responsible to set ctx->progress with the callback > > > variables (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb()) > > that we would > > > like to run NewUpdateProgress. > > > > > > > This sounds like a conflicting approach to what we currently do. > > Currently, OutputPluginUpdateProgress() is called from the xact > > related pgoutput functions like pgoutput_commit_txn(), > > pgoutput_prepare_txn(), pgoutput_commit_prepared_txn(), etc. So, if we > > follow what you are saying then for some of the APIs like > > pgoutput_change/_message/_truncate, we need to set the parameter to > > invoke NewUpdateProgress() which will internally call > > OutputPluginUpdateProgress(), and for the remaining APIs, we will call > > in the corresponding pgoutput_* function. I feel if we want to make it > > more generic than the current patch, it is better to directly call > > what you are referring to here as NewUpdateProgress() in all remaining > > APIs like pgoutput_change/_truncate, etc. > Thanks for your comments. > > According to your suggestion, improve the patch to make it more generic. > Attach the new patch. 
> typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct LogicalDecodingContext *lr, XLogRecPtr Ptr, TransactionId xid, - bool skipped_xact + bool skipped_xact, + bool last_write In this approach, I don't think we need an additional parameter last_write. Let's do the work related to keepalive without a parameter, do you see any problem with that? Also, let's try to evaluate how it impacts lag functionality for large transactions? -- With Regards, Amit Kapila.
RE: Logical replication timeout problem
On Fri, Apr 1, 2022 at 12:09 AM Amit Kapila wrote: > On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira wrote: > > > > On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote: > > > > This is exactly our initial analysis and we have tried a patch on > > these lines and it has a noticeable overhead. See [1]. Calling this > > for each change or each skipped change can bring noticeable overhead > > that is why we decided to call it after a certain threshold (100) of > > skipped changes. Now, surely as mentioned in my previous reply we can > > make it generic such that instead of calling this (update_progress > > function as in the patch) for skipped cases, we call it always. Will > > that make it better? > > > > That's what I have in mind but using a different approach. > > > > > The functions CreateInitDecodingContext and CreateDecodingContext > receives the > > > update_progress function as a parameter. These functions are called in 2 > > > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and > (b) > > > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses > > > WalSndUpdateProgress as a progress function. Case (b) does not have one > because > > > it is not required -- local decoding/communication. There is no custom > update > > > progress routine for each output plugin which leads me to the question: > > > couldn't we encapsulate the update progress call into the callback > > > functions? > > > > > > > Sorry, I don't get your point. What exactly do you mean by this? > > AFAIS, currently we call this output plugin API in pgoutput functions > > only, do you intend to get it invoked from a different place? > > > > It seems I didn't make myself clear. The callbacks I'm referring to the > > *_cb_wrapper functions. 
After every ctx->callbacks.foo_cb() call into a > > *_cb_wrapper() function, we have something like: > > > > if (ctx->progress & PGOUTPUT_PROGRESS_FOO) > > NewUpdateProgress(ctx, false); > > > > The NewUpdateProgress function would contain a logic similar to the > > update_progress() from the proposed patch. (A different function name here > just > > to avoid confusion.) > > > > The output plugin is responsible to set ctx->progress with the callback > > variables (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb()) > that we would > > like to run NewUpdateProgress. > > > > This sounds like a conflicting approach to what we currently do. > Currently, OutputPluginUpdateProgress() is called from the xact > related pgoutput functions like pgoutput_commit_txn(), > pgoutput_prepare_txn(), pgoutput_commit_prepared_txn(), etc. So, if we > follow what you are saying then for some of the APIs like > pgoutput_change/_message/_truncate, we need to set the parameter to > invoke NewUpdateProgress() which will internally call > OutputPluginUpdateProgress(), and for the remaining APIs, we will call > in the corresponding pgoutput_* function. I feel if we want to make it > more generic than the current patch, it is better to directly call > what you are referring to here as NewUpdateProgress() in all remaining > APIs like pgoutput_change/_truncate, etc. Thanks for your comments. According to your suggestion, improve the patch to make it more generic. Attach the new patch. Regards, Wang wei v11-0001-Fix-the-logical-replication-timeout-during-large.patch Description: v11-0001-Fix-the-logical-replication-timeout-during-large.patch
Re: Logical replication timeout problem
On Fri, Apr 1, 2022 at 8:28 AM Euler Taveira wrote: > > On Thu, Mar 31, 2022, at 11:27 PM, Amit Kapila wrote: > > This is exactly our initial analysis and we have tried a patch on > these lines and it has a noticeable overhead. See [1]. Calling this > for each change or each skipped change can bring noticeable overhead > that is why we decided to call it after a certain threshold (100) of > skipped changes. Now, surely as mentioned in my previous reply we can > make it generic such that instead of calling this (update_progress > function as in the patch) for skipped cases, we call it always. Will > that make it better? > > That's what I have in mind but using a different approach. > > > The functions CreateInitDecodingContext and CreateDecodingContext receives > > the > > update_progress function as a parameter. These functions are called in 2 > > places: (a) streaming replication protocol (CREATE_REPLICATION_SLOT) and (b) > > SQL logical decoding functions (pg_logical_*_changes). Case (a) uses > > WalSndUpdateProgress as a progress function. Case (b) does not have one > > because > > it is not required -- local decoding/communication. There is no custom > > update > > progress routine for each output plugin which leads me to the question: > > couldn't we encapsulate the update progress call into the callback > > functions? > > > > Sorry, I don't get your point. What exactly do you mean by this? > AFAIS, currently we call this output plugin API in pgoutput functions > only, do you intend to get it invoked from a different place? > > It seems I didn't make myself clear. The callbacks I'm referring to the > *_cb_wrapper functions. After every ctx->callbacks.foo_cb() call into a > *_cb_wrapper() function, we have something like: > > if (ctx->progress & PGOUTPUT_PROGRESS_FOO) > NewUpdateProgress(ctx, false); > > The NewUpdateProgress function would contain a logic similar to the > update_progress() from the proposed patch. 
(A different function name here > just > to avoid confusion.) > > The output plugin is responsible to set ctx->progress with the callback > variables (for example, PGOUTPUT_PROGRESS_CHANGE for change_cb()) that we > would > like to run NewUpdateProgress. > This sounds like a conflicting approach to what we currently do. Currently, OutputPluginUpdateProgress() is called from the xact related pgoutput functions like pgoutput_commit_txn(), pgoutput_prepare_txn(), pgoutput_commit_prepared_txn(), etc. So, if we follow what you are saying then for some of the APIs like pgoutput_change/_message/_truncate, we need to set the parameter to invoke NewUpdateProgress() which will internally call OutputPluginUpdateProgress(), and for the remaining APIs, we will call in the corresponding pgoutput_* function. I feel if we want to make it more generic than the current patch, it is better to directly call what you are referring to here as NewUpdateProgress() in all remaining APIs like pgoutput_change/_truncate, etc. -- With Regards, Amit Kapila.