Re: Logical replication keepalive flood

2021-10-01 Thread Kyotaro Horiguchi
At Thu, 30 Sep 2021 17:07:08 +0530, Amit Kapila wrote in > On Thu, Sep 30, 2021 at 1:26 PM Kyotaro Horiguchi > wrote: > > If the comment describes the objective correctly, the only possible > > impact would be that there may be a case where server responds a bit > > slowly for a shutdown

Re: Logical replication keepalive flood

2021-10-01 Thread Greg Nancarrow
On Thu, Sep 30, 2021 at 5:56 PM Kyotaro Horiguchi wrote: > > After the patch applied, that keepalive is sent only when the loop is > actually going to sleep some time. In case the next WAL doesn't come > for KEEPALIVE_TIMEOUT milliseconds, it sends a keepalive. There's a > dubious behavior when

Re: Logical replication keepalive flood

2021-09-30 Thread Amit Kapila
On Thu, Sep 30, 2021 at 3:41 PM Kyotaro Horiguchi wrote: > > At Thu, 30 Sep 2021 17:08:35 +0900 (JST), Kyotaro Horiguchi > wrote in > > At Thu, 30 Sep 2021 17:21:03 +1000, Greg Nancarrow > > wrote in > > > Actually, with the patch applied, I find that "make check-world" fails > > >

Re: Logical replication keepalive flood

2021-09-30 Thread Amit Kapila
On Thu, Sep 30, 2021 at 1:26 PM Kyotaro Horiguchi wrote: > > At Thu, 30 Sep 2021 16:21:25 +1000, Greg Nancarrow > wrote in > > On Thu, Sep 30, 2021 at 3:56 PM Amit Kapila wrote: > > > > > > > Also, more to the point this special keep_alive seems to be sent for > > > synchronous replication and

Re: Logical replication keepalive flood

2021-09-30 Thread Kyotaro Horiguchi
At Thu, 30 Sep 2021 17:08:35 +0900 (JST), Kyotaro Horiguchi wrote in > At Thu, 30 Sep 2021 17:21:03 +1000, Greg Nancarrow > wrote in > > Actually, with the patch applied, I find that "make check-world" fails > > (006_logical_decoding, test 7). > > Mmm.. > > t/006_logical_decoding.pl ..

Re: Logical replication keepalive flood

2021-09-30 Thread Greg Nancarrow
On Thu, Sep 30, 2021 at 6:08 PM Kyotaro Horiguchi wrote: > > At Thu, 30 Sep 2021 17:21:03 +1000, Greg Nancarrow > wrote in > > Actually, with the patch applied, I find that "make check-world" fails > > (006_logical_decoding, test 7). > > Mmm.. > > t/006_logical_decoding.pl .. 4/14 > # Failed

Re: Logical replication keepalive flood

2021-09-30 Thread Kyotaro Horiguchi
At Thu, 30 Sep 2021 17:21:03 +1000, Greg Nancarrow wrote in > Actually, with the patch applied, I find that "make check-world" fails > (006_logical_decoding, test 7). Mmm.. t/006_logical_decoding.pl .. 4/14 # Failed test 'pg_recvlogical acknowledged changes' # at

Re: Logical replication keepalive flood

2021-09-30 Thread Kyotaro Horiguchi
At Thu, 30 Sep 2021 16:21:25 +1000, Greg Nancarrow wrote in > On Thu, Sep 30, 2021 at 3:56 PM Amit Kapila wrote: > > > > It claims to skip sending keepalive if the sleep time is shorter than > > KEEPALIVE_TIMEOUT (a new threshold) but the above code seems to > > suggest it sends in both cases.

Re: Logical replication keepalive flood

2021-09-30 Thread Greg Nancarrow
On Thu, Sep 30, 2021 at 1:19 PM Hou, Zhijie/侯 志杰 wrote: > > I tested that patch and can see the keepalive message is reduced and > the patch won't cause the current regression test fail. > Actually, with the patch applied, I find that "make check-world" fails (006_logical_decoding, test 7).

Re: Logical replication keepalive flood

2021-09-30 Thread Greg Nancarrow
On Thu, Sep 30, 2021 at 3:56 PM Amit Kapila wrote: > > I am not able to understand some parts of that patch. > > + If the sleep is shorter > + * than KEEPALIVE_TIMEOUT milliseconds, we skip sending a keepalive to > + * prevent it from getting too-frequent. > + */ > + if (MyWalSnd->flush < sentPtr

Re: Logical replication keepalive flood

2021-09-29 Thread Amit Kapila
On Thu, Sep 30, 2021 at 8:49 AM Hou, Zhijie/侯 志杰 wrote: > > I noticed another patch that Horiguchi-San posted earlier[1]. > > The approach in that patch is to splits the sleep into two > pieces. If the first sleep reaches the timeout then send a keepalive > then sleep for the remaining time. > >

Re: Logical replication keepalive flood

2021-09-19 Thread Greg Nancarrow
On Sun, Sep 19, 2021 at 3:47 PM Amit Kapila wrote: > > > > > Yes, pg_recvlogical seems to be relying on receiving a keepalive for > > its "--endpos" logic to work (and the 006 test is relying on '' record > > output from pg_recvlogical in this case). > > But is it correct to be relying on a

Re: Logical replication keepalive flood

2021-09-18 Thread Amit Kapila
On Fri, Sep 17, 2021 at 3:03 PM Greg Nancarrow wrote: > > On Thu, Sep 16, 2021 at 10:36 PM Amit Kapila wrote: > > > > I think here the reason is that the first_lsn of a transaction is > > always equal to end_lsn of the previous transaction (See comments > > above first_lsn and end_lsn fields of

Re: Logical replication keepalive flood

2021-09-17 Thread Greg Nancarrow
On Thu, Sep 16, 2021 at 10:36 PM Amit Kapila wrote: > > I think here the reason is that the first_lsn of a transaction is > always equal to end_lsn of the previous transaction (See comments > above first_lsn and end_lsn fields of ReorderBufferTXN). That may be the case, but those comments

Re: Logical replication keepalive flood

2021-09-17 Thread Amit Kapila
On Fri, Sep 17, 2021 at 12:42 PM Peter Smith wrote: > > On Thu, Sep 16, 2021 at 10:59 AM houzj.f...@fujitsu.com > wrote: > > Hello Hous-san, thanks for including the steps. Unfortunately, no > matter what I tried, I could never get the patch to display the > problem "BEGIN 709" for the 2nd time

Re: Logical replication keepalive flood

2021-09-17 Thread Peter Smith
On Thu, Sep 16, 2021 at 10:59 AM houzj.f...@fujitsu.com wrote: > > From Tuesday, September 14, 2021 1:39 PM Greg Nancarrow > wrote: > > However, the problem I found is that, with the patch applied, there is > > a test failure when running “make check-world”: > > > > t/006_logical_decoding.pl

RE: Logical replication keepalive flood

2021-09-17 Thread houzj.f...@fujitsu.com
On Thursday, September 16, 2021 8:36 PM Amit Kapila : > On Thu, Sep 16, 2021 at 6:29 AM houzj.f...@fujitsu.com > wrote: > > > > After applying the patch, > > I saw the same problem and can reproduce it by the following steps: > > > > 1) execute the SQLs. > > ---SQL--- > > CREATE

Re: Logical replication keepalive flood

2021-09-16 Thread Amit Kapila
On Thu, Aug 12, 2021 at 8:03 AM Peter Smith wrote: > > That base data is showing there are similar numbers of keepalives sent > as there are calls made to WalSndWaitForWal. IIUC it means that mostly > the loop is sending the special keepalives on the *first* iteration, > but by the time of the

Re: Logical replication keepalive flood

2021-09-16 Thread Amit Kapila
On Thu, Sep 16, 2021 at 6:29 AM houzj.f...@fujitsu.com wrote: > > From Tuesday, September 14, 2021 1:39 PM Greg Nancarrow > wrote: > > However, the problem I found is that, with the patch applied, there is > > a test failure when running “make check-world”: > > > > t/006_logical_decoding.pl

RE: Logical replication keepalive flood

2021-09-15 Thread houzj.f...@fujitsu.com
From Tuesday, September 14, 2021 1:39 PM Greg Nancarrow wrote: > However, the problem I found is that, with the patch applied, there is > a test failure when running “make check-world”: > > t/006_logical_decoding.pl 4/14 > # Failed test 'pg_recvlogical acknowledged changes' > #

Re: Logical replication keepalive flood

2021-09-13 Thread Greg Nancarrow
On Thu, Aug 12, 2021 at 12:33 PM Peter Smith wrote: > > This data shows the special keepalives are now greatly reduced from > 1000s to just 10s. > > Thoughts? > I could easily see the flood of keepalives with the test setup described by the original problem reporter (Abbas Butt). I found that

Re: Logical replication keepalive flood

2021-08-13 Thread Peter Smith
FYI - Here are some more counter results with/without the skip patch [1] applied. This is the same test setup as before except now using *synchronous* pub/sub. // Test setup: - using synchronous pub/sub - subscription is for the pgbench_history table - pgbench is run for 10 seconds -

Re: Logical replication keepalive flood

2021-08-11 Thread Peter Smith
Hi. By using Kyotaro's "counting" patch I was able to reproduce very similar results to what he had earlier posted [1]. AFAIK I have the same test scenario that he was using. Test setup: - using async pub/sub - subscription is for the pgbench_history table - pgbench is run for 10 seconds -

Re: Logical replication keepalive flood

2021-06-12 Thread Amit Kapila
On Fri, Jun 11, 2021 at 7:07 AM Kyotaro Horiguchi wrote: > > At Thu, 10 Jun 2021 12:18:00 +0530, Amit Kapila > wrote in > > Good analysis. I think this analysis has shown that walsender is > > sending messages at top speed as soon as they are generated. So, I am > > wondering why there is any

Re: Logical replication keepalive flood

2021-06-10 Thread Kyotaro Horiguchi
At Thu, 10 Jun 2021 12:18:00 +0530, Amit Kapila wrote in > Good analysis. I think this analysis has shown that walsender is > sending messages at top speed as soon as they are generated. So, I am > wondering why there is any need to wait/sleep in such a workload. One > possibility that occurred

Re: Logical replication keepalive flood

2021-06-10 Thread Amit Kapila
On Thu, Jun 10, 2021 at 11:42 AM Kyotaro Horiguchi wrote: > > At Thu, 10 Jun 2021 15:00:16 +0900 (JST), Kyotaro Horiguchi > wrote in > > At Wed, 9 Jun 2021 17:32:25 +0500, Abbas Butt > > wrote in > > > > > > On Wed, Jun 9, 2021 at 2:30 PM Amit Kapila > > > wrote: > > > > Is it possible that

Re: Logical replication keepalive flood

2021-06-10 Thread Kyotaro Horiguchi
At Thu, 10 Jun 2021 15:00:16 +0900 (JST), Kyotaro Horiguchi wrote in > At Wed, 9 Jun 2021 17:32:25 +0500, Abbas Butt > wrote in > > > > On Wed, Jun 9, 2021 at 2:30 PM Amit Kapila wrote: > > > Is it possible that the write/flush location is not > > > updated at the pace at which we expect?

Re: Logical replication keepalive flood

2021-06-10 Thread Kyotaro Horiguchi
At Wed, 9 Jun 2021 17:32:25 +0500, Abbas Butt wrote in > > On Wed, Jun 9, 2021 at 2:30 PM Amit Kapila wrote: > > Does these keepalive messages are sent at the same frequency even for > > subscribers? > > Yes, I have tested it with one publisher and one subscriber. > The moment I start

Re: Logical replication keepalive flood

2021-06-09 Thread Abbas Butt
Hi, On Wed, Jun 9, 2021 at 2:30 PM Amit Kapila wrote: > On Wed, Jun 9, 2021 at 1:47 PM Kyotaro Horiguchi > wrote: > > > > At Wed, 9 Jun 2021 11:21:55 +0900, Kyotaro Horiguchi < > horikyota@gmail.com> wrote in > > > The issue - if actually it is - we send a keep-alive packet before a > > >

Re: Logical replication keepalive flood

2021-06-09 Thread Amit Kapila
On Wed, Jun 9, 2021 at 1:47 PM Kyotaro Horiguchi wrote: > > At Wed, 9 Jun 2021 11:21:55 +0900, Kyotaro Horiguchi > wrote in > > The issue - if actually it is - we send a keep-alive packet before a > > quite short sleep. > > > > We really want to send it if the sleep gets long but we cannot

Re: Logical replication keepalive flood

2021-06-09 Thread Kyotaro Horiguchi
At Wed, 9 Jun 2021 11:21:55 +0900, Kyotaro Horiguchi wrote in > The issue - if actually it is - we send a keep-alive packet before a > quite short sleep. > > We really want to send it if the sleep gets long but we cannot predict > that before entering a sleep. > > Let me think a little more

Re: Logical replication keepalive flood

2021-06-08 Thread Kyotaro Horiguchi
Hi. On 2021/06/08 21:21, Abbas Butt wrote: Hi Kyotaro, I have tried to test your patches. Unfortunately even after applying the patches the WAL Sender is still sending too frequent keepalive messages. Sorry for the bogus patch.  I must have seen something impossible. The keep-alive packet is

Re: Logical replication keepalive flood

2021-06-08 Thread Abbas Butt
Hi Kyotaro, I have tried to test your patches. Unfortunately even after applying the patches the WAL Sender is still sending too frequent keepalive messages. In my opinion the fix is to make sure that wal_sender_timeout/2 has passed before sending the keepalive message in the code fragment I had

Re: Logical replication keepalive flood

2021-06-07 Thread Kyotaro Horiguchi
At Tue, 08 Jun 2021 10:05:36 +0900 (JST), Kyotaro Horiguchi wrote in > At Mon, 7 Jun 2021 15:26:05 +0500, Abbas Butt > wrote in > > On Mon, Jun 7, 2021 at 3:13 PM Amit Kapila wrote: > > > I am not sure sending feedback every time before sleep is a good idea, > > > this might lead to

Re: Logical replication keepalive flood

2021-06-07 Thread Kyotaro Horiguchi
At Mon, 7 Jun 2021 15:26:05 +0500, Abbas Butt wrote in > On Mon, Jun 7, 2021 at 3:13 PM Amit Kapila wrote: > > > The immediate cause is pg_recvlogical doesn't send a reply before > > > sleeping. Currently it sends replies every 10 seconds intervals. > > > > > > > Yeah, but one can use -s

Re: Logical replication keepalive flood

2021-06-07 Thread Abbas Butt
On Mon, Jun 7, 2021 at 3:13 PM Amit Kapila wrote: > On Mon, Jun 7, 2021 at 12:54 PM Kyotaro Horiguchi > wrote: > > > > At Sat, 5 Jun 2021 16:08:00 +0500, Abbas Butt < > abbas.b...@enterprisedb.com> wrote in > > > Hi, > > > I have observed the following behavior with PostgreSQL 13.3. > > > > > >

Re: Logical replication keepalive flood

2021-06-07 Thread Amit Kapila
On Mon, Jun 7, 2021 at 12:54 PM Kyotaro Horiguchi wrote: > > At Sat, 5 Jun 2021 16:08:00 +0500, Abbas Butt > wrote in > > Hi, > > I have observed the following behavior with PostgreSQL 13.3. > > > > The WAL sender process sends approximately 500 keepalive messages per > > second to

Re: Logical replication keepalive flood

2021-06-07 Thread Kyotaro Horiguchi
At Sat, 5 Jun 2021 16:08:00 +0500, Abbas Butt wrote in > Hi, > I have observed the following behavior with PostgreSQL 13.3. > > The WAL sender process sends approximately 500 keepalive messages per > second to pg_recvlogical. > These keepalive messages are totally un-necessary. > Keepalives

Logical replication keepalive flood

2021-06-05 Thread Abbas Butt
Hi, I have observed the following behavior with PostgreSQL 13.3. The WAL sender process sends approximately 500 keepalive messages per second to pg_recvlogical. These keepalive messages are totally un-necessary. Keepalives should be sent only if there is no network traffic and a certain time