Re: Logical replication keepalive flood

2021-06-09 Thread Abbas Butt
Hi,

On Wed, Jun 9, 2021 at 2:30 PM Amit Kapila  wrote:

> On Wed, Jun 9, 2021 at 1:47 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Wed, 9 Jun 2021 11:21:55 +0900, Kyotaro Horiguchi <
> horikyota@gmail.com> wrote in
> > > The issue - if it actually is one - is that we send a keep-alive
> > > packet before a quite short sleep.
> > >
> > > We really want to send it if the sleep gets long but we cannot predict
> > > that before entering a sleep.
> > >
> > > Let me think a little more on this..
> >
> > After some investigation, I found that the keepalives are sent
> > almost always after XLogSendLogical requests the *next* record.
> >
>
> Are these keepalive messages sent at the same frequency for
> subscribers too?


Yes, I have tested it with one publisher and one subscriber.
The moment I start a pgbench session I can see keepalive messages sent and
replied to by the subscriber at the same frequency.


> Basically, I wanted to check: if we have logical
> replication set up between 2 nodes, do we still send this flood of
> keep-alive messages?


Yes we do.


> If not, then why is it different in the case of
> pg_recvlogical?


Nothing; the WAL sender behaviour is the same in both cases.


> Is it possible that the write/flush location is not
> updated at the pace at which we expect?


Well, it is async replication. The receiver can choose to update LSNs at
its own pace, say at a 10-minute interval.
That should only impact the size of WAL retained by the server.

> Please see commit 41d5f8ad73 which seems to be talking about a similar
> problem.
>

That commit does not address this problem.


>
> --
> With Regards,
> Amit Kapila.
>


-- 
*Abbas*
Senior Architect


Ph: 92.334.5100153
Skype ID: gabbasb
edbpostgres.com

*Follow us on Twitter*
@EnterpriseDB


Re: Logical replication keepalive flood

2021-06-08 Thread Abbas Butt
Hi Kyotaro,
I have tried to test your patches. Unfortunately, even after applying the
patches the WAL sender is still sending keepalive messages too frequently.
In my opinion the fix is to make sure that wal_sender_timeout/2 has passed
before sending the keepalive message in the code fragment I shared earlier.
In other words, we should replace the call to
WalSndKeepalive(false);
with
WalSndKeepaliveIfNecessary(false);

Do you agree with the suggested fix?

On Tue, Jun 8, 2021 at 10:09 AM Kyotaro Horiguchi 
wrote:

> At Tue, 08 Jun 2021 10:05:36 +0900 (JST), Kyotaro Horiguchi <
> horikyota@gmail.com> wrote in
> > At Mon, 7 Jun 2021 15:26:05 +0500, Abbas Butt <
> abbas.b...@enterprisedb.com> wrote in
> > > On Mon, Jun 7, 2021 at 3:13 PM Amit Kapila 
> wrote:
> > > > I am not sure sending feedback every time before sleep is a good
> idea,
> > > > this might lead to unnecessarily sending more messages. Can we try by
> > > > using one-second interval with -s option to see how it behaves? As a
> > > > matter of comparison the similar logic in workers.c uses
> > > > wal_receiver_timeout to send such an update message rather than
> > > > sending it every time before sleep.
> >
> > Logical walreceiver sends feedback when walrcv_receive() doesn't
> > receive a byte.  If it's not good that pg_recvlogical does the same
> > thing, do we need to improve the logical walsender's behavior as well?
>
> For clarity, only a change on the walsender side can stop the
> flood.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>




Re: Logical replication keepalive flood

2021-06-07 Thread Abbas Butt
On Mon, Jun 7, 2021 at 3:13 PM Amit Kapila  wrote:

> On Mon, Jun 7, 2021 at 12:54 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Sat, 5 Jun 2021 16:08:00 +0500, Abbas Butt <
> abbas.b...@enterprisedb.com> wrote in
> > > Hi,
> > > I have observed the following behavior with PostgreSQL 13.3.
> > >
> > > The WAL sender process sends approximately 500 keepalive messages per
> > > second to pg_recvlogical.
> > > These keepalive messages are totally unnecessary.
> > > Keepalives should be sent only if there is no network traffic and a
> certain
> > > time (half of wal_sender_timeout) passes.
> > > These keepalive messages not only choke the network but also impact
> > > the performance of the receiver, because the receiver has to process
> > > each received message and then decide whether to reply to it.
> > > The receiver remains busy with this activity 500 times a second.
> >
> > I can reproduce the problem.
> >
> > > On investigation it is revealed that the following code fragment in
> > > function WalSndWaitForWal in file walsender.c is responsible for
> sending
> > > these frequent keepalives:
> > >
> > > if (MyWalSnd->flush < sentPtr &&
> > > MyWalSnd->write < sentPtr &&
> > > !waiting_for_ping_response)
> > > WalSndKeepalive(false);
> >
> > The immediate cause is that pg_recvlogical doesn't send a reply before
> > sleeping. Currently it sends replies at 10-second intervals.
> >
>
> Yeah, but one can use -s option to send it at lesser intervals.
>

That option only affects pg_recvlogical; it will not stop the server from
sending keepalives too frequently.
By default the status interval is 10 seconds, yet we are still getting 500
keepalives a second from the server.


>
> > So the attached first patch stops the flood.
> >
>
> I am not sure sending feedback every time before sleep is a good idea,
> this might lead to unnecessarily sending more messages. Can we try by
> using one-second interval with -s option to see how it behaves? As a
> matter of comparison the similar logic in workers.c uses
> wal_receiver_timeout to send such an update message rather than
> sending it every time before sleep.
>
> > That said, I don't think it is intended that the logical walsender
> > sends keep-alive packets with such a high frequency.  It happens
> > because the walsender doesn't actually wait at all: it waits on
> > WL_SOCKET_WRITEABLE because the keep-alive packet inserted just before
> > is always pending.
> >
> > So as the attached second, we should try to flush out the keep-alive
> > packets if possible before checking pg_is_send_pending().
> >
>
> /* Send keepalive if the time has come */
>   WalSndKeepaliveIfNecessary();
>
> + /* We may have queued a keep alive packet. flush it before sleeping. */
> + pq_flush_if_writable();
>
> We already call pq_flush_if_writable() from WalSndKeepaliveIfNecessary
> after sending the keep-alive message, so not sure how this helps?
>
> --
> With Regards,
> Amit Kapila.
>




Logical replication keepalive flood

2021-06-05 Thread Abbas Butt
Hi,
I have observed the following behavior with PostgreSQL 13.3.

The WAL sender process sends approximately 500 keepalive messages per
second to pg_recvlogical.
These keepalive messages are totally unnecessary.
Keepalives should be sent only if there is no network traffic and a certain
time (half of wal_sender_timeout) passes.
These keepalive messages not only choke the network but also impact the
performance of the receiver, because the receiver has to process each
received message and then decide whether to reply to it.
The receiver remains busy with this activity 500 times a second.

On investigation it turns out that the following code fragment in
function WalSndWaitForWal in walsender.c is responsible for sending
these frequent keepalives:

if (MyWalSnd->flush < sentPtr &&
    MyWalSnd->write < sentPtr &&
    !waiting_for_ping_response)
    WalSndKeepalive(false);

waiting_for_ping_response is normally false, and flush and write will
always be less than sentPtr (the receiver's reported LSNs cannot be ahead
of the sender's LSNs).

Here are the steps to reproduce:
1. Start the database server.
2. Setup pgbench tables.
  ./pgbench -i -s 50 -h 192.168.5.140 -p 7654 -U abbas postgres
3. Create a logical replication slot.
   SELECT * FROM pg_create_logical_replication_slot('my_slot',
'test_decoding');
4. Start pg_recvlogical.
  ./pg_recvlogical --slot=my_slot --verbose -d postgres -h 192.168.5.140 -p
7654 -U abbas --start -f -
5. Run pgbench
  ./pgbench -U abbas -h 192.168.5.140 -p 7654  -c 2 -j 2 -T 1200 -n postgres
6. Observe network traffic to see the keepalive flood.

Alternatively, modify the above code fragment as follows to see
approximately 500 keepalive log messages per second:

if (MyWalSnd->flush < sentPtr &&
    MyWalSnd->write < sentPtr &&
    !waiting_for_ping_response)
{
    elog(LOG, "[Keepalive]  wrt ptr %X/%X  snt ptr %X/%X ",
         (uint32) (MyWalSnd->write >> 32),
         (uint32) MyWalSnd->write,
         (uint32) (sentPtr >> 32),
         (uint32) sentPtr);
    WalSndKeepalive(false);
}

Opinions?



How to test GSSAPI based encryption support

2019-12-25 Thread Abbas Butt
Hi,

I want to test GSSAPI based encryption support added via commit ID
b0b39f72b9904bcb80f97b35837ccff1578aa4b8.

I have built source --with-gssapi.
After installation I added the following line in pg_hba.conf
hostgssenc  all  all   172.16.214.149/24   trust
Next I ran the server and ran psql using
./psql 'host=172.16.214.149 port=5432 dbname=postgres user=abbas
gssencmode=require'
and it resulted in the following error:
psql: error: could not connect to server: GSSAPI encryption required but
was impossible (possibly no credential cache, no server support, or using a
local socket)

What steps should I follow if I want to test just the encryption support?

If GSSAPI based encryption support cannot be tested without GSSAPI
(kerberos) based authentication, then what is the purpose of having trust
as authentication method for hostgssenc connections?

Best Regards