Re: Minimal logical decoding on standbys

2023-04-11 Thread Noah Misch
On Tue, Apr 11, 2023 at 01:10:57PM -0700, Andres Freund wrote:
> On 2023-04-11 11:04:50 +0200, Drouvot, Bertrand wrote:
> > On 4/11/23 10:55 AM, Drouvot, Bertrand wrote:
> > > I think we might want to add:
> > > 
> > > $node_primary->wait_for_replay_catchup($node_standby);
> > > 
> > > before calling the slot creation.

> Pushed. Seems like a clear race in the test, so I didn't think it was worth
> waiting for testing it on hoverfly.

We'll see what happens in the next run.

> I think we should lower the log level, but perhaps wait for a few more cycles
> in case there are random failures?

Fine with me.

> I wonder if we should make the connections in poll_query_until to reduce
> verbosity - it's pretty annoying how much that can bloat the log. Perhaps also
> introduce some backoff? It's really annoying to have to trawl through all
> those logs when there's a problem.

Agreed.  My ranked wish list for poll_query_until is:

1. Exponential backoff
2. Closed-loop time control via Time::HiRes or similar, instead of assuming
   that ten loops complete in ~1s.  I've seen the loop take 3x as long as the
   intended timeout.
3. Connect less often than today's once per probe




Re: Minimal logical decoding on standbys

2023-04-11 Thread Tom Lane
Andres Freund  writes:
> I think we should lower the log level, but perhaps wait for a few more cycles
> in case there are random failures?

Removing

-log_min_messages = 'debug2'
-log_error_verbosity = verbose

not only reduces 035's log output volume from 1.6MB to 260kB,
but also speeds it up nontrivially: on my machine it takes
about 8.50 sec as of HEAD, but 8.09 sec after silencing the
extra logging.  So I definitely want to see that out of there.

regards, tom lane




Re: Minimal logical decoding on standbys

2023-04-11 Thread Andres Freund
Hi,

On 2023-04-11 11:04:50 +0200, Drouvot, Bertrand wrote:
> On 4/11/23 10:55 AM, Drouvot, Bertrand wrote:
> > Hi,
> > 
> > I think we might want to add:
> > 
> > $node_primary->wait_for_replay_catchup($node_standby);
> > 
> > before calling the slot creation.
> > 
> > It's done in the attached, would it be possible to give it a try please?
> 
> Actually, let's also wait for the cascading standby to catchup too, like done 
> in the attached.

Pushed. Seems like a clear race in the test, so I didn't think it was worth
waiting for testing it on hoverfly.

I think we should lower the log level, but perhaps wait for a few more cycles
in case there are random failures?

I wonder if we should make the connections in poll_query_until to reduce
verbosity - it's pretty annoying how much that can bloat the log. Perhaps also
introduce some backoff? It's really annoying to have to trawl through all
those logs when there's a problem.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-11 Thread Drouvot, Bertrand

Hi,

On 4/11/23 10:55 AM, Drouvot, Bertrand wrote:

Hi,

I think we might want to add:

$node_primary->wait_for_replay_catchup($node_standby);

before calling the slot creation.

It's done in the attached, would it be possible to give it a try please?


Actually, let's also wait for the cascading standby to catchup too, like done 
in the attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index ba98a18bd2..94a8384c31 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -653,9 +653,15 @@ $node_standby->reload;
 $node_primary->psql('postgres', q[CREATE DATABASE testdb]);
 $node_primary->safe_psql('testdb', qq[CREATE TABLE decoding_test(x integer, y 
text);]);
 
+# Wait for the standby to catchup before creating the slots
+$node_primary->wait_for_replay_catchup($node_standby);
+
 # create the logical slots
 create_logical_slots($node_standby, 'promotion_');
 
+# Wait for the cascading standby to catchup before creating the slots
+$node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
+
 # create the logical slots on the cascading standby too
 create_logical_slots($node_cascading_standby, 'promotion_');
 


Re: Minimal logical decoding on standbys

2023-04-11 Thread Drouvot, Bertrand

Hi,

On 4/11/23 10:20 AM, Drouvot, Bertrand wrote:

Hi,

On 4/11/23 7:36 AM, Noah Misch wrote:

On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote:

--- /dev/null
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -0,0 +1,720 @@
+# logical decoding on standby : test logical decoding,
+# recovery conflict and standby promotion.

...

+$node_primary->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+});


Buildfarm member hoverfly stopped reporting in when this test joined the tree.
It's currently been stuck here for 140 minutes:



Thanks for the report!

It's looping on:

2023-04-11 02:57:52.516 UTC [62718288:5] 035_standby_logical_decoding.pl LOG:  
0: statement: SELECT restart_lsn IS NOT NULL
     FROM pg_catalog.pg_replication_slots WHERE slot_name = 
'promotion_inactiveslot'

And the reason is that the slot is not being created:

$ grep "CREATE_REPLICATION_SLOT" 035_standby_logical_decoding_standby.log | 
tail -2
2023-04-11 02:57:47.287 UTC [9241178:15] 035_standby_logical_decoding.pl STATEMENT:  
CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 
'nothing')
2023-04-11 02:57:47.622 UTC [9241178:23] 035_standby_logical_decoding.pl STATEMENT:  
CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 
'nothing')

Not sure why the slot is not being created.

There is also "replication apply delay" increasing:

2023-04-11 02:57:49.183 UTC [13304488:253] DEBUG:  0: sendtime 2023-04-11 
02:57:49.111363+00 receipttime 2023-04-11 02:57:49.183512+00 replication apply 
delay 644 ms transfer latency 73 ms
2023-04-11 02:57:49.184 UTC [13304488:259] DEBUG:  0: sendtime 2023-04-11 
02:57:49.183461+00 receipttime 2023-04-11 02:57:49.1842+00 replication apply 
delay 645 ms transfer latency 1 ms
2023-04-11 02:57:49.221 UTC [13304488:265] DEBUG:  0: sendtime 2023-04-11 
02:57:49.184166+00 receipttime 2023-04-11 02:57:49.221059+00 replication apply 
delay 682 ms transfer latency 37 ms
2023-04-11 02:57:49.222 UTC [13304488:271] DEBUG:  0: sendtime 2023-04-11 
02:57:49.221003+00 receipttime 2023-04-11 02:57:49.222144+00 replication apply 
delay 683 ms transfer latency 2 ms
2023-04-11 02:57:49.222 UTC [13304488:277] DEBUG:  0: sendtime 2023-04-11 
02:57:49.222095+00 receipttime 2023-04-11 02:57:49.2228+00 replication apply 
delay 684 ms transfer latency 1 ms

Noah, I think hoverfly is yours, would it be possible to have access (I'm not 
an AIX expert though) or check if you see a slot creation hanging and if so why?



Well, we can see in 035_standby_logical_decoding_standby.log:

2023-04-11 02:57:49.180 UTC [62718258:5] [unknown] FATAL:  3D000: database 
"testdb" does not exist

While, on the primary:

2023-04-11 02:57:48.505 UTC [62718254:5] 035_standby_logical_decoding.pl LOG:  
0: statement: CREATE DATABASE testdb

The TAP test is doing:

"
##
# Test standby promotion and logical decoding behavior
# after the standby gets promoted.
##

$node_standby->reload;

$node_primary->psql('postgres', q[CREATE DATABASE testdb]);
$node_primary->safe_psql('testdb', qq[CREATE TABLE decoding_test(x integer, y 
text);]);

# create the logical slots
create_logical_slots($node_standby, 'promotion_');
"

I think we might want to add:

$node_primary->wait_for_replay_catchup($node_standby);

before calling the slot creation.

It's done in the attached, would it be possible to give it a try please?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index ba98a18bd2..ad845aee28 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -653,6 +653,9 @@ $node_standby->reload;
 $node_primary->psql('postgres', q[CREATE DATABASE testdb]);
 $node_primary->safe_psql('testdb', qq[CREATE TABLE decoding_test(x integer, y 
text);]);
 
+# Wait for the standby to catchup before creating the slots
+$node_primary->wait_for_replay_catchup($node_standby);
+
 # create the logical slots
 create_logical_slots($node_standby, 'promotion_');
 


Re: Minimal logical decoding on standbys

2023-04-11 Thread Drouvot, Bertrand

Hi,

On 4/11/23 7:36 AM, Noah Misch wrote:

On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote:

--- /dev/null
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -0,0 +1,720 @@
+# logical decoding on standby : test logical decoding,
+# recovery conflict and standby promotion.

...

+$node_primary->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+});


Buildfarm member hoverfly stopped reporting in when this test joined the tree.
It's currently been stuck here for 140 minutes:



Thanks for the report!

It's looping on:

2023-04-11 02:57:52.516 UTC [62718288:5] 035_standby_logical_decoding.pl LOG:  
0: statement: SELECT restart_lsn IS NOT NULL
FROM pg_catalog.pg_replication_slots WHERE slot_name = 
'promotion_inactiveslot'

And the reason is that the slot is not being created:

$ grep "CREATE_REPLICATION_SLOT" 035_standby_logical_decoding_standby.log | 
tail -2
2023-04-11 02:57:47.287 UTC [9241178:15] 035_standby_logical_decoding.pl STATEMENT:  
CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 
'nothing')
2023-04-11 02:57:47.622 UTC [9241178:23] 035_standby_logical_decoding.pl STATEMENT:  
CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 
'nothing')

Not sure why the slot is not being created.

There is also "replication apply delay" increasing:

2023-04-11 02:57:49.183 UTC [13304488:253] DEBUG:  0: sendtime 2023-04-11 
02:57:49.111363+00 receipttime 2023-04-11 02:57:49.183512+00 replication apply 
delay 644 ms transfer latency 73 ms
2023-04-11 02:57:49.184 UTC [13304488:259] DEBUG:  0: sendtime 2023-04-11 
02:57:49.183461+00 receipttime 2023-04-11 02:57:49.1842+00 replication apply 
delay 645 ms transfer latency 1 ms
2023-04-11 02:57:49.221 UTC [13304488:265] DEBUG:  0: sendtime 2023-04-11 
02:57:49.184166+00 receipttime 2023-04-11 02:57:49.221059+00 replication apply 
delay 682 ms transfer latency 37 ms
2023-04-11 02:57:49.222 UTC [13304488:271] DEBUG:  0: sendtime 2023-04-11 
02:57:49.221003+00 receipttime 2023-04-11 02:57:49.222144+00 replication apply 
delay 683 ms transfer latency 2 ms
2023-04-11 02:57:49.222 UTC [13304488:277] DEBUG:  0: sendtime 2023-04-11 
02:57:49.222095+00 receipttime 2023-04-11 02:57:49.2228+00 replication apply 
delay 684 ms transfer latency 1 ms

Noah, I think hoverfly is yours, would it be possible to have access (I'm not 
an AIX expert though) or check if you see a slot creation hanging and if so why?


===
$ tail -n5 regress_log_035_standby_logical_decoding
[02:57:48.390](0.100s) ok 66 - otherslot on standby not dropped

### Reloading node "standby"
# Running: pg_ctl -D 
/scratch/nm/farm/xlc64v16/HEAD/pgsql.build/src/test/recovery/tmp_check/t_035_standby_logical_decoding_standby_data/pgdata
 reload
server signaled
===

I've posted a tarball of the current logs at
https://drive.google.com/file/d/1JIZ5hSHBsKjEgU5WOGHOqXB7Z_-9XT5u/view?usp=sharing.
The test times out (PG_TEST_TIMEOUT_DEFAULT=5400), and uploading logs then
fails with 413 Request Entity Too Large.  Is the above
log_min_messages='debug2' important?  Removing that may make the logs small
enough to upload normally.


I think debug2 might still be useful while investigating this issue (I'll 
compare a working TAP run with this one).

Regards

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-10 Thread Noah Misch
On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote:
> --- /dev/null
> +++ b/src/test/recovery/t/035_standby_logical_decoding.pl
> @@ -0,0 +1,720 @@
> +# logical decoding on standby : test logical decoding,
> +# recovery conflict and standby promotion.
...
> +$node_primary->append_conf('postgresql.conf', q{
> +wal_level = 'logical'
> +max_replication_slots = 4
> +max_wal_senders = 4
> +log_min_messages = 'debug2'
> +log_error_verbosity = verbose
> +});

Buildfarm member hoverfly stopped reporting in when this test joined the tree.
It's currently been stuck here for 140 minutes:

===
$ tail -n5 regress_log_035_standby_logical_decoding
[02:57:48.390](0.100s) ok 66 - otherslot on standby not dropped

### Reloading node "standby"
# Running: pg_ctl -D 
/scratch/nm/farm/xlc64v16/HEAD/pgsql.build/src/test/recovery/tmp_check/t_035_standby_logical_decoding_standby_data/pgdata
 reload
server signaled
===

I've posted a tarball of the current logs at
https://drive.google.com/file/d/1JIZ5hSHBsKjEgU5WOGHOqXB7Z_-9XT5u/view?usp=sharing.
The test times out (PG_TEST_TIMEOUT_DEFAULT=5400), and uploading logs then
fails with 413 Request Entity Too Large.  Is the above
log_min_messages='debug2' important?  Removing that may make the logs small
enough to upload normally.




Re: Minimal logical decoding on standbys

2023-04-08 Thread Jonathan S. Katz

On 4/8/23 5:27 AM, Andres Freund wrote:

Hi,

On 2023-04-07 14:27:09 -0700, Andres Freund wrote:

I think I'll push these in a few hours. While this needed more changes than
I'd like shortly before the freeze, I think they're largely not in very
interesting bits and pieces - and this feature has been in the works for about
three eternities, and it is blocking a bunch of highly requested features.

If anybody still has energy, I would appreciate a look at 0001, 0002, the new
pieces I added, to make what's now 0003 and 0004 cleaner.


Pushed. Thanks all!

I squashed some of the changes. There didn't seem to be a need for a separate
stats commit, or a separate docs commit. Besides that, I did find plenty of
grammar issues, and a bunch of formatting issues.

Let's see what the buildfarm says.


Thanks to everyone for working on this feature -- this should have a big 
impact on users of logical replication!


While it still needs to get through the beta period etc. this is a big 
milestone for what's been a multi-year effort to support this.


Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Minimal logical decoding on standbys

2023-04-08 Thread Andres Freund
Hi,

On 2023-04-07 14:27:09 -0700, Andres Freund wrote:
> I think I'll push these in a few hours. While this needed more changes than
> I'd like shortly before the freeze, I think they're largely not in very
> interesting bits and pieces - and this feature has been in the works for about
> three eternities, and it is blocking a bunch of highly requested features.
>
> If anybody still has energy, I would appreciate a look at 0001, 0002, the new
> pieces I added, to make what's now 0003 and 0004 cleaner.

Pushed. Thanks all!

I squashed some of the changes. There didn't seem to be a need for a separate
stats commit, or a separate docs commit. Besides that, I did find plenty of
grammar issues, and a bunch of formatting issues.

Let's see what the buildfarm says.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-08 Thread Bertrand Drouvot
Hi,

New wording works for me, thanks!

Bertrand

Le sam. 8 avr. 2023, 08:26, Andres Freund  a écrit :

> Hi,
>
> On 2023-04-07 11:12:26 -0700, Andres Freund wrote:
> > + 
> > +  
> > +   confl_active_logicalslot
> bigint
> > +  
> > +  
> > +   Number of active logical slots in this database that have been
> > +   invalidated because they conflict with recovery (note that
> inactive ones
> > +   are also invalidated but do not increment this counter)
> > +  
> > + 
> >  
> > 
> >
>
> This seems wrong to me. The counter is not for invalidated slots, it's for
> recovery conflict interrupts. If phrased that way, the parenthetical would
> be
> unnecessary.
>
> I think something like
>Number of uses of logical slots in this database that have been
>canceled due to old snapshots or a too low  linkend="guc-wal-level"/>
>on the primary
>
> would work and fit with the documentation of the other fields? Reads a bit
> stilted, but so do several of the other fields...
>
> Greetings,
>
> Andres Freund
>


Re: Minimal logical decoding on standbys

2023-04-08 Thread Andres Freund
Hi,

On 2023-04-07 11:12:26 -0700, Andres Freund wrote:
> + 
> +  
> +   confl_active_logicalslot 
> bigint
> +  
> +  
> +   Number of active logical slots in this database that have been
> +   invalidated because they conflict with recovery (note that inactive 
> ones
> +   are also invalidated but do not increment this counter)
> +  
> + 
>  
> 
>

This seems wrong to me. The counter is not for invalidated slots, it's for
recovery conflict interrupts. If phrased that way, the parenthetical would be
unnecessary.

I think something like
   Number of uses of logical slots in this database that have been
   canceled due to old snapshots or a too low 
   on the primary

would work and fit with the documentation of the other fields? Reads a bit
stilted, but so do several of the other fields...

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-07 Thread Amit Kapila
On Sat, Apr 8, 2023 at 9:31 AM Andres Freund  wrote:
>
> On 2023-04-08 09:15:05 +0530, Amit Kapila wrote:
> > The new approach for invalidation looks clean. BTW, I see minor
> > inconsistency in the following two error messages (errmsg):
>
> Thanks for checking.
>
>
> > if (MyReplicationSlot->data.invalidated == RS_INVAL_WAL)
> > ereport(ERROR,
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > errmsg("can no longer get changes from replication slot \"%s\"",
> > NameStr(MyReplicationSlot->data.name)),
> > errdetail("This slot has been invalidated because it exceeded the
> > maximum reserved size.")));
> >
> > if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
> > ereport(ERROR,
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > errmsg("cannot read from logical replication slot \"%s\"",
> > NameStr(MyReplicationSlot->data.name)),
> > errdetail("This slot has been invalidated because it was conflicting
> > with recovery.")));
> >
> > Won't it be better to keep the same errmsg in the above two cases?
>
> Probably - do you have a preference? I think the former is a bit better?
>

+1 for the former.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-07 Thread Jonathan S. Katz

On 4/8/23 12:01 AM, Andres Freund wrote:

Hi,

On 2023-04-08 09:15:05 +0530, Amit Kapila wrote:

The new approach for invalidation looks clean. BTW, I see minor
inconsistency in the following two error messages (errmsg):


Thanks for checking.



if (MyReplicationSlot->data.invalidated == RS_INVAL_WAL)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("can no longer get changes from replication slot \"%s\"",
NameStr(MyReplicationSlot->data.name)),
errdetail("This slot has been invalidated because it exceeded the
maximum reserved size.")));

if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot read from logical replication slot \"%s\"",
NameStr(MyReplicationSlot->data.name)),
errdetail("This slot has been invalidated because it was conflicting
with recovery.")));

Won't it be better to keep the same errmsg in the above two cases?


Probably - do you have a preference? I think the former is a bit better?


+1 for the former, though perhaps "receive" instead of "get?"

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Minimal logical decoding on standbys

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-08 09:15:05 +0530, Amit Kapila wrote:
> The new approach for invalidation looks clean. BTW, I see minor
> inconsistency in the following two error messages (errmsg):

Thanks for checking.


> if (MyReplicationSlot->data.invalidated == RS_INVAL_WAL)
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> errmsg("can no longer get changes from replication slot \"%s\"",
> NameStr(MyReplicationSlot->data.name)),
> errdetail("This slot has been invalidated because it exceeded the
> maximum reserved size.")));
> 
> if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> errmsg("cannot read from logical replication slot \"%s\"",
> NameStr(MyReplicationSlot->data.name)),
> errdetail("This slot has been invalidated because it was conflicting
> with recovery.")));
> 
> Won't it be better to keep the same errmsg in the above two cases?

Probably - do you have a preference? I think the former is a bit better?

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-07 Thread Amit Kapila
On Fri, Apr 7, 2023 at 11:42 PM Andres Freund  wrote:
>
> On 2023-04-07 08:47:57 -0700, Andres Freund wrote:
> > Integrated all of these.
>
> Here's my current version. Changes:
> - Integrated Bertrand's changes
> - polished commit messages of 0001-0003
> - edited code comments for 0003, including
>   InvalidateObsoleteReplicationSlots()'s header
> - added a bump of SLOT_VERSION to 0001
> - moved addition of pg_log_standby_snapshot() to 0007
> - added a catversion bump for pg_log_standby_snapshot()
> - moved all the bits dealing with procsignals from 0003 to 0004, now the split
>   makes sense IMO
> - combined a few more sucessive ->safe_psql() calls
>

The new approach for invalidation looks clean. BTW, I see minor
inconsistency in the following two error messages (errmsg):

if (MyReplicationSlot->data.invalidated == RS_INVAL_WAL)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("can no longer get changes from replication slot \"%s\"",
NameStr(MyReplicationSlot->data.name)),
errdetail("This slot has been invalidated because it exceeded the
maximum reserved size.")));

if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot read from logical replication slot \"%s\"",
NameStr(MyReplicationSlot->data.name)),
errdetail("This slot has been invalidated because it was conflicting
with recovery.")));

Won't it be better to keep the same errmsg in the above two cases?

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 18:32:04 -0400, Melanie Plageman wrote:
> Code review only of 0001-0005.
> 
> I noticed you had two 0008, btw.

Yea, sorry for that. One was the older version. Just before sending the patch
I saw another occurance of a test failure, which I then fixed. In the course
of that I changed something in the patch subject.


> On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote:
> > Hi,
> > 
> > On 2023-04-07 08:47:57 -0700, Andres Freund wrote:
> > > Integrated all of these.
> > 
> > From 0e038eb5dfddec500fbf4625775d1fa508a208f6 Mon Sep 17 00:00:00 2001
> > From: Andres Freund 
> > Date: Thu, 6 Apr 2023 20:00:07 -0700
> > Subject: [PATCH va67 1/9] Replace a replication slot's invalidated_at LSN 
> > with
> >  an enum
> > 
> > diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
> > index 8872c80cdfe..ebcb637baed 100644
> > --- a/src/include/replication/slot.h
> > +++ b/src/include/replication/slot.h
> > @@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency
> > RS_TEMPORARY
> >  } ReplicationSlotPersistency;
> >  
> > +/*
> > + * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the
> > + * 'invalidated' field is set to a value other than _NONE.
> > + */
> > +typedef enum ReplicationSlotInvalidationCause
> > +{
> > +   RS_INVAL_NONE,
> > +   /* required WAL has been removed */
> 
> I just wonder if RS_INVAL_WAL is too generic. Something like
> RS_INVAL_WAL_MISSING or similar may be better since it seems there are
> other inavlidation causes that may be related to WAL.

Renamed to RS_INVAL_WAL_REMOVED



> > From 52c25cc15abc4470d19e305d245b9362e6b8d6a3 Mon Sep 17 00:00:00 2001
> > From: Andres Freund 
> > Date: Fri, 7 Apr 2023 09:32:48 -0700
> > Subject: [PATCH va67 3/9] Support invalidating replication slots due to
> >  horizon and wal_level
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> > 
> > Needed for supporting logical decoding on a standby. The new invalidation
> > methods will be used in a subsequent commit.
> > 
> 
> You probably are aware, but applying 0003 and 0004 both gives me two
> warnings:
> 
> warning: 1 line adds whitespace errors.
> Warning: commit message did not conform to UTF-8.
> You may want to amend it after fixing the message, or set the config
> variable i18n.commitEncoding to the encoding your project uses.

I did see the whitespace error, but not the encoding error. We have a bunch of
other commit messages with Fabrizio's name "properly spelled" in, so I think
that's ok.



> > diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
> > index df23b7ed31e..c2a9accebf6 100644
> > --- a/src/backend/replication/slot.c
> > +++ b/src/backend/replication/slot.c
> > @@ -1241,8 +1241,58 @@ ReplicationSlotReserveWal(void)
> >  }
> >  
> >  /*
> > - * Helper for InvalidateObsoleteReplicationSlots -- acquires the given slot
> > - * and mark it invalid, if necessary and possible.
> > + * Report that replication slot needs to be invalidated
> > + */
> > +static void
> > +ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
> > +  bool terminating,
> > +  int pid,
> > +  NameData slotname,
> > +  XLogRecPtr restart_lsn,
> > +  XLogRecPtr oldestLSN,
> > +  TransactionId 
> > snapshotConflictHorizon)
> > +{
> > +   StringInfoData err_detail;
> > +   boolhint = false;
> > +
> > +   initStringInfo(_detail);
> > +
> > +   switch (cause)
> > +   {
> > +   case RS_INVAL_WAL:
> > +   hint = true;
> > +   appendStringInfo(_detail, _("The slot's restart_lsn 
> > %X/%X exceeds the limit by %llu bytes."),
> > +
> > LSN_FORMAT_ARGS(restart_lsn),
> 
> I'm not sure what the below cast is meant to do. If you are trying to
> protect against overflow/underflow, I think you'd need to cast before
> doing the subtraction.


> > +(unsigned long long) 
> > (oldestLSN - restart_lsn));
> > +   break;

That's our current way of passing 64bit numbers to format string
functions. It ends up as a 64bit number everywhere, even windows (with its
stupid ILP32 model).



> > +   case RS_INVAL_HORIZON:
> > +   appendStringInfo(_detail, _("The slot conflicted 
> > with xid horizon %u."),
> > +
> > snapshotConflictHorizon);
> > +   break;
> > +
> > +   case RS_INVAL_WAL_LEVEL:
> > +   appendStringInfo(_detail, _("Logical decoding on 
> > standby requires wal_level to be at least logical on the primary server"));
> > +   break;
> > +   case 

Re: Minimal logical decoding on standbys

2023-04-07 Thread Melanie Plageman
Code review only of 0001-0005.

I noticed you had two 0008, btw.

On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2023-04-07 08:47:57 -0700, Andres Freund wrote:
> > Integrated all of these.
> 
> From 0e038eb5dfddec500fbf4625775d1fa508a208f6 Mon Sep 17 00:00:00 2001
> From: Andres Freund 
> Date: Thu, 6 Apr 2023 20:00:07 -0700
> Subject: [PATCH va67 1/9] Replace a replication slot's invalidated_at LSN with
>  an enum
> 
> diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
> index 8872c80cdfe..ebcb637baed 100644
> --- a/src/include/replication/slot.h
> +++ b/src/include/replication/slot.h
> @@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency
>   RS_TEMPORARY
>  } ReplicationSlotPersistency;
>  
> +/*
> + * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the
> + * 'invalidated' field is set to a value other than _NONE.
> + */
> +typedef enum ReplicationSlotInvalidationCause
> +{
> + RS_INVAL_NONE,
> + /* required WAL has been removed */

I just wonder if RS_INVAL_WAL is too generic. Something like
RS_INVAL_WAL_MISSING or similar may be better since it seems there are
other inavlidation causes that may be related to WAL.

> + RS_INVAL_WAL,
> +} ReplicationSlotInvalidationCause;
> +

0002 LGTM

> From 52c25cc15abc4470d19e305d245b9362e6b8d6a3 Mon Sep 17 00:00:00 2001
> From: Andres Freund 
> Date: Fri, 7 Apr 2023 09:32:48 -0700
> Subject: [PATCH va67 3/9] Support invalidating replication slots due to
>  horizon and wal_level
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Needed for supporting logical decoding on a standby. The new invalidation
> methods will be used in a subsequent commit.
> 

You probably are aware, but applying 0003 and 0004 both gives me two
warnings:

warning: 1 line adds whitespace errors.
Warning: commit message did not conform to UTF-8.
You may want to amend it after fixing the message, or set the config
variable i18n.commitEncoding to the encoding your project uses.

> diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
> index df23b7ed31e..c2a9accebf6 100644
> --- a/src/backend/replication/slot.c
> +++ b/src/backend/replication/slot.c
> @@ -1241,8 +1241,58 @@ ReplicationSlotReserveWal(void)
>  }
>  
>  /*
> - * Helper for InvalidateObsoleteReplicationSlots -- acquires the given slot
> - * and mark it invalid, if necessary and possible.
> + * Report that replication slot needs to be invalidated
> + */
> +static void
> +ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
> +bool terminating,
> +int pid,
> +NameData slotname,
> +XLogRecPtr restart_lsn,
> +XLogRecPtr oldestLSN,
> +TransactionId 
> snapshotConflictHorizon)
> +{
> + StringInfoData err_detail;
> + boolhint = false;
> +
> + initStringInfo(_detail);
> +
> + switch (cause)
> + {
> + case RS_INVAL_WAL:
> + hint = true;
> + appendStringInfo(_detail, _("The slot's restart_lsn 
> %X/%X exceeds the limit by %llu bytes."),
> +  
> LSN_FORMAT_ARGS(restart_lsn),

I'm not sure what the below cast is meant to do. If you are trying to
protect against overflow/underflow, I think you'd need to cast before
doing the subtraction.

> +  (unsigned long long) 
> (oldestLSN - restart_lsn));
> + break;
> + case RS_INVAL_HORIZON:
> + appendStringInfo(_detail, _("The slot conflicted 
> with xid horizon %u."),
> +  
> snapshotConflictHorizon);
> + break;
> +
> + case RS_INVAL_WAL_LEVEL:
> + appendStringInfo(_detail, _("Logical decoding on 
> standby requires wal_level to be at least logical on the primary server"));
> + break;
> + case RS_INVAL_NONE:
> + pg_unreachable();
> + }

This ereport is quite hard to read. Is there any simplification you can
do of the ternaries without undue duplication?

> + ereport(LOG,
> + terminating ?
> + errmsg("terminating process %d to release replication 
> slot \"%s\"",
> +pid, NameStr(slotname)) :
> + errmsg("invalidating obsolete replication slot \"%s\"",
> +NameStr(slotname)),
> + errdetail_internal("%s", err_detail.data),
> + hint ? errhint("You might need to increase 
> max_slot_wal_keep_size.") : 0);
> +
> + 

Re: Minimal logical decoding on standbys

2023-04-07 Thread Alvaro Herrera
I gave a very quick look at 0001 and 0003.  I find no fault with 0001.
It was clear back when we added that stuff that invalidated_at was not
terribly useful -- I was just too conservative to not have it -- but now
that a lot of time has passed and we haven't done anything with it,
removing it seems perfectly OK.

As for 0003, I have no further concerns about the translatability.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)




Re: Minimal logical decoding on standbys

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 22:54:01 +0200, Drouvot, Bertrand wrote:
> That looks good to me

Cool.

I think I'll push these in a few hours. While this needed more changes than
I'd like shortly before the freeze, I think they're largely not in very
interesting bits and pieces - and this feature has been in the works for about
three eternities, and it is blocking a bunch of highly requested features.

If anybody still has energy, I would appreciate a look at 0001, 0002, the new
pieces I added, to make what's now 0003 and 0004 cleaner.


> 0005 is missing author/reviewer, I'd propose:
> [...]

Thanks, I'll integrate them...


> It's hard (given the amount of emails that have been send during all this 
> time),

Indeed.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 8:24 PM, Drouvot, Bertrand wrote:

Hi,

On 4/7/23 5:47 PM, Andres Freund wrote:

Hi,


- write a test that invalidated logical slots do not lead to retaining WAL


I'm not sure how to do that since pg_switch_wal() and friends can't be executed 
on
a standby.


You can do it on the primary and wait for the records to have been applied.



Thanks, will give it a try in a couple of hours.


I looked at it but I think we'd also need things like pg_walfile_name() on the 
standby but is not allowed.


Is this patchset sufficient to subscribe to a publication on a physical
standby, assuming the publication is created on the primary? If so, we should
have at least a minimal test. If not, we should note that restriction
explicitly.


I gave it a try and it does work.

"
node3 subscribes to node2 (standby).
Insert done in node1 (primary) where the publication is created => node3 see 
the changes.
"

I started to create the TAP test but currently stuck as the "create 
subscription" waits for a checkpoint/pg_log_standby_snapshot() on the primary.

So, trying to make use of things like:

"my %psql_subscriber = ('stdin' => '', 'stdout' => '');
$psql_subscriber{run} =
   $node_subscriber->background_psql('postgres', \$psql_subscriber{stdin},
     \$psql_subscriber{stdout},
     $psql_timeout);
$psql_subscriber{stdout} = '';
"

But in vain so far...



please find attached sub_in_progress.patch that "should work" but "does not" 
because
the wait_for_subscription_sync() call produces:

"
error running SQL: 'psql::1: ERROR:  recovery is in progress
HINT:  WAL control functions cannot be executed during recovery.'
while running 'psql -XAtq -d port=61441 host=/tmp/45dt3wqs2p dbname='postgres' 
-f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_current_wal_lsn()'
"

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index 561dcd33c3..c3c0e718c8 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -8,14 +8,18 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
-my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, $ret, 
$handle, $slot);
+my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, 
$subscriber_stdin, $subscriber_stdout, $subscriber_stderr, $ret, $handle, 
$slot);
 
 my $node_primary = PostgreSQL::Test::Cluster->new('primary');
 my $node_standby = PostgreSQL::Test::Cluster->new('standby');
 my $node_cascading_standby = 
PostgreSQL::Test::Cluster->new('cascading_standby');
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $psql_timeout =
+  IPC::Run::timer(2 * $PostgreSQL::Test::Utils::timeout_default);
 my $res;
 
+
 # Name for the physical slot on primary
 my $primary_slotname = 'primary_physical';
 my $standby_physical_slotname = 'standby_physical';
@@ -263,6 +267,7 @@ $node_standby->init_from_backup(
has_restoring => 1);
 $node_standby->append_conf('postgresql.conf',
qq[primary_slot_name = '$primary_slotname']);
+$node_standby->append_conf('postgresql.conf', 'max_replication_slots = 6');
 $node_standby->start;
 $node_primary->wait_for_replay_catchup($node_standby);
 $node_standby->safe_psql('testdb', qq[SELECT * FROM 
pg_create_physical_replication_slot('$standby_physical_slotname');]);
@@ -280,6 +285,20 @@ $node_cascading_standby->append_conf('postgresql.conf',
 $node_cascading_standby->start;
 $node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
 
+###
+# Initialize subscriber node
+###
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf', 'max_replication_slots = 4');
+$node_subscriber->start;
+
+my %psql_subscriber = ('subscriber_stdin' => '', 'subscriber_stdout' => '');
+$psql_subscriber{run} =
+  $node_subscriber->background_psql('postgres', 
\$psql_subscriber{subscriber_stdin},
+\$psql_subscriber{subscriber_stdout},
+$psql_timeout);
+$psql_subscriber{subscriber_stdout} = '';
+
 ##
 # Test that logical decoding on the standby
 # behaves correctly.
@@ -360,6 +379,43 @@ is( $node_primary->psql(
 3,
 'replaying logical slot from another database fails');
 
+##
+# Test that we can subscribe on the standby with the publication
+# created on the primary.
+##
+
+# Create a table on the primary
+$node_primary->safe_psql('postgres', "CREATE TABLE tab_rep (a int primary 
key)");
+
+# Create a table (same structure) on the subscriber node
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_rep (a int 

Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 8:27 PM, Drouvot, Bertrand wrote:

Hi,




I think some of the patches might have more reviewers than really applicable,
and might also miss some. I'd appreciate if you could go over that...



Sure, will do in a couple of hours.



That looks good to me, just few remarks:

0005 is missing author/reviewer, I'd propose:

Author: "Drouvot, Bertrand" 
Author: Andres Freund 
Reviewed-by: Andres Freund 
Reviewed-by: Robert Haas 
Reviewed-by: Amit Kapila 
Discussion: 
https://postgr.es/m/20230407075009.igg7be27ha2ht...@awork3.anarazel.de

0006, I'd propose:

Author: "Drouvot, Bertrand" 
Reviewed-By: Jeff Davis 
Reviewed-By: Robert Haas 
Reviewed-by: Amit Kapila 
Reviewed-by: Masahiko Sawada 

0007, I'd propose:

Author: "Drouvot, Bertrand" 
Author: Andres Freund 
Author: Amit Khandekar  (in an older version)
Reviewed-by: FabrÌzio de Royes Mello 
Reviewed-by: Amit Kapila 
Reviewed-By: Robert Haas 

0009, I'd propose:

Author: "Drouvot, Bertrand" 
Author: Andres Freund 
Author: Amit Khandekar  (in an older version)
Reviewed-by: FabrÌzio de Royes Mello 
Reviewed-by: Amit Kapila 
Reviewed-By: Robert Haas 

It's hard (given the amount of emails that have been send during all this time),
but I do hope it's correct and that nobody is missing.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 8:12 PM, Andres Freund wrote:

Hi,

On 2023-04-07 08:47:57 -0700, Andres Freund wrote:

Integrated all of these.


Here's my current version. Changes:
- Integrated Bertrand's changes
- polished commit messages of 0001-0003
- edited code comments for 0003, including
   InvalidateObsoleteReplicationSlots()'s header
- added a bump of SLOT_VERSION to 0001
- moved addition of pg_log_standby_snapshot() to 0007
- added a catversion bump for pg_log_standby_snapshot()
- moved all the bits dealing with procsignals from 0003 to 0004, now the split
   makes sense IMO
- combined a few more sucessive ->safe_psql() calls



Thanks!


I see occasional failures in the tests, particularly in the new test using
pg_authid, but not solely. cfbot also seems to have seen these:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3740

I made a bogus attempt at a workaround for the pg_authid case last night. But
that didn't actually fix anything, it just changed the timing.

I think the issue is that VACUUM does not force WAL to be flushed at the end
(since it does not assign an xid). wait_for_replay_catchup() uses
$node->lsn('flush'), which, due to VACUUM not flushing, can be an LSN from
before VACUUM completed.

The problem can be made more likely by adding pg_usleep(100); before
walwriter.c's call to XLogBackgroundFlush().

We probably should introduce some infrastructure in Cluster.pm for this, but
for now I just added a 'flush_wal' table that we insert into after a
VACUUM. That guarantees a WAL flush.



Ack for the Cluster.pm "improvement" and thanks for the "workaround"!


I think some of the patches might have more reviewers than really applicable,
and might also miss some. I'd appreciate if you could go over that...



Sure, will do in a couple of hours.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 5:47 PM, Andres Freund wrote:

Hi,


- write a test that invalidated logical slots do not lead to retaining WAL


I'm not sure how to do that since pg_switch_wal() and friends can't be executed 
on
a standby.


You can do it on the primary and wait for the records to have been applied.



Thanks, will give it a try in a couple of hours.




- Further evolve the API of InvalidateObsoleteReplicationSlots()
- pass in the ReplicationSlotInvalidationCause we're trying to conflict on?
- rename xid to snapshotConflictHorizon, that'd be more in line with the
  ResolveRecoveryConflictWithSnapshot and easier to understand, I think



Done. The new API can be found in 
v65-66-InvalidateObsoleteReplicationSlots_API.patch
attached. It propagates the cause to InvalidatePossiblyObsoleteSlot() where a 
switch/case
can now be used.


Integrated. I moved the cause to the first argument, makes more sense to me
that way.


thanks!



I made it an error - it's a programming error, not some data level
inconsistency if that ever happens.


okay, makes sense.


Integrated all of these.


Thanks!




I think pg_log_standby_snapshot() should be added in "Allow logical decoding
on standby", not the commit adding the tests.


Yeah, that's a good point, I do agree.



Is this patchset sufficient to subscribe to a publication on a physical
standby, assuming the publication is created on the primary? If so, we should
have at least a minimal test. If not, we should note that restriction
explicitly.


I gave it a try and it does work.

"
node3 subscribes to node2 (standby).
Insert done in node1 (primary) where the publication is created => node3 see 
the changes.
"

I started to create the TAP test but currently stuck as the "create 
subscription" waits for a checkpoint/pg_log_standby_snapshot() on the primary.

So, trying to make use of things like:

"my %psql_subscriber = ('stdin' => '', 'stdout' => '');
$psql_subscriber{run} =
  $node_subscriber->background_psql('postgres', \$psql_subscriber{stdin},
\$psql_subscriber{stdout},
$psql_timeout);
$psql_subscriber{stdout} = '';
"

But in vain so far...

Will resume working on it in a couple of hours.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 08:47:57 -0700, Andres Freund wrote:
> Integrated all of these.

Here's my current version. Changes:
- Integrated Bertrand's changes
- polished commit messages of 0001-0003
- edited code comments for 0003, including
  InvalidateObsoleteReplicationSlots()'s header
- added a bump of SLOT_VERSION to 0001
- moved addition of pg_log_standby_snapshot() to 0007
- added a catversion bump for pg_log_standby_snapshot()
- moved all the bits dealing with procsignals from 0003 to 0004, now the split
  makes sense IMO
- combined a few more sucessive ->safe_psql() calls

I see occasional failures in the tests, particularly in the new test using
pg_authid, but not solely. cfbot also seems to have seen these:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3740

I made a bogus attempt at a workaround for the pg_authid case last night. But
that didn't actually fix anything, it just changed the timing.

I think the issue is that VACUUM does not force WAL to be flushed at the end
(since it does not assign an xid). wait_for_replay_catchup() uses
$node->lsn('flush'), which, due to VACUUM not flushing, can be an LSN from
before VACUUM completed.

The problem can be made more likely by adding pg_usleep(100); before
walwriter.c's call to XLogBackgroundFlush().

We probably should introduce some infrastructure in Cluster.pm for this, but
for now I just added a 'flush_wal' table that we insert into after a
VACUUM. That guarantees a WAL flush.


I think some of the patches might have more reviewers than really applicable,
and might also miss some. I'd appreciate if you could go over that...

Greetings,

Andres Freund
>From 0e038eb5dfddec500fbf4625775d1fa508a208f6 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 6 Apr 2023 20:00:07 -0700
Subject: [PATCH va67 1/9] Replace a replication slot's invalidated_at LSN with
 an enum

This is mainly useful because the upcoming logical-decoding-on-standby feature
adds further reasons for invalidating slots, and we don't want to end up with
multiple invalidated_* fields, or check different attributes.

Eventually we should consider not resetting restart_lsn when invalidating a
slot due to max_slot_wal_keep_size. But that's a user visible change, so left
for later.

Increases SLOT_VERSION, due to the changed field (with a different alignment,
no less).

Reviewed-by: "Drouvot, Bertrand" 
Discussion: https://postgr.es/m/20230407075009.igg7be27ha2ht...@awork3.anarazel.de
---
 src/include/replication/slot.h  | 15 +--
 src/backend/replication/slot.c  | 28 
 src/backend/replication/slotfuncs.c |  8 +++-
 src/tools/pgindent/typedefs.list|  1 +
 4 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8872c80cdfe..ebcb637baed 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency
 	RS_TEMPORARY
 } ReplicationSlotPersistency;
 
+/*
+ * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the
+ * 'invalidated' field is set to a value other than _NONE.
+ */
+typedef enum ReplicationSlotInvalidationCause
+{
+	RS_INVAL_NONE,
+	/* required WAL has been removed */
+	RS_INVAL_WAL,
+} ReplicationSlotInvalidationCause;
+
 /*
  * On-Disk data of a replication slot, preserved across restarts.
  */
@@ -72,8 +83,8 @@ typedef struct ReplicationSlotPersistentData
 	/* oldest LSN that might be required by this replication slot */
 	XLogRecPtr	restart_lsn;
 
-	/* restart_lsn is copied here when the slot is invalidated */
-	XLogRecPtr	invalidated_at;
+	/* RS_INVAL_NONE if valid, or the reason for having been invalidated */
+	ReplicationSlotInvalidationCause invalidated;
 
 	/*
 	 * Oldest LSN that the client has acked receipt for.  This is used as the
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 2293c0c6fc3..df23b7ed31e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -89,7 +89,7 @@ typedef struct ReplicationSlotOnDisk
 	sizeof(ReplicationSlotOnDisk) - ReplicationSlotOnDiskConstantSize
 
 #define SLOT_MAGIC		0x1051CA1	/* format identifier */
-#define SLOT_VERSION	2		/* version for new files */
+#define SLOT_VERSION	3		/* version for new files */
 
 /* Control array for replication slot management */
 ReplicationSlotCtlData *ReplicationSlotCtl = NULL;
@@ -855,8 +855,7 @@ ReplicationSlotsComputeRequiredXmin(bool already_locked)
 		SpinLockAcquire(>mutex);
 		effective_xmin = s->effective_xmin;
 		effective_catalog_xmin = s->effective_catalog_xmin;
-		invalidated = (!XLogRecPtrIsInvalid(s->data.invalidated_at) &&
-	   XLogRecPtrIsInvalid(s->data.restart_lsn));
+		invalidated = s->data.invalidated != RS_INVAL_NONE;
 		SpinLockRelease(>mutex);
 
 		/* invalidated slots need not apply */
@@ -901,14 +900,20 @@ ReplicationSlotsComputeRequiredLSN(void)
 	{
 		

Re: Minimal logical decoding on standbys

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 17:13:13 +0200, Drouvot, Bertrand wrote:
> On 4/7/23 9:50 AM, Andres Freund wrote:
> > I added a check for !invalidated to
> > ReplicationSlotsComputeRequiredLSN() etc.
> > 
> 
> looked at 65-0001 and it looks good to me.
> 
> > Added new patch moving checks for invalid logical slots into
> > CreateDecodingContext(). Otherwise we end up with 5 or so checks, which 
> > makes
> > no sense. As far as I can tell the old message in
> > pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having
> > "never previously reserved WAL"
> > 
> 
> looked at 65-0002 and it looks good to me.
> 
> > Split "Handle logical slot conflicts on standby." into two. I'm not sure 
> > that
> > should stay that way, but it made it easier to hack on
> > InvalidateObsoleteReplicationSlots.
> > 
> 
> looked at 65-0003 and the others.

Thanks for checking!


> > Todo:
> > - write a test that invalidated logical slots stay invalidated across a 
> > restart
> 
> Done in 65-66-0008 attached.

Cool.


> > - write a test that invalidated logical slots do not lead to retaining WAL
> 
> I'm not sure how to do that since pg_switch_wal() and friends can't be 
> executed on
> a standby.

You can do it on the primary and wait for the records to have been applied.


> > - Further evolve the API of InvalidateObsoleteReplicationSlots()
> >- pass in the ReplicationSlotInvalidationCause we're trying to conflict 
> > on?
> >- rename xid to snapshotConflictHorizon, that'd be more in line with the
> >  ResolveRecoveryConflictWithSnapshot and easier to understand, I think
> > 
> 
> Done. The new API can be found in 
> v65-66-InvalidateObsoleteReplicationSlots_API.patch
> attached. It propagates the cause to InvalidatePossiblyObsoleteSlot() where a 
> switch/case
> can now be used.

Integrated. I moved the cause to the first argument, makes more sense to me
that way.


> The "default" case does not emit an error since this code runs as part
> of checkpoint.

I made it an error - it's a programming error, not some data level
inconsistency if that ever happens.


> > - The test could stand a bit of cleanup and consolidation
> >- No need to start 4 psql processes to do 4 updates, just do it in one
> >  safe_psql()
> 
> Right, done in v65-66-0008-New-TAP-test-for-logical-decoding-on-standby.patch 
> attached.

> >- the sequence of drop_logical_slots(), create_logical_slots(),
> >  change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is
> >  repeated quite a few times
> 
> grouped in reactive_slots_change_hfs_and_wait_for_xmins() in 65-66-0008 
> attached.
> 
> >- the stats queries checking for specific conflict counts, including
> >  preceding tests, is pretty painful. I suggest to reset the stats at the
> >  end of the test instead (likely also do the drop_logical_slot() there).
> 
> Good idea, done in 65-66-0008 attached.
> 
> >- it's hard to correlate postgres log and the tap test, because the slots
> >  are named the same across all tests. Perhaps they could have a per-test
> >  prefix?
> 
> Good point. Done in 65-66-0008 attached. Thanks to that and the stats reset 
> the
> check for invalidation is now done in a single function 
> "check_for_invalidation" that looks
> for invalidation messages in the logfile and in pg_stat_database_conflicts.
> 
> Thanks for the suggestions: the TAP test is now easier to read/understand.

Integrated all of these.


I think pg_log_standby_snapshot() should be added in "Allow logical decoding
on standby", not the commit adding the tests.


Is this patchset sufficient to subscribe to a publication on a physical
standby, assuming the publication is created on the primary? If so, we should
have at least a minimal test. If not, we should note that restriction
explicitly.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 9:50 AM, Andres Freund wrote:

Hi,
Here's my current working state - I'll go to bed soon.


Thanks a lot for this Andres!



Changes:

- shared catalog relations weren't handled correctly, because the dboid is
   InvalidOid for them. I wrote a test for that as well.

- ReplicationSlotsComputeRequiredXmin() took invalidated logical slots into
   account (ReplicationSlotsComputeLogicalRestartLSN() too, but it never looks
   at logical slots)

- I don't think the subset of slot xids that were checked when invalidating
   was right. We need to check effective_xmin and effective_catalog_xmin - the
   latter was using catalog_xmin.

- similarly, it wasn't right that specifically those two fields were
   overwritten when invalidated - as that was done, I suspect the changes might
   get lost on a restart...

- As mentioned previously, I did not like all the functions in slot.h, nor
   their naming. Not yet quite finished with that, but a good bit further

- There were a lot of unrelated changes, e.g. removing comments like
  * NB - this runs as part of checkpoint, so avoid raising errors if possible.

- I still don't like the order of the patches, fixing the walsender patches
   after introducing support for logical decoding on standby. Reordered.

- I don't think logical slots being invalidated as checked e.g. in
   pg_logical_replication_slot_advance()

- I didn't like much that InvalidatePossiblyObsoleteSlot() switched between
   kill() and SendProcSignal() based on the "conflict". There very well could
   be reasons to use InvalidatePossiblyObsoleteSlot() with an xid from outside
   of the startup process in the future. Instead I made it differentiate based
   on MyBackendType == B_STARTUP.



Thanks for all of this and the above explanations.



I also:

Added new patch that replaces invalidated_at with a new enum, 'invalidated',
listing the reason for the invalidation.


Yeah, that's a great idea.


I added a check for !invalidated to
ReplicationSlotsComputeRequiredLSN() etc.



looked at 65-0001 and it looks good to me.


Added new patch moving checks for invalid logical slots into
CreateDecodingContext(). Otherwise we end up with 5 or so checks, which makes
no sense. As far as I can tell the old message in
pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having
"never previously reserved WAL"



looked at 65-0002 and it looks good to me.


Split "Handle logical slot conflicts on standby." into two. I'm not sure that
should stay that way, but it made it easier to hack on
InvalidateObsoleteReplicationSlots.



looked at 65-0003 and the others.

It's easier to understand/read the code now that the 
ReplicationSlotInvalidationCause
enum has been created and that data.invalidated also make use of the enum. It does 
"simplify"
the review and that looks good to me.



Todo:
- write a test that invalidated logical slots stay invalidated across a restart


Done in 65-66-0008 attached.


- write a test that invalidated logical slots do not lead to retaining WAL


I'm not sure how to do that since pg_switch_wal() and friends can't be executed 
on
a standby.


- Further evolve the API of InvalidateObsoleteReplicationSlots()
   - pass in the ReplicationSlotInvalidationCause we're trying to conflict on?
   - rename xid to snapshotConflictHorizon, that'd be more in line with the
 ResolveRecoveryConflictWithSnapshot and easier to understand, I think



Done. The new API can be found in 
v65-66-InvalidateObsoleteReplicationSlots_API.patch
attached. It propagates the cause to InvalidatePossiblyObsoleteSlot() where a 
switch/case
can now be used. The "default" case does not emit an error since this code runs 
as part
of checkpoint.


- The test could stand a bit of cleanup and consolidation
   - No need to start 4 psql processes to do 4 updates, just do it in one
 safe_psql()


Right, done in v65-66-0008-New-TAP-test-for-logical-decoding-on-standby.patch 
attached.


   - the sequence of drop_logical_slots(), create_logical_slots(),
 change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is
 repeated quite a few times


grouped in reactive_slots_change_hfs_and_wait_for_xmins() in 65-66-0008 
attached.


   - the stats queries checking for specific conflict counts, including
 preceding tests, is pretty painful. I suggest to reset the stats at the
 end of the test instead (likely also do the drop_logical_slot() there).


Good idea, done in 65-66-0008 attached.


   - it's hard to correlate postgres log and the tap test, because the slots
 are named the same across all tests. Perhaps they could have a per-test
 prefix?


Good point. Done in 65-66-0008 attached. Thanks to that and the stats reset the
check for invalidation is now done in a single function 
"check_for_invalidation" that looks
for invalidation messages in the logfile and in pg_stat_database_conflicts.

Thanks for the suggestions: the TAP test is now easier to read/understand.


   

Re: Minimal logical decoding on standbys

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 08:09:50 +0200, Drouvot, Bertrand wrote:
> Hi,
>
> On 4/7/23 7:56 AM, Andres Freund wrote:
> > Hi,
> >
> > On 2023-04-07 07:02:04 +0200, Drouvot, Bertrand wrote:
> > > Done in V63 attached and did change the associated comment a bit.
> >
> > Can you send your changes incrementally, relative to V62? I'm polishing them
> > right now, and that'd make it a lot easier to apply your changes ontop.
> >
>
> Sure, please find them enclosed.

Thanks.


Here's my current working state - I'll go to bed soon.

Changes:

- shared catalog relations weren't handled correctly, because the dboid is
  InvalidOid for them. I wrote a test for that as well.

- ReplicationSlotsComputeRequiredXmin() took invalidated logical slots into
  account (ReplicationSlotsComputeLogicalRestartLSN() too, but it never looks
  at logical slots)

- I don't think the subset of slot xids that were checked when invalidating
  was right. We need to check effective_xmin and effective_catalog_xmin - the
  latter was using catalog_xmin.

- similarly, it wasn't right that specifically those two fields were
  overwritten when invalidated - as that was done, I suspect the changes might
  get lost on a restart...

- As mentioned previously, I did not like all the functions in slot.h, nor
  their naming. Not yet quite finished with that, but a good bit further

- There were a lot of unrelated changes, e.g. removing comments like
 * NB - this runs as part of checkpoint, so avoid raising errors if possible.

- I still don't like the order of the patches, fixing the walsender patches
  after introducing support for logical decoding on standby. Reordered.

- I don't think logical slots being invalidated as checked e.g. in
  pg_logical_replication_slot_advance()

- I didn't like much that InvalidatePossiblyObsoleteSlot() switched between
  kill() and SendProcSignal() based on the "conflict". There very well could
  be reasons to use InvalidatePossiblyObsoleteSlot() with an xid from outside
  of the startup process in the future. Instead I made it differentiate based
  on MyBackendType == B_STARTUP.


I also:

Added new patch that replaces invalidated_at with a new enum, 'invalidated',
listing the reason for the invalidation. I added a check for !invalidated to
ReplicationSlotsComputeRequiredLSN() etc.

Added new patch moving checks for invalid logical slots into
CreateDecodingContext(). Otherwise we end up with 5 or so checks, which makes
no sense. As far as I can tell the old message in
pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having
"never previously reserved WAL"

Split "Handle logical slot conflicts on standby." into two. I'm not sure that
should stay that way, but it made it easier to hack on
InvalidateObsoleteReplicationSlots.


Todo:
- write a test that invalidated logical slots stay invalidated across a restart
- write a test that invalidated logical slots do not lead to retaining WAL
- Further evolve the API of InvalidateObsoleteReplicationSlots()
  - pass in the ReplicationSlotInvalidationCause we're trying to conflict on?
  - rename xid to snapshotConflictHorizon, that'd be more in line with the
ResolveRecoveryConflictWithSnapshot and easier to understand, I think

- The test could stand a bit of cleanup and consolidation
  - No need to start 4 psql processes to do 4 updates, just do it in one
safe_psql()
  - the sequence of drop_logical_slots(), create_logical_slots(),
change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is
repeated quite a few times
  - the stats queries checking for specific conflict counts, including
preceding tests, is pretty painful. I suggest to reset the stats at the
end of the test instead (likely also do the drop_logical_slot() there).
  - it's hard to correlate postgres log and the tap test, because the slots
are named the same across all tests. Perhaps they could have a per-test
prefix?
  - numbering tests is a PITA, I had to renumber the later ones, when adding a
test for shared catalog tables


My attached version does include your v62-63 incremental chnages.

Greetings,

Andres Freund
>From 1e5461e0019678a92192b0dd5d9bf3f7105f504d Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 6 Apr 2023 20:00:07 -0700
Subject: [PATCH va65 1/9] replication slots: replace invalidated_at LSN with
 an enum

---
 src/include/replication/slot.h  | 15 +--
 src/backend/replication/slot.c  | 21 ++---
 src/backend/replication/slotfuncs.c |  8 +++-
 3 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8872c80cdfe..793f0701b88 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency
 	RS_TEMPORARY
 } ReplicationSlotPersistency;
 
+/*
+ * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the
+ * 'invalidated' field is 

Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 7:56 AM, Andres Freund wrote:

Hi,

On 2023-04-07 07:02:04 +0200, Drouvot, Bertrand wrote:

Done in V63 attached and did change the associated comment a bit.


Can you send your changes incrementally, relative to V62? I'm polishing them
right now, and that'd make it a lot easier to apply your changes ontop.



Sure, please find them enclosed.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index a4f9a3c972..5b6d19d379 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -6,7 +6,7 @@ use warnings;
 
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 67;
+use Test::More;
 
 my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, $ret, 
$handle, $slot);
 
@@ -112,8 +112,7 @@ sub check_slots_dropped
check_pg_recvlogical_stderr($slot_user_handle, "conflict with 
recovery");
 }
 
-# Check if all the slots on standby are dropped. These include the 'activeslot'
-# that was acquired by make_slot_active(), and the non-active 'inactiveslot'.
+# Change hot_standby_feedback and check xmin and catalog_xmin values.
 sub change_hot_standby_feedback_and_wait_for_xmins
 {
my ($hsf, $invalidated) = @_;
@@ -560,7 +559,7 @@ ok( find_in_log(
   'activeslot slot invalidation is logged due to wal_level');
 
 # Verify that pg_stat_database_conflicts.confl_active_logicalslot has been 
updated
-# we now expect 3 conflicts reported as the counter persist across reloads
+# we now expect 4 conflicts reported as the counter persist across reloads
 ok( $node_standby->poll_query_until(
'postgres',
"select (confl_active_logicalslot = 4) from pg_stat_database_conflicts 
where datname = 'testdb'", 't'),
@@ -703,3 +702,5 @@ ok( pump_until(
 chomp($cascading_stdout);
 is($cascading_stdout, $expected,
 'got same expected output from pg_recvlogical decoding session on 
cascading standby');
+
+done_testing();
diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index cc4f7b5302..e6427c54c5 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1945,7 +1945,7 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord 
*record, TimeLineID *repl
 * Indeed, logical walsenders on standby can't decode and send data 
until
 * it's been applied.
 *
-* Physical walsenders don't need to be wakon up during replay unless
+* Physical walsenders don't need to be woken up during replay unless
 * cascading replication is allowed and time line change occured (so 
that
 * they can notice that they are on a new time line).
 *
@@ -1953,9 +1953,11 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord 
*record, TimeLineID *repl
 *
 *  - physical walsenders in case of new time line and cascade
 *  replication is allowed.
-*  - always true for logical walsenders.
+*  - logical walsenders in case cascade replication is allowed (could 
not
+*  be created otherwise).
 */
-   WalSndWakeup(switchedTLI && AllowCascadeReplication(), true);
+   if (AllowCascadeReplication())
+   WalSndWakeup(switchedTLI, true);
 
/*
 * If rm_redo called XLogRequestWalReceiverReply, then we wake up the


Re: Minimal logical decoding on standbys

2023-04-06 Thread Andres Freund
Hi,

On 2023-04-07 07:02:04 +0200, Drouvot, Bertrand wrote:
> Done in V63 attached and did change the associated comment a bit.

Can you send your changes incrementally, relative to V62? I'm polishing them
right now, and that'd make it a lot easier to apply your changes ontop.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/7/23 5:47 AM, Amit Kapila wrote:

On Thu, Apr 6, 2023 at 7:50 PM Drouvot, Bertrand
 wrote:


Thanks! Will update 0005.



I noticed a few typos in the latest patches.

0004
1.
+ * Physical walsenders don't need to be wakon up during replay unless

Typo.


Thanks! Fixed in V63 just posted up-thread.



0005
2.
+# Check if all the slots on standby are dropped. These include the 'activeslot'
+# that was acquired by make_slot_active(), and the non-active 'inactiveslot'.
+sub check_slots_dropped
+{
+ my ($slot_user_handle) = @_;
+
+ is($node_standby->slot('inactiveslot')->{'slot_type'}, '',
'inactiveslot on standby dropped');
+ is($node_standby->slot('activeslot')->{'slot_type'}, '', 'activeslot
on standby dropped');
+
+ check_pg_recvlogical_stderr($slot_user_handle, "conflict with recovery");
+}
+
+# Check if all the slots on standby are dropped. These include the 'activeslot'
+# that was acquired by make_slot_active(), and the non-active 'inactiveslot'.
+sub change_hot_standby_feedback_and_wait_for_xmins
+{
+ my ($hsf, $invalidated) = @_;
+
+ $node_standby->append_conf('postgresql.conf',qq[
+ hot_standby_feedback = $hsf
+ ]);
+
+ $node_standby->reload;
+
+ if ($hsf && $invalidated)
+ {
...

The comment above change_hot_standby_feedback_and_wait_for_xmins seems
to be wrong. It seems to be copied from the previous test.



Good catch! Fixed in V63.



3.
+# Verify that pg_stat_database_conflicts.confl_active_logicalslot has
been updated
+# we now expect 3 conflicts reported as the counter persist across reloads
+ok( $node_standby->poll_query_until(
+ 'postgres',
+ "select (confl_active_logicalslot = 4) from
pg_stat_database_conflicts where datname = 'testdb'", 't'),
+ 'confl_active_logicalslot updated') or die "Timed out waiting
confl_active_logicalslot to be updated";

The comment incorrectly mentions 3 conflicts whereas the query expects 4.



Good catch, fixed in v63.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 4:20 PM, Drouvot, Bertrand wrote:

Hi,

On 4/6/23 3:39 PM, Amit Kapila wrote:

On Thu, Apr 6, 2023 at 6:32 PM Drouvot, Bertrand
 wrote:



I don't think it could be possible to create logical walsenders on a standby if
AllowCascadeReplication() is not true, or am I missing something?



Right, so why to even traverse walsenders for that case? What I was
imagining a code is like:
if (AllowCascadeReplication())
 WalSndWakeup(switchedTLI, true);

Do you see any problem with this change?


Not at all, it looks good to me.





Done in V63 attached and did change the associated comment a bit.



Few more minor comments on 0005
=
0005
1.
+   
+    Take a snapshot of running transactions and write this to WAL without
+    having to wait bgwriter or checkpointer to log one.

/wait bgwriter/wait for bgwriter

2.
+use Test::More tests => 67;

We no more use the number of tests. Please refer to other similar tests.



Thanks! Will update 0005.



Done in V63.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom b5ff35d8ace1e429c6a15d53203b00304a3ff1f4 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 14:08:11 +
Subject: [PATCH v63 6/6] Doc changes describing details about logical
 decoding.

Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/logicaldecoding.sgml | 27 +++
 1 file changed, 27 insertions(+)
 100.0% doc/src/sgml/

diff --git a/doc/src/sgml/logicaldecoding.sgml 
b/doc/src/sgml/logicaldecoding.sgml
index 4e912b4bd4..8651024b8a 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -316,6 +316,33 @@ postgres=# select * from 
pg_logical_slot_get_changes('regression_slot', NULL, NU
  may consume changes from a slot at any given time.
 
 
+
+ A logical replication slot can also be created on a hot standby. To 
prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot between the 
primary
+ and the standby. Otherwise, hot_standby_feedback will work, but only 
while the
+ connection is alive (for example a node restart would break it). Then, the
+ primary may delete system catalog rows that could be needed by the logical
+ decoding on the standby (as it does not know about the catalog_xmin on the
+ standby). Existing logical slots on standby also get invalidated if 
wal_level
+ on primary is reduced to less than 'logical'. This is done as soon as the
+ standby detects such a change in the WAL stream. It means, that for 
walsenders
+ that are lagging (if any), some WAL records up to the wal_level parameter 
change
+ on the primary won't be decoded.
+
+
+
+ For a logical slot to be created, it builds a historic snapshot, for which
+ information of all the currently running transactions is essential. On
+ primary, this information is available, but on standby, this information
+ has to be obtained from primary. So, slot creation may wait for some
+ activity to happen on the primary. If the primary is idle, creating a
+ logical slot on standby may take a noticeable time. One option to speed it
+ is to call the pg_log_standby_snapshot on the 
primary.
+
+
 
  
   Replication slots persist across crashes and know nothing about the state
-- 
2.34.1

From ee35ea655de6d3178a1ec1b7b70345e2f4adccb5 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 09:04:12 +
Subject: [PATCH v63 5/6] New TAP test for logical decoding on standby.

In addition to the new TAP test, this commit introduces a new 
pg_log_standby_snapshot()
function.

The idea is to be able to take a snapshot of running transactions and write this
to WAL without requesting for a (costly) checkpoint.

Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/func.sgml|  15 +
 src/backend/access/transam/xlogfuncs.c|  32 +
 src/backend/catalog/system_functions.sql  |   2 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  37 +
 src/test/recovery/meson.build |   1 +
 .../t/035_standby_logical_decoding.pl | 706 ++
 7 files changed, 796 insertions(+)
   3.1% src/backend/
   4.0% src/test/perl/PostgreSQL/Test/
  89.6% src/test/recovery/t/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index dc44a74eb2..9253cd1c18 100644
--- 

Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/7/23 4:18 AM, Andres Freund wrote:

Hi,

TBH, I don't like the state of 0001 much. I'm working on polishing it now.



Thanks Andres!


A lot of the new functions in slot.h don't seem right to me:
- ObsoleteSlotIsInvalid() - isn't an obsolete slot by definition invalid?


bad naming, agree.


- Why does ObsoleteSlotIsInvalid() sometime check invalidated_at and sometimes
   not?


because part of the existing code was doing so (checking if s->data.restart_lsn 
is valid
with/without checking if data.invalidated_at is valid) and I thought it was 
better not
to change it.


- TransactionIdIsValidPrecedesOrEquals() shouldn't be defined in slot.h -
   also, it's not actually clear what semantics it's trying to have.


Oh right, my bad for the location.


- there's no commonality in naming between the functions used to test if a
   slot needs to be invalidated (SlotIsFreshEnough() vs
   LogicalSlotIsNotConflicting()).


Agree, my bad.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

On 4/7/23 3:59 AM, Amit Kapila wrote:

On Fri, Apr 7, 2023 at 6:55 AM Andres Freund  wrote:


On 2023-04-06 12:10:57 +0530, Amit Kapila wrote:

After this, I think for backends that have active slots, it would
simply cancel the current query. Will that be sufficient? Because we
want the backend process should exit and release the slot so that the
startup process can mark it invalid.


We don't need them to exit, we just need them to release the slot. Which does
happen when the query is cancelled. Imagine if that weren't the case - if a
cancellation of pg_logical_slot_* wouldn't release the slot, we couldn't call
it again before disconnecting. I also did verify that indeed the slot is
released upon a cancellation.



makes sense. Thanks for the clarification!



+1, thanks Andres!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Thu, Apr 6, 2023 at 7:50 PM Drouvot, Bertrand
 wrote:
>
> Thanks! Will update 0005.
>

I noticed a few typos in the latest patches.

0004
1.
+ * Physical walsenders don't need to be wakon up during replay unless

Typo.

0005
2.
+# Check if all the slots on standby are dropped. These include the 'activeslot'
+# that was acquired by make_slot_active(), and the non-active 'inactiveslot'.
+sub check_slots_dropped
+{
+ my ($slot_user_handle) = @_;
+
+ is($node_standby->slot('inactiveslot')->{'slot_type'}, '',
'inactiveslot on standby dropped');
+ is($node_standby->slot('activeslot')->{'slot_type'}, '', 'activeslot
on standby dropped');
+
+ check_pg_recvlogical_stderr($slot_user_handle, "conflict with recovery");
+}
+
+# Check if all the slots on standby are dropped. These include the 'activeslot'
+# that was acquired by make_slot_active(), and the non-active 'inactiveslot'.
+sub change_hot_standby_feedback_and_wait_for_xmins
+{
+ my ($hsf, $invalidated) = @_;
+
+ $node_standby->append_conf('postgresql.conf',qq[
+ hot_standby_feedback = $hsf
+ ]);
+
+ $node_standby->reload;
+
+ if ($hsf && $invalidated)
+ {
...

The comment above change_hot_standby_feedback_and_wait_for_xmins seems
to be wrong. It seems to be copied from the previous test.

3.
+# Verify that pg_stat_database_conflicts.confl_active_logicalslot has
been updated
+# we now expect 3 conflicts reported as the counter persist across reloads
+ok( $node_standby->poll_query_until(
+ 'postgres',
+ "select (confl_active_logicalslot = 4) from
pg_stat_database_conflicts where datname = 'testdb'", 't'),
+ 'confl_active_logicalslot updated') or die "Timed out waiting
confl_active_logicalslot to be updated";

The comment incorrectly mentions 3 conflicts whereas the query expects 4.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Fri, Apr 7, 2023 at 8:43 AM Andres Freund  wrote:
>
> On 2023-04-06 12:10:57 +0530, Amit Kapila wrote:
> > Also, it seems you have removed the checks related to
> > slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only
> > used for logical slots? If so, do you think an Assert would make
> > sense?
>
> The asserts that have been added aren't correct. There's no guarantee that the
> receiver of the procsignal still holds the same slot or any slot at all.
>

For backends, that don't hold any slot, can we skip setting the
RecoveryConflictPending and other flags?

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-06 Thread Andres Freund
Hi,

On 2023-04-06 12:10:57 +0530, Amit Kapila wrote:
> Also, it seems you have removed the checks related to
> slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only
> used for logical slots? If so, do you think an Assert would make
> sense?

The asserts that have been added aren't correct. There's no guarantee that the
receiver of the procsignal still holds the same slot or any slot at all.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-06 Thread Andres Freund
Hi,

TBH, I don't like the state of 0001 much. I'm working on polishing it now.

A lot of the new functions in slot.h don't seem right to me:
- ObsoleteSlotIsInvalid() - isn't an obsolete slot by definition invalid?
- Why does ObsoleteSlotIsInvalid() sometime check invalidated_at and sometimes
  not?
- DoNotInvalidateSlot() seems too generic a name for a function exposed to the
  outside of slot.c
- TransactionIdIsValidPrecedesOrEquals() shouldn't be defined in slot.h -
  also, it's not actually clear what semantics it's trying to have.
- there's no commonality in naming between the functions used to test if a
  slot needs to be invalidated (SlotIsFreshEnough() vs
  LogicalSlotIsNotConflicting()).

Leaving naming etc aside, most of these don't seem to belong in slot.h, but
should just be in slot.c - there aren't conceivable users from outside slot.c.


Independent of this patch: What's the point of invalidated_at? The only reads
of it are done like
invalidated = (!XLogRecPtrIsInvalid(s->data.invalidated_at) &&
   
XLogRecPtrIsInvalid(s->data.restart_lsn));
i.e. the actual LSN is not used.

ISTM that we should just have it be a boolean, and that it should be used by
the different kinds of invalidating a slot.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Fri, Apr 7, 2023 at 6:55 AM Andres Freund  wrote:
>
> On 2023-04-06 12:10:57 +0530, Amit Kapila wrote:
> > After this, I think for backends that have active slots, it would
> > simply cancel the current query. Will that be sufficient? Because we
> > want the backend process should exit and release the slot so that the
> > startup process can mark it invalid.
>
> We don't need them to exit, we just need them to release the slot. Which does
> happen when the query is cancelled. Imagine if that weren't the case - if a
> cancellation of pg_logical_slot_* wouldn't release the slot, we couldn't call
> it again before disconnecting. I also did verify that indeed the slot is
> released upon a cancellation.
>

makes sense. Thanks for the clarification!

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-06 Thread Andres Freund
Hi,

On 2023-04-06 12:10:57 +0530, Amit Kapila wrote:
> After this, I think for backends that have active slots, it would
> simply cancel the current query. Will that be sufficient? Because we
> want the backend process should exit and release the slot so that the
> startup process can mark it invalid.

We don't need them to exit, we just need them to release the slot. Which does
happen when the query is cancelled. Imagine if that weren't the case - if a
cancellation of pg_logical_slot_* wouldn't release the slot, we couldn't call
it again before disconnecting. I also did verify that indeed the slot is
released upon a cancellation.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 3:39 PM, Amit Kapila wrote:

On Thu, Apr 6, 2023 at 6:32 PM Drouvot, Bertrand
 wrote:



I don't think it could be possible to create logical walsenders on a standby if
AllowCascadeReplication() is not true, or am I missing something?



Right, so why to even traverse walsenders for that case? What I was
imagining a code is like:
if (AllowCascadeReplication())
 WalSndWakeup(switchedTLI, true);

Do you see any problem with this change?


Not at all, it looks good to me.



Few more minor comments on 0005
=
0005
1.
+   
+Take a snapshot of running transactions and write this to WAL without
+having to wait bgwriter or checkpointer to log one.

/wait bgwriter/wait for bgwriter

2.
+use Test::More tests => 67;

We no more use the number of tests. Please refer to other similar tests.



Thanks! Will update 0005.

Regards,


--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Thu, Apr 6, 2023 at 6:32 PM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 4/6/23 11:55 AM, Amit Kapila wrote:
> > On Thu, Apr 6, 2023 at 12:10 PM Amit Kapila  wrote:
> >>
> >> On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand
> >>  wrote:
> >>>
> >>
> >> Another comment on 0001.
> >>   extern void CheckSlotRequirements(void);
> >>   extern void CheckSlotPermissions(void);
> >> +extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid,
> >> TransactionId xid, char *reason);
> >>
> >> This doesn't seem to be called from anywhere.
> >>
> >
> > Few other comments:
> > ==
> > 0004
> > 1.
> > + *  - physical walsenders in case of new time line and cascade
> > + *  replication is allowed.
> > + *  - logical walsenders in case of new time line or recovery is in 
> > progress
> > + *  (logical decoding on standby).
> > + */
> > + WalSndWakeup(switchedTLI && AllowCascadeReplication(),
> > + switchedTLI || RecoveryInProgress());
> >
> > Do we need AllowCascadeReplication() check specifically for physical
> > walsenders? I think this should be true for both physical and logical
> > walsenders.
> >
>
> I don't think it could be possible to create logical walsenders on a standby 
> if
> AllowCascadeReplication() is not true, or am I missing something?
>

Right, so why to even traverse walsenders for that case? What I was
imagining a code is like:
if (AllowCascadeReplication())
WalSndWakeup(switchedTLI, true);

Do you see any problem with this change?

Few more minor comments on 0005
=
0005
1.
+   
+Take a snapshot of running transactions and write this to WAL without
+having to wait bgwriter or checkpointer to log one.

/wait bgwriter/wait for bgwriter

2.
+use Test::More tests => 67;

We no more use the number of tests. Please refer to other similar tests.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 2:23 PM, Amit Kapila wrote:

On Thu, Apr 6, 2023 at 11:29 AM Amit Kapila  wrote:

Thinking some more on this, I think such a slot won't decode any other
records. During CreateInitDecodingContext->ReplicationSlotReserveWal,
for standby's, we use lastReplayedEndRecPtr as restart_lsn. This
should be a record before parameter_change record in the above
scenario. So, ideally, the first record to decode by such a walsender
should be parameter_change which will anyway error out. So, this
shouldn't be a problem.



Agree.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 7:59 AM, Amit Kapila wrote:

On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand
 wrote:


On 4/5/23 12:28 PM, Amit Kapila wrote:

On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
 wrote:

Maybe we could change the doc with something among those lines instead?

"
Existing logical slots on standby also get invalidated if wal_level on primary 
is reduced to
less than 'logical'. This is done as soon as the standby detects such a change 
in the WAL stream.

It means, that for walsenders that are lagging (if any), some WAL records up to 
the parameter change on the
primary won't be decoded".

I don't know whether this is what one would expect but that should be less of a 
surprise if documented.

What do you think?



Yeah, I think it is better to document to avoid any surprises if
nobody else sees any problem with it.


Ack.



This doesn't seem to be addressed in the latest version.


Right, I was waiting if "nobody else sees any problem with it".

Added it now in V62 posted up-thread.


And today, I
think I see one more point about this doc change:
+
+ A logical replication slot can also be created on a hot standby.
To prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot
between the primary
+ and the standby. Otherwise, hot_standby_feedback will work, but
only while the
+ connection is alive (for example a node restart would break it). Existing
+ logical slots on standby also get invalidated if wal_level on
primary is reduced to
+ less than 'logical'.

If hot_standby_feedback is not set then can logical decoding on
standby misbehave? If so, that is not very clear from this doc change
if that is acceptable.


I don't think it would misbehave but that primary may delete system catalog rows
that could be needed by the logical decoding on the standby (as it does not 
know about the
catalog_xmin on the standby).

Added this remark in V62.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 11:55 AM, Amit Kapila wrote:

On Thu, Apr 6, 2023 at 12:10 PM Amit Kapila  wrote:


On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand
 wrote:




Another comment on 0001.
  extern void CheckSlotRequirements(void);
  extern void CheckSlotPermissions(void);
+extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid,
TransactionId xid, char *reason);

This doesn't seem to be called from anywhere.



Few other comments:
==
0004
1.
+ *  - physical walsenders in case of new time line and cascade
+ *  replication is allowed.
+ *  - logical walsenders in case of new time line or recovery is in progress
+ *  (logical decoding on standby).
+ */
+ WalSndWakeup(switchedTLI && AllowCascadeReplication(),
+ switchedTLI || RecoveryInProgress());

Do we need AllowCascadeReplication() check specifically for physical
walsenders? I think this should be true for both physical and logical
walsenders.



I don't think it could be possible to create logical walsenders on a standby if
AllowCascadeReplication() is not true, or am I missing something?

If so, I think it has to be set to true for the logical walsenders in all the 
case (like
done in V62 posted up-thread).

Andres, made the point up-thread that RecoveryInProgress() is always true, and
as we don't want to be woken up only when there is a time line change then I 
think
it has to be always true for logical walsenders.


0005
2.
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -38,6 +38,7 @@
  #include "utils/pg_lsn.h"
  #include "utils/timestamp.h"
  #include "utils/tuplestore.h"
+#include "storage/standby.h"

The header includes should be in alphabetical order.



Good catch, thanks! Done in V62.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 8:40 AM, Amit Kapila wrote:

On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand

After this, I think for backends that have active slots, it would
simply cancel the current query. Will that be sufficient? Because we
want the backend process should exit and release the slot so that the
startup process can mark it invalid. 
For walsender, an ERROR will lead

to its exit, so that is fine. If this understanding is correct, then
if 'am_cascading_walsender' is false, we should set ProcDiePending
apart from other parameters. Sorry, I haven't tested this, so I could
be wrong here. 


Oops my bad. You are fully, right. Fixed in V62 posted up-thread


Also, it seems you have removed the checks related to
slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only
used for logical slots? If so, do you think an Assert would make
sense?



Yes, indeed adding an Assert makes sense: done in V62 posted up-thread.



Another comment on 0001.
  extern void CheckSlotRequirements(void);
  extern void CheckSlotPermissions(void);
+extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid,
TransactionId xid, char *reason);

This doesn't seem to be called from anywhere.


Good catch, removed in V62 posted up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Thu, Apr 6, 2023 at 11:29 AM Amit Kapila  wrote:
>
> >
>
> This doesn't seem to be addressed in the latest version. And today, I
> think I see one more point about this doc change:
> +
> + A logical replication slot can also be created on a hot standby.
> To prevent
> + VACUUM from removing required rows from the system
> + catalogs, hot_standby_feedback should be set on the
> + standby. In spite of that, if any required rows get removed, the slot 
> gets
> + invalidated. It's highly recommended to use a physical slot
> between the primary
> + and the standby. Otherwise, hot_standby_feedback will work, but
> only while the
> + connection is alive (for example a node restart would break it). 
> Existing
> + logical slots on standby also get invalidated if wal_level on
> primary is reduced to
> + less than 'logical'.
>
> If hot_standby_feedback is not set then can logical decoding on
> standby misbehave? If so, that is not very clear from this doc change
> if that is acceptable. One scenario where I think it can misbehave is
> if applying WAL records generated after changing wal_level from
> 'logical' to 'replica' physically removes catalog tuples that could be
> referenced by the logical decoding on the standby. Now, as mentioned
> in patch 0003's comment in decode.c that it is possible that some
> slots may creep even after we invalidate the slots on parameter
> change, so while decoding using that slot if some required catalog
> tuple has been removed by physical replication then the decoding can
> misbehave even before reaching XLOG_PARAMETER_CHANGE record.
>

Thinking some more on this, I think such a slot won't decode any other
records. During CreateInitDecodingContext->ReplicationSlotReserveWal,
for standby's, we use lastReplayedEndRecPtr as restart_lsn. This
should be a record before parameter_change record in the above
scenario. So, ideally, the first record to decode by such a walsender
should be parameter_change which will anyway error out. So, this
shouldn't be a problem.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Thu, Apr 6, 2023 at 12:10 PM Amit Kapila  wrote:
>
> On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand
>  wrote:
> >
>
> Another comment on 0001.
>  extern void CheckSlotRequirements(void);
>  extern void CheckSlotPermissions(void);
> +extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid,
> TransactionId xid, char *reason);
>
> This doesn't seem to be called from anywhere.
>

Few other comments:
==
0004
1.
+ *  - physical walsenders in case of new time line and cascade
+ *  replication is allowed.
+ *  - logical walsenders in case of new time line or recovery is in progress
+ *  (logical decoding on standby).
+ */
+ WalSndWakeup(switchedTLI && AllowCascadeReplication(),
+ switchedTLI || RecoveryInProgress());

Do we need AllowCascadeReplication() check specifically for physical
walsenders? I think this should be true for both physical and logical
walsenders.

0005
2.
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -38,6 +38,7 @@
 #include "utils/pg_lsn.h"
 #include "utils/timestamp.h"
 #include "utils/tuplestore.h"
+#include "storage/standby.h"

The header includes should be in alphabetical order.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand
 wrote:
>
> On 4/5/23 3:15 PM, Amit Kapila wrote:
> > On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand
> >  wrote:
> >>
> >> On 4/5/23 12:28 PM, Amit Kapila wrote:
> >>> On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
> >>>  wrote:
> >>
> >>> minor nitpick:
> >>> +
> >>> + /* Intentional fall through to session cancel */
> >>> + /* FALLTHROUGH */
> >>>
> >>> Do we need to repeat fall through twice in different ways?
> >>>
> >>
> >> Do you mean, you'd prefer what was done in v52/0002?
> >>
> >
> > No, I was thinking that instead of two comments, we need one here.
> > But, now thinking about it, do we really need to fall through in this
> > case, if so why? Shouldn't this case be handled after
> > PROCSIG_RECOVERY_CONFLICT_DATABASE?
> >
>
> Indeed, thanks! Done in V61 posted up-thread.
>

After this, I think for backends that have active slots, it would
simply cancel the current query. Will that be sufficient? Because we
want the backend process should exit and release the slot so that the
startup process can mark it invalid. For walsender, an ERROR will lead
to its exit, so that is fine. If this understanding is correct, then
if 'am_cascading_walsender' is false, we should set ProcDiePending
apart from other parameters. Sorry, I haven't tested this, so I could
be wrong here. Also, it seems you have removed the checks related to
slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only
used for logical slots? If so, do you think an Assert would make
sense?

Another comment on 0001.
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
+extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid,
TransactionId xid, char *reason);

This doesn't seem to be called from anywhere.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand
 wrote:
>
> On 4/5/23 12:28 PM, Amit Kapila wrote:
> > On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
> >  wrote:
> >> Maybe we could change the doc with something among those lines instead?
> >>
> >> "
> >> Existing logical slots on standby also get invalidated if wal_level on 
> >> primary is reduced to
> >> less than 'logical'. This is done as soon as the standby detects such a 
> >> change in the WAL stream.
> >>
> >> It means, that for walsenders that are lagging (if any), some WAL records 
> >> up to the parameter change on the
> >> primary won't be decoded".
> >>
> >> I don't know whether this is what one would expect but that should be less 
> >> of a surprise if documented.
> >>
> >> What do you think?
> >>
> >
> > Yeah, I think it is better to document to avoid any surprises if
> > nobody else sees any problem with it.
>
> Ack.
>

This doesn't seem to be addressed in the latest version. And today, I
think I see one more point about this doc change:
+
+ A logical replication slot can also be created on a hot standby.
To prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot
between the primary
+ and the standby. Otherwise, hot_standby_feedback will work, but
only while the
+ connection is alive (for example a node restart would break it). Existing
+ logical slots on standby also get invalidated if wal_level on
primary is reduced to
+ less than 'logical'.

If hot_standby_feedback is not set then can logical decoding on
standby misbehave? If so, that is not very clear from this doc change
if that is acceptable. One scenario where I think it can misbehave is
if applying WAL records generated after changing wal_level from
'logical' to 'replica' physically removes catalog tuples that could be
referenced by the logical decoding on the standby. Now, as mentioned
in patch 0003's comment in decode.c that it is possible that some
slots may creep even after we invalidate the slots on parameter
change, so while decoding using that slot if some required catalog
tuple has been removed by physical replication then the decoding can
misbehave even before reaching XLOG_PARAMETER_CHANGE record.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-05 Thread Andres Freund
Hi,

On 2023-04-05 17:56:14 +0200, Drouvot, Bertrand wrote:

> @@ -7963,6 +7963,23 @@ xlog_redo(XLogReaderState *record)
>   /* Update our copy of the parameters in pg_control */
>   memcpy(, XLogRecGetData(record), 
> sizeof(xl_parameter_change));
>
> + /*
> +  * Invalidate logical slots if we are in hot standby and the 
> primary
> +  * does not have a WAL level sufficient for logical decoding. 
> No need
> +  * to search for potentially conflicting logically slots if 
> standby is
> +  * running with wal_level lower than logical, because in that 
> case, we
> +  * would have either disallowed creation of logical slots or
> +  * invalidated existing ones.
> +  */
> + if (InRecovery && InHotStandby &&
> + xlrec.wal_level < WAL_LEVEL_LOGICAL &&
> + wal_level >= WAL_LEVEL_LOGICAL)
> + {
> + TransactionId ConflictHorizon = InvalidTransactionId;
> +
> + InvalidateObsoleteReplicationSlots(InvalidXLogRecPtr, 
> InvalidOid, );
> + }

I mentioned this before, but I still don't understand why
InvalidateObsoleteReplicationSlots() accepts ConflictHorizon as a
pointer. It's not even modified, as far as I can see?


>  /*
>   * Report shared-memory space needed by ReplicationSlotsShmemInit.
>   */
> @@ -855,8 +862,7 @@ ReplicationSlotsComputeRequiredXmin(bool already_locked)
>   SpinLockAcquire(>mutex);
>   effective_xmin = s->effective_xmin;
>   effective_catalog_xmin = s->effective_catalog_xmin;
> - invalidated = (!XLogRecPtrIsInvalid(s->data.invalidated_at) &&
> -
> XLogRecPtrIsInvalid(s->data.restart_lsn));
> + invalidated = ObsoleteSlotIsInvalid(s, true) || 
> LogicalReplicationSlotIsInvalid(s);
>   SpinLockRelease(>mutex);

I don't understand why we need to have two different functions for this.


>   /* invalidated slots need not apply */
> @@ -1225,28 +1231,92 @@ ReplicationSlotReserveWal(void)
>   }
>  }
>
> +
> +/*
> + * Report terminating or conflicting message.
> + *
> + * For both, logical conflict on standby and obsolete slot are handled.
> + */
> +static void
> +ReportTerminationInvalidation(bool terminating, bool check_on_xid, int pid,
> +   NameData slotname, 
> TransactionId *xid,
> +   XLogRecPtr 
> restart_lsn, XLogRecPtr oldestLSN)
> +{
> + StringInfoData err_msg;
> + StringInfoData err_detail;
> + boolhint = false;
> +
> + initStringInfo(_detail);
> +
> + if (check_on_xid)
> + {
> + if (!terminating)
> + {
> + initStringInfo(_msg);
> + appendStringInfo(_msg, _("invalidating replication 
> slot \"%s\" because it conflicts with recovery"),
> +  NameStr(slotname));

I still don't think the main error message should differ between invalidating
a slot due recovery and max_slot_wal_keep_size.

> +
>  /*
> - * Helper for InvalidateObsoleteReplicationSlots -- acquires the given slot
> - * and mark it invalid, if necessary and possible.
> + * Helper for InvalidateObsoleteReplicationSlots
> + *
> + * Acquires the given slot and mark it invalid, if necessary and possible.
>   *
>   * Returns whether ReplicationSlotControlLock was released in the interim 
> (and
>   * in that case we're not holding the lock at return, otherwise we are).
>   *
> - * Sets *invalidated true if the slot was invalidated. (Untouched otherwise.)
> + * Sets *invalidated true if an obsolete slot was invalidated. (Untouched 
> otherwise.)

What's the point of making this specific to "obsolete slots"?


>   * This is inherently racy, because we release the LWLock
>   * for syscalls, so caller must restart if we return true.
>   */
>  static bool
>  InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
> -bool *invalidated)
> +bool *invalidated, 
> TransactionId *xid)
>  {
>   int last_signaled_pid = 0;
>   boolreleased_lock = false;
> + boolcheck_on_xid;
> +
> + check_on_xid = xid ? true : false;
>
>   for (;;)
>   {
>   XLogRecPtr  restart_lsn;
> +
>   NameDataslotname;
>   int active_pid = 0;
>
> @@ -1263,19 +1333,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, 
> XLogRecPtr oldestLSN,
>* Check if the slot needs to be invalidated. If it needs to be
>* invalidated, and is not currently acquired, acquire it and 
> mark it
>  

Re: Minimal logical decoding on standbys

2023-04-05 Thread Drouvot, Bertrand

Hi,

On 4/5/23 4:24 PM, Robert Haas wrote:

On Tue, Apr 4, 2023 at 8:33 PM Jeff Davis  wrote:

For comments, I agree that WalSndWakeup() clearly needs a comment
update. The call site in ApplyWalRecord() could also use a comment. You
could add a comment at every call site, but I don't think that's
necessary if there's a good comment over WalSndWakeup().


Right, we don't want to go overboard, but I think putting some of the
text you wrote above for the commit message, or something with a
similar theme, in the comment for WalSndWakeup() would be quite
helpful. We want people to understand why the physical and logical
cases are different.


Gave it a try in V61 posted up-thread.


I agree with you that ApplyWalRecord() is the other place where we
need a good comment. I think the one in v60 needs more word-smithing.
It should probably be a bit more detailed and clear about not only
what we're doing but why we're doing it.


Gave it a try in V61 posted up-thread.



Now that I understand what's going on here a bit better, I'm inclined
to think that this patch is basically fine. At least, I don't see any
obvious problem with it.


Thanks for the review and feedback!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-05 Thread Drouvot, Bertrand

Hi,

On 4/5/23 3:15 PM, Amit Kapila wrote:

On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand
 wrote:


On 4/5/23 12:28 PM, Amit Kapila wrote:

On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
 wrote:



minor nitpick:
+
+ /* Intentional fall through to session cancel */
+ /* FALLTHROUGH */

Do we need to repeat fall through twice in different ways?



Do you mean, you'd prefer what was done in v52/0002?



No, I was thinking that instead of two comments, we need one here.
But, now thinking about it, do we really need to fall through in this
case, if so why? Shouldn't this case be handled after
PROCSIG_RECOVERY_CONFLICT_DATABASE?



Indeed, thanks! Done in V61 posted up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-05 Thread Drouvot, Bertrand

Hi,

On 4/5/23 1:59 PM, Amit Kapila wrote:

On Wed, Apr 5, 2023 at 3:58 PM Amit Kapila  wrote:


On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
 wrote:

minor nitpick:
+
+ /* Intentional fall through to session cancel */
+ /* FALLTHROUGH */

Do we need to repeat fall through twice in different ways?



Few minor comments on 0003:

1.
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /*
+ * If wal_level on primary is reduced to less than logical,
+ * then we want to prevent existing logical slots from being
+ * used. Existing logical slots on standby get invalidated
+ * when this WAL record is replayed; and further, slot
+ * creation fails when the wal level is not sufficient; but
+ * all these operations are not synchronized, so a logical
+ * slot may creep in while the wal_level is being reduced.
+ * Hence this extra check.
+ */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires wal_level to be at
least logical on the primary server")));

By looking at this change, it is not very clear that this can occur
only on standby. I understand that on primary, we will not allow
restarting the server after changing wal_level if there is a
pre-existing slot but still this looks a bit odd. Shall we have an
Assert to indicate that this will occur only on standby?


I think that's a fair point. Adding an Assert and a comment before the
Assert in V61 attached.



2.
/*
- * Since logical decoding is only permitted on a primary server, we know
- * that the current timeline ID can't be changing any more. If we did this
- * on a standby, we'd have to worry about the values we compute here
- * becoming invalid due to a promotion or timeline change.
+ * Since logical decoding is also permitted on a standby server, we need
+ * to check if the server is in recovery to decide how to get the current
+ * timeline ID (so that it also cover the promotion or timeline change
+ * cases).
   */
+
+ /* make sure we have enough WAL available */
+ flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
+
+ /* the standby could have been promoted, so check if still in recovery */
+ am_cascading_walsender = RecoveryInProgress();

The first part of the comment explains why it is important to check
RecoveryInProgress() and then immediately after that, the patch
invokes WalSndWaitForWal(). It may be better to move the comment after
WalSndWaitForWal() invocation.


Good catch, thanks! done in V61.


Also, it will be better to write a
comment as to why you need to do WalSndWaitForWal() before retrieving
the current timeline as previously that was done afterward.



Agree, done in V61.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom 1bdbea718682cce4953310314759863302b0c0ea Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 14:08:11 +
Subject: [PATCH v61 6/6] Doc changes describing details about logical
 decoding.

Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/logicaldecoding.sgml | 22 ++
 1 file changed, 22 insertions(+)
 100.0% doc/src/sgml/

diff --git a/doc/src/sgml/logicaldecoding.sgml 
b/doc/src/sgml/logicaldecoding.sgml
index 4e912b4bd4..3da254ed1f 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -316,6 +316,28 @@ postgres=# select * from 
pg_logical_slot_get_changes('regression_slot', NULL, NU
  may consume changes from a slot at any given time.
 
 
+
+ A logical replication slot can also be created on a hot standby. To 
prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot between the 
primary
+ and the standby. Otherwise, hot_standby_feedback will work, but only 
while the
+ connection is alive (for example a node restart would break it). Existing
+ logical slots on standby also get invalidated if wal_level on primary is 
reduced to
+ less than 'logical'.
+
+
+
+ For a logical slot to be created, it builds a historic snapshot, for which
+ information of all the currently running transactions is essential. On
+ primary, this information is available, but on standby, this information
+ has to be obtained from primary. So, slot creation may wait for some
+ activity to happen on the primary. If the primary is idle, creating a
+ logical slot on standby may take a noticeable time. One option to speed it
+ is to call the pg_log_standby_snapshot on the 
primary.
+

Re: Minimal logical decoding on standbys

2023-04-05 Thread Robert Haas
On Tue, Apr 4, 2023 at 8:33 PM Jeff Davis  wrote:
> Perhaps a commit message like:
>
> "For cascading replication, wake up physical walsenders separately from
> logical walsenders.
>
> Physical walsenders can't send data until it's been flushed; logical
> walsenders can't decode and send data until it's been applied. On the
> standby, the WAL is flushed first, which will only wake up physical
> walsenders; and then applied, which will only wake up logical
> walsenders.
>
> Previously, all walsenders were awakened when the WAL was flushed. That
> was fine for logical walsenders on the primary; but on the standby the
> flushed WAL would not have been applied yet, so logical walsenders were
> awakened too early."

This sounds great. I think it's very clear about what is being changed
and why. I see that Bertrand already pulled this language into v60.

> For comments, I agree that WalSndWakeup() clearly needs a comment
> update. The call site in ApplyWalRecord() could also use a comment. You
> could add a comment at every call site, but I don't think that's
> necessary if there's a good comment over WalSndWakeup().

Right, we don't want to go overboard, but I think putting some of the
text you wrote above for the commit message, or something with a
similar theme, in the comment for WalSndWakeup() would be quite
helpful. We want people to understand why the physical and logical
cases are different.

I agree with you that ApplyWalRecord() is the other place where we
need a good comment. I think the one in v60 needs more word-smithing.
It should probably be a bit more detailed and clear about not only
what we're doing but why we're doing it.

The comment in InitWalSenderSlot() seems like it might be slightly
overdone, but I don't have a big problem with it so if we leave it
as-is that's fine.

Now that I understand what's going on here a bit better, I'm inclined
to think that this patch is basically fine. At least, I don't see any
obvious problem with it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Minimal logical decoding on standbys

2023-04-05 Thread Amit Kapila
On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand
 wrote:
>
> On 4/5/23 12:28 PM, Amit Kapila wrote:
> > On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
> >  wrote:
>
> > minor nitpick:
> > +
> > + /* Intentional fall through to session cancel */
> > + /* FALLTHROUGH */
> >
> > Do we need to repeat fall through twice in different ways?
> >
>
> Do you mean, you'd prefer what was done in v52/0002?
>

No, I was thinking that instead of two comments, we need one here.
But, now thinking about it, do we really need to fall through in this
case, if so why? Shouldn't this case be handled after
PROCSIG_RECOVERY_CONFLICT_DATABASE?

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-05 Thread Drouvot, Bertrand

Hi,

On 4/5/23 12:28 PM, Amit Kapila wrote:

On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
 wrote:

Maybe we could change the doc with something among those lines instead?

"
Existing logical slots on standby also get invalidated if wal_level on primary 
is reduced to
less than 'logical'. This is done as soon as the standby detects such a change 
in the WAL stream.

It means, that for walsenders that are lagging (if any), some WAL records up to 
the parameter change on the
primary won't be decoded".

I don't know whether this is what one would expect but that should be less of a 
surprise if documented.

What do you think?



Yeah, I think it is better to document to avoid any surprises if
nobody else sees any problem with it.


Ack.


BTW, another thought that
crosses my mind is that let's not invalidate the slots when the
standby startup process processes parameter_change record and rather
do it when walsender decodes the parameter_change record, if we think
that is safe. I have shared this as this crosses my mind while
thinking about this part of the patch and wanted to validate my
thoughts, we don't need to change even if the idea is valid.



I think this is a valid idea but I think I do prefer the current one (where the
startup process triggers the invalidations) because:

  - I think this is better to invalidate as soon as possible. In case of 
inactive logical
replication slot (walsenders stopped) it could take time to get "notified". 
While with the current
approach you'd get notified in the logfile and pg_replication_slots even if 
walsenders are stopped.

  - This is not a "slot" dependent invalidation (as opposed to the xid 
invalidations case)

  - This is "somehow" the same behavior as on the primary: if one change the 
wal_level to be < logical
then the engine will not start (if logical slot in place). Then what has been 
decoded is until the time
the engine has been stopped. So if there is walsender lag, you'd not see some 
records.


minor nitpick:
+
+ /* Intentional fall through to session cancel */
+ /* FALLTHROUGH */

Do we need to repeat fall through twice in different ways?



Do you mean, you'd prefer what was done in v52/0002?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-05 Thread Amit Kapila
On Wed, Apr 5, 2023 at 3:58 PM Amit Kapila  wrote:
>
> On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
>  wrote:
>
> minor nitpick:
> +
> + /* Intentional fall through to session cancel */
> + /* FALLTHROUGH */
>
> Do we need to repeat fall through twice in different ways?
>

Few minor comments on 0003:

1.
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /*
+ * If wal_level on primary is reduced to less than logical,
+ * then we want to prevent existing logical slots from being
+ * used. Existing logical slots on standby get invalidated
+ * when this WAL record is replayed; and further, slot
+ * creation fails when the wal level is not sufficient; but
+ * all these operations are not synchronized, so a logical
+ * slot may creep in while the wal_level is being reduced.
+ * Hence this extra check.
+ */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires wal_level to be at
least logical on the primary server")));

By looking at this change, it is not very clear that this can occur
only on standby. I understand that on primary, we will not allow
restarting the server after changing wal_level if there is a
pre-existing slot but still this looks a bit odd. Shall we have an
Assert to indicate that this will occur only on standby?

2.
/*
- * Since logical decoding is only permitted on a primary server, we know
- * that the current timeline ID can't be changing any more. If we did this
- * on a standby, we'd have to worry about the values we compute here
- * becoming invalid due to a promotion or timeline change.
+ * Since logical decoding is also permitted on a standby server, we need
+ * to check if the server is in recovery to decide how to get the current
+ * timeline ID (so that it also cover the promotion or timeline change
+ * cases).
  */
+
+ /* make sure we have enough WAL available */
+ flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
+
+ /* the standby could have been promoted, so check if still in recovery */
+ am_cascading_walsender = RecoveryInProgress();

The first part of the comment explains why it is important to check
RecoveryInProgress() and then immediately after that, the patch
invokes WalSndWaitForWal(). It may be better to move the comment after
WalSndWaitForWal() invocation. Also, it will be better to write a
comment as to why you need to do WalSndWaitForWal() before retrieving
the current timeline as previously that was done afterward.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-05 Thread Amit Kapila
On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
 wrote:
>
> On 4/5/23 8:59 AM, Amit Kapila wrote:
> > On Mon, Apr 3, 2023 at 12:05 PM Amit Kapila  wrote:
>
> > On further thinking, as such this shouldn't be a problem because all
> > the WAL records before PARAMETER_CHANGE record will have sufficient
> > information so that they can get decoded. However, with the current
> > approach, the subscriber may not even receive the valid records before
> > PARAMETER_CHANGE record. This is because startup process will
> > terminate the walsenders while invaliding the slots and after restart
> > the walsenders will exit because the corresponding slot will be an
> > invalid slot. So, it is quite possible that walsender was lagging and
> > wouldn't have sent records before the PARAMETER_CHANGE record making
> > subscriber never receive those records that it should have received.
>
> Agree that would behave that way.
>
> > I don't know whether this is what one would expect.
>
> If one change wal_level to < logical on the primary, he should at least
> know that:
>
> "
> Existing
> + logical slots on standby also get invalidated if wal_level on primary 
> is reduced to
> + less than 'logical'.
> "
>
> If the doc has been read (as the quote above is coming from 0006).
>
> I think that what is missing is the "when" the slots are invalidated.
>
> Maybe we could change the doc with something among those lines instead?
>
> "
> Existing logical slots on standby also get invalidated if wal_level on 
> primary is reduced to
> less than 'logical'. This is done as soon as the standby detects such a 
> change in the WAL stream.
>
> It means, that for walsenders that are lagging (if any), some WAL records up 
> to the parameter change on the
> primary won't be decoded".
>
> I don't know whether this is what one would expect but that should be less of 
> a surprise if documented.
>
> What do you think?
>

Yeah, I think it is better to document to avoid any surprises if
nobody else sees any problem with it. BTW, another thought that
crosses my mind is that let's not invalidate the slots when the
standby startup process processes parameter_change record and rather
do it when walsender decodes the parameter_change record, if we think
that is safe. I have shared this as this crosses my mind while
thinking about this part of the patch and wanted to validate my
thoughts, we don't need to change even if the idea is valid.

minor nitpick:
+
+ /* Intentional fall through to session cancel */
+ /* FALLTHROUGH */

Do we need to repeat fall through twice in different ways?

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-05 Thread Drouvot, Bertrand

Hi,

On 4/5/23 8:59 AM, Amit Kapila wrote:

On Mon, Apr 3, 2023 at 12:05 PM Amit Kapila  wrote:



On further thinking, as such this shouldn't be a problem because all
the WAL records before PARAMETER_CHANGE record will have sufficient
information so that they can get decoded. However, with the current
approach, the subscriber may not even receive the valid records before
PARAMETER_CHANGE record. This is because startup process will
terminate the walsenders while invaliding the slots and after restart
the walsenders will exit because the corresponding slot will be an
invalid slot. So, it is quite possible that walsender was lagging and
wouldn't have sent records before the PARAMETER_CHANGE record making
subscriber never receive those records that it should have received.


Agree that would behave that way.


I don't know whether this is what one would expect.


If one change wal_level to < logical on the primary, he should at least
know that:

"
Existing
+ logical slots on standby also get invalidated if wal_level on primary is 
reduced to
+ less than 'logical'.
"

If the doc has been read (as the quote above is coming from 0006).

I think that what is missing is the "when" the slots are invalidated.

Maybe we could change the doc with something among those lines instead?

"
Existing logical slots on standby also get invalidated if wal_level on primary 
is reduced to
less than 'logical'. This is done as soon as the standby detects such a change 
in the WAL stream.

It means, that for walsenders that are lagging (if any), some WAL records up to 
the parameter change on the
primary won't be decoded".

I don't know whether this is what one would expect but that should be less of a 
surprise if documented.

What do you think?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-05 Thread Drouvot, Bertrand

Hi,

On 4/5/23 2:33 AM, Jeff Davis wrote:

On Tue, 2023-04-04 at 14:55 -0400, Robert Haas wrote:

Thanks for your continued work on $SUBJECT. I just took a look at
0004,


Thanks Robert for the feedback!


and I think that at the very least the commit message needs
work.


Agree.


Perhaps a commit message like:

"For cascading replication, wake up physical walsenders separately from
logical walsenders.

Physical walsenders can't send data until it's been flushed; logical
walsenders can't decode and send data until it's been applied. On the
standby, the WAL is flushed first, which will only wake up physical
walsenders; and then applied, which will only wake up logical
walsenders.

Previously, all walsenders were awakened when the WAL was flushed. That
was fine for logical walsenders on the primary; but on the standby the
flushed WAL would not have been applied yet, so logical walsenders were
awakened too early."


Thanks Jeff for the commit message proposal! It looks good to me
except that I think that "flushed WAL could have been not applied yet" is 
better than
"flushed WAL would not have been applied yet" but it's obviously open to 
discussion.

Currently changed it that way and used it in V60 shared up-thread.



For comments, I agree that WalSndWakeup() clearly needs a comment
update. The call site in ApplyWalRecord() could also use a comment. You
could add a comment at every call site, but I don't think that's
necessary if there's a good comment over WalSndWakeup().


Agree, added a comment over WalSndWakeup() and one before calling WalSndWakeup()
in ApplyWalRecord() in V60.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-05 Thread Drouvot, Bertrand

Hi,

On 4/4/23 8:13 PM, Jeff Davis wrote:

On Tue, 2023-04-04 at 11:42 +0200, Drouvot, Bertrand wrote:

Done in V58 and now this is as simple as:



Minor comments on 0004 (address if you agree):



Thanks for the review!


* Consider static inline for WalSndWakeupProcessRequests()?


Agree and done in V60 just shared up-thread.


* Is the WalSndWakeup() in KeepFileRestoredFromArchive() more like the
flush case? Why is the second argument unconditionally true? I don't
think the cascading logical walsenders have anything to do until the
WAL is actually applied.



Agree and changed it to "(true, false)" in V60.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-05 Thread Drouvot, Bertrand

Hi,

On 4/4/23 7:53 PM, Andres Freund wrote:

Hi,

On 2023-04-04 18:54:33 +0200, Drouvot, Bertrand wrote:

 if (check_on_xid)
 {
 if (terminating)
 appendStringInfo(_msg, _("terminating process %d to release replication slot 
\"%s\" because it conflicts with recovery"),
  pid,
  NameStr(slotname));


FWIW, I would just use exactly the same error message as today here.

errmsg("terminating process %d to release 
replication slot \"%s\"",
   active_pid, 
NameStr(slotname)),

This is accurate for both the existing and the new case. Then there's no need
to put that string into a stringinfo either.



Right, thanks! Did it that way in V60 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom 4861568b10fe2187d38f2997ad916a984918aa5b Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 14:08:11 +
Subject: [PATCH v60 6/6] Doc changes describing details about logical
 decoding.

Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/logicaldecoding.sgml | 22 ++
 1 file changed, 22 insertions(+)
 100.0% doc/src/sgml/

diff --git a/doc/src/sgml/logicaldecoding.sgml 
b/doc/src/sgml/logicaldecoding.sgml
index 4e912b4bd4..3da254ed1f 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -316,6 +316,28 @@ postgres=# select * from 
pg_logical_slot_get_changes('regression_slot', NULL, NU
  may consume changes from a slot at any given time.
 
 
+
+ A logical replication slot can also be created on a hot standby. To 
prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot between the 
primary
+ and the standby. Otherwise, hot_standby_feedback will work, but only 
while the
+ connection is alive (for example a node restart would break it). Existing
+ logical slots on standby also get invalidated if wal_level on primary is 
reduced to
+ less than 'logical'.
+
+
+
+ For a logical slot to be created, it builds a historic snapshot, for which
+ information of all the currently running transactions is essential. On
+ primary, this information is available, but on standby, this information
+ has to be obtained from primary. So, slot creation may wait for some
+ activity to happen on the primary. If the primary is idle, creating a
+ logical slot on standby may take a noticeable time. One option to speed it
+ is to call the pg_log_standby_snapshot on the 
primary.
+
+
 
  
   Replication slots persist across crashes and know nothing about the state
-- 
2.34.1

From 3dd93d8dbc601accedceae199e539ba74252e092 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 09:04:12 +
Subject: [PATCH v60 5/6] New TAP test for logical decoding on standby.

In addition to the new TAP test, this commit introduces a new 
pg_log_standby_snapshot()
function.

The idea is to be able to take a snapshot of running transactions and write this
to WAL without requesting for a (costly) checkpoint.

Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/func.sgml|  15 +
 src/backend/access/transam/xlogfuncs.c|  32 +
 src/backend/catalog/system_functions.sql  |   2 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  37 +
 src/test/recovery/meson.build |   1 +
 .../t/035_standby_logical_decoding.pl | 705 ++
 7 files changed, 795 insertions(+)
   3.1% src/backend/
   4.0% src/test/perl/PostgreSQL/Test/
  89.7% src/test/recovery/t/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 918a492234..939fb8019f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27034,6 +27034,21 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * 
ps.setting::int + :offset
 prepared with .

   
+  
+   
+
+ pg_log_standby_snapshot
+
+pg_log_standby_snapshot ()
+pg_lsn
+   
+   
+Take a snapshot of running transactions and write this to WAL without
+having to wait bgwriter or checkpointer to log one. This one is useful 
for
+logical decoding on standby for which logical slot creation is hanging
+until such a record is 

Re: Minimal logical decoding on standbys

2023-04-05 Thread Amit Kapila
On Mon, Apr 3, 2023 at 12:05 PM Amit Kapila  wrote:
>
> On Mon, Apr 3, 2023 at 4:39 AM Alvaro Herrera  wrote:
> >
> > > From 56a9559555918a99c202a0924f7b2ede9de4e75d Mon Sep 17 00:00:00 2001
> > > From: bdrouvotAWS 
> > > Date: Tue, 7 Feb 2023 08:59:47 +
> > > Subject: [PATCH v52 3/6] Allow logical decoding on standby.
> > >
> > > Allow a logical slot to be created on standby. Restrict its usage
> > > or its creation if wal_level on primary is less than logical.
> > > During slot creation, it's restart_lsn is set to the last replayed
> > > LSN. Effectively, a logical slot creation on standby waits for an
> > > xl_running_xact record to arrive from primary.
> >
> > Hmm, not sure if it really applies here, but this sounds similar to
> > issues with track_commit_timestamps: namely, if the primary has it
> > enabled and you start a standby with it enabled, that's fine; but if the
> > primary is later shut down (but the standby isn't) and then the primary
> > restarted with a lesser value, then the standby would misbehave without
> > any obvious errors.
> >
>
> IIUC, the patch deals it by invalidating logical slots while replaying
> the XLOG_PARAMETER_CHANGE record on standby. Then later during
> decoding, if it encounters XLOG_PARAMETER_CHANGE, and wal_level from
> primary has been reduced, it will return an error. There is a race
> condition here as explained in the patch as follows:
>
> + /*
> + * If wal_level on primary is reduced to less than logical, then we
> + * want to prevent existing logical slots from being used.
> + * Existing logical slots on standby get invalidated when this WAL
> + * record is replayed; and further, slot creation fails when the
> + * wal level is not sufficient; but all these operations are not
> + * synchronized, so a logical slot may creep in while the wal_level
> + * is being reduced. Hence this extra check.
> + */
> + if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("logical decoding on standby requires "
> + "wal_level >= logical on master")));
>
> Now, during this race condition, say not only does a logical slot
> creep in but also one tries to decode WAL using the same then some
> misbehavior is expected. I have not tried this so not sure if this is
> really a problem but are you worried about something along those
> lines?
>

On further thinking, as such this shouldn't be a problem because all
the WAL records before PARAMETER_CHANGE record will have sufficient
information so that they can get decoded. However, with the current
approach, the subscriber may not even receive the valid records before
PARAMETER_CHANGE record. This is because startup process will
terminate the walsenders while invaliding the slots and after restart
the walsenders will exit because the corresponding slot will be an
invalid slot. So, it is quite possible that walsender was lagging and
wouldn't have sent records before the PARAMETER_CHANGE record making
subscriber never receive those records that it should have received. I
don't know whether this is what one would expect.

One other observation is that once this error has been raised both
standby and subscriber will keep on getting this error in the loop
unless the user manually disables the subscription on the subscriber.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-04 Thread Andres Freund
Hi,

On 2023-04-04 17:33:25 -0700, Jeff Davis wrote:
> On Tue, 2023-04-04 at 14:55 -0400, Robert Haas wrote:
> >  That's presumably OK, in the
> > sense that they'll go back to sleep and eventually wake up again, but
> > it means they might end up chronically behind sending out WAL to
> > cascading standbys.
> 
> Without 0004, cascading logical walsenders would have worse wakeup
> behavior than logical walsenders on the primary. Assuming the fix is
> small in scope and otherwise acceptable, I think it belongs as a part
> of this overall series.

FWIW, personally, I wouldn't feel ok with committing 0003 without 0004. And
IMO they ought to be committed the other way round. The stalls you *can* get,
depending on the speed of WAL apply and OS scheduling, can be long.

This is actually why a predecessor version of the feature had a bunch of
sleeps and retries in the tests, just to avoid those stalls. Obviously that's
not a good path...

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-04 Thread Jeff Davis
On Tue, 2023-04-04 at 14:55 -0400, Robert Haas wrote:
> Thanks for your continued work on $SUBJECT. I just took a look at
> 0004, and I think that at the very least the commit message needs
> work. Nobody who is not a hacker is going to understand what problem
> this is fixing, because it makes reference to the names of functions
> and structure members rather than user-visible behavior. In fact, I'm
> not really sure that I understand the problem myself. It seems like
> the problem is that on a standby, WAL senders will get woken up too
> early, before we have any WAL to send.

Logical walsenders on the standby, specifically, which didn't exist
before this patch series.

>  That's presumably OK, in the
> sense that they'll go back to sleep and eventually wake up again, but
> it means they might end up chronically behind sending out WAL to
> cascading standbys.

Without 0004, cascading logical walsenders would have worse wakeup
behavior than logical walsenders on the primary. Assuming the fix is
small in scope and otherwise acceptable, I think it belongs as a part
of this overall series.

> If that's right, I think it should be spelled out
> more clearly in the commit message, and maybe also in the code
> comments.

Perhaps a commit message like:

"For cascading replication, wake up physical walsenders separately from
logical walsenders.

Physical walsenders can't send data until it's been flushed; logical
walsenders can't decode and send data until it's been applied. On the
standby, the WAL is flushed first, which will only wake up physical
walsenders; and then applied, which will only wake up logical
walsenders.

Previously, all walsenders were awakened when the WAL was flushed. That
was fine for logical walsenders on the primary; but on the standby the
flushed WAL would not have been applied yet, so logical walsenders were
awakened too early."

(I'm not sure if I quite got the verb tenses right.)

For comments, I agree that WalSndWakeup() clearly needs a comment
update. The call site in ApplyWalRecord() could also use a comment. You
could add a comment at every call site, but I don't think that's
necessary if there's a good comment over WalSndWakeup().

Regards,
Jeff Davis





Re: Minimal logical decoding on standbys

2023-04-04 Thread Robert Haas
On Tue, Apr 4, 2023 at 5:44 AM Drouvot, Bertrand
 wrote:
> Oh right, even better, thanks!
> Done in V58 and now this is as simple as:
>
> +   if (DoNotInvalidateSlot(s, xid, ))
>  {
>  /* then, we are not forcing for invalidation */

Thanks for your continued work on $SUBJECT. I just took a look at
0004, and I think that at the very least the commit message needs
work. Nobody who is not a hacker is going to understand what problem
this is fixing, because it makes reference to the names of functions
and structure members rather than user-visible behavior. In fact, I'm
not really sure that I understand the problem myself. It seems like
the problem is that on a standby, WAL senders will get woken up too
early, before we have any WAL to send. That's presumably OK, in the
sense that they'll go back to sleep and eventually wake up again, but
it means they might end up chronically behind sending out WAL to
cascading standbys. If that's right, I think it should be spelled out
more clearly in the commit message, and maybe also in the code
comments.

But the weird thing is that most (all?) of the patch doesn't seem to
be about that issue at all. Instead, it's about separating wakeups of
physical walsenders from wakeups of logical walsenders. I don't see
how that could ever fix the kind of problem I mentioned in the
preceding paragraph, so my guess is that this is a separate change.
But this change doesn't really seem adequately justified. The commit
message says that it "helps to filter what kind of walsender
we want to wakeup based on the code path" but that's awfully vague
about what the actual benefit is. I wonder whether many people have a
mix of physical and logical systems connecting to the same machine
such that this would even help, and if they do have that, would this
really do enough to solve any performance problem that might be caused
by too many wakeups?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Minimal logical decoding on standbys

2023-04-04 Thread Jeff Davis
On Tue, 2023-04-04 at 11:42 +0200, Drouvot, Bertrand wrote:
> Done in V58 and now this is as simple as:


Minor comments on 0004 (address if you agree):

* Consider static inline for WalSndWakeupProcessRequests()?
* Is the WalSndWakeup() in KeepFileRestoredFromArchive() more like the
flush case? Why is the second argument unconditionally true? I don't
think the cascading logical walsenders have anything to do until the
WAL is actually applied.

Otherwise, looks good!

Regards,
Jeff Davis





Re: Minimal logical decoding on standbys

2023-04-04 Thread Andres Freund
Hi,

On 2023-04-04 18:54:33 +0200, Drouvot, Bertrand wrote:
> if (check_on_xid)
> {
> if (terminating)
> appendStringInfo(_msg, _("terminating process %d to release 
> replication slot \"%s\" because it conflicts with recovery"),
>  pid,
>  NameStr(slotname));

FWIW, I would just use exactly the same error message as today here.

errmsg("terminating process %d 
to release replication slot \"%s\"",
   active_pid, 
NameStr(slotname)),

This is accurate for both the existing and the new case. Then there's no need
to put that string into a stringinfo either.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-04 Thread Andres Freund
Hi,

On 2023-04-04 13:21:38 +0200, Alvaro Herrera wrote:
> On 2023-Apr-03, Andres Freund wrote:
> 
> > Hm? That's what the _'s do. We build strings in parts in other places too.
> 
> No, what _() does is mark each piece for translation separately.  But a
> translation cannot be done on string pieces, and later have all the
> pieces appended together to form a full sentence.  Let me show the
> "!terminating" case as example and grab some translations for it from
> src/backend/po/de.po:
> 
> "invalidating" -> "... wird ungültig gemacht" (?)
> 
> (if logical) " obsolete replication" -> " obsolete Replikation"
> 
> " slot \"%s\" because it conflicts with recovery" -> " Slot \"%s\", weil sie 
> in Konflikt mit Wiederherstellung steht"
> 
> If you just concatenate all the translated phrases together, the
> resulting string will make no sense; keep in mind the "obsolete
> replication" part may or not may not be there.  And there's no way to
> make that work: even if you found an ordering of the English parts that
> allows you to translate each piece separately and have it make sense for
> German, the same won't work for Spanish or Japanese.

Ah, I misunderstood the angle you're coming from. Yes, the pieces need to be
reasonable fragments, instead of half-sentences.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-04 Thread Drouvot, Bertrand

Hi,

On 4/4/23 1:21 PM, Alvaro Herrera wrote:

Hi,

On 2023-Apr-03, Andres Freund wrote:


Hm? That's what the _'s do. We build strings in parts in other places too.


No, what _() does is mark each piece for translation separately.  But a
translation cannot be done on string pieces, and later have all the
pieces appended together to form a full sentence.  Let me show the
"!terminating" case as example and grab some translations for it from
src/backend/po/de.po:

"invalidating" -> "... wird ungültig gemacht" (?)

(if logical) " obsolete replication" -> " obsolete Replikation"

" slot \"%s\" because it conflicts with recovery" -> " Slot \"%s\", weil sie in 
Konflikt mit Wiederherstellung steht"

If you just concatenate all the translated phrases together, the
resulting string will make no sense; keep in mind the "obsolete
replication" part may or not may not be there.  And there's no way to
make that work: even if you found an ordering of the English parts that
allows you to translate each piece separately and have it make sense for
German, the same won't work for Spanish or Japanese.

You have to give the translator a complete phrase and let them turn into
a complete translated phrases.  Building from parts doesn't work.  We're
very good at avoiding string building; we have a couple of cases, but
they are *very* minor.

string 1 "invalidating slot \"%s\" because it conflicts with recovery"

string 2 "invalidating obsolete replication slot \"%s\" because it conflicts with 
recovery"



Thanks for looking at it and the explanations!


(I'm not clear on why did Bertrand omitted the word "replication" in the
case where the slot is not logical)


It makes more sense to add it, will do thanks!



I think the errdetail() are okay, it's the errmsg() bits that are bogus.
 

And yes, well caught on having to use errmsg_internal and
errdetail_internal() to avoid double translation.



So, IIUC having something like this would be fine?

"
if (check_on_xid)
{
if (terminating)
appendStringInfo(_msg, _("terminating process %d to release replication slot 
\"%s\" because it conflicts with recovery"),
 pid,
 NameStr(slotname));
else
appendStringInfo(_msg, _("invalidating replication slot \"%s\" 
because it conflicts with recovery"),
 NameStr(slotname));

if (TransactionIdIsValid(*xid))
appendStringInfo(_detail, _("The slot conflicted with xid horizon 
%u."), *xid);
else
appendStringInfo(_detail, _("Logical decoding on standby requires 
wal_level to be at least logical on the primary server"));
}
else
{
if (terminating)
appendStringInfo(_msg, _("terminating process %d to release replication slot 
\"%s\""),
 pid,
 NameStr(slotname));
else
appendStringInfo(_msg, _("invalidating obsolete replication slot 
\"%s\""),
 NameStr(slotname));

appendStringInfo(_detail, _("The slot's restart_lsn %X/%X exceeds the 
limit by %llu bytes."),
 LSN_FORMAT_ARGS(restart_lsn),
 (unsigned long long) (oldestLSN - restart_lsn));

hint = true;
}

ereport(LOG,
errmsg_internal("%s", err_msg.data),
errdetail_internal("%s", err_detail.data),
hint ? errhint("You might need to increase 
max_slot_wal_keep_size.") : 0);
"

as err_msg is not concatenated anymore (I mean it's just one sentence build one 
time)
and this make use of errmsg_internal() and errdetail_internal().

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-04 Thread Drouvot, Bertrand

Hi,

On 4/4/23 3:43 PM, Amit Kapila wrote:

On Tue, Apr 4, 2023 at 6:05 PM Drouvot, Bertrand

I think we might want to consider slot's effective_xmin instead of
data.xmin as we use that to store xmin_horizon when we build the full
snapshot.



Oh, I did not know about the 'effective_xmin' and was going to rely only on the 
catalog xmin.

Reading the comment in the ReplicationSlot struct about the 'effective_xmin' I 
do think it makes sense to use it
(instead of data.xmin).

Please find attached v59 doing so.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom fad28278fa13bf3564c878aba57fb6d1e6623d59 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 14:08:11 +
Subject: [PATCH v59 6/6] Doc changes describing details about logical
 decoding.

Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/logicaldecoding.sgml | 22 ++
 1 file changed, 22 insertions(+)
 100.0% doc/src/sgml/

diff --git a/doc/src/sgml/logicaldecoding.sgml 
b/doc/src/sgml/logicaldecoding.sgml
index 4e912b4bd4..3da254ed1f 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -316,6 +316,28 @@ postgres=# select * from 
pg_logical_slot_get_changes('regression_slot', NULL, NU
  may consume changes from a slot at any given time.
 
 
+
+ A logical replication slot can also be created on a hot standby. To 
prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot between the 
primary
+ and the standby. Otherwise, hot_standby_feedback will work, but only 
while the
+ connection is alive (for example a node restart would break it). Existing
+ logical slots on standby also get invalidated if wal_level on primary is 
reduced to
+ less than 'logical'.
+
+
+
+ For a logical slot to be created, it builds a historic snapshot, for which
+ information of all the currently running transactions is essential. On
+ primary, this information is available, but on standby, this information
+ has to be obtained from primary. So, slot creation may wait for some
+ activity to happen on the primary. If the primary is idle, creating a
+ logical slot on standby may take a noticeable time. One option to speed it
+ is to call the pg_log_standby_snapshot on the 
primary.
+
+
 
  
   Replication slots persist across crashes and know nothing about the state
-- 
2.34.1

From 9e038e69816a1b0722c15515dbfbef3310198e39 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 09:04:12 +
Subject: [PATCH v59 5/6] New TAP test for logical decoding on standby.

In addition to the new TAP test, this commit introduces a new 
pg_log_standby_snapshot()
function.

The idea is to be able to take a snapshot of running transactions and write this
to WAL without requesting for a (costly) checkpoint.

Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/func.sgml|  15 +
 src/backend/access/transam/xlogfuncs.c|  32 +
 src/backend/catalog/system_functions.sql  |   2 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  37 +
 src/test/recovery/meson.build |   1 +
 .../t/035_standby_logical_decoding.pl | 705 ++
 7 files changed, 795 insertions(+)
   3.1% src/backend/
   4.0% src/test/perl/PostgreSQL/Test/
  89.7% src/test/recovery/t/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 918a492234..939fb8019f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27034,6 +27034,21 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * 
ps.setting::int + :offset
 prepared with .

   
+  
+   
+
+ pg_log_standby_snapshot
+
+pg_log_standby_snapshot ()
+pg_lsn
+   
+   
+Take a snapshot of running transactions and write this to WAL without
+having to wait bgwriter or checkpointer to log one. This one is useful 
for
+logical decoding on standby for which logical slot creation is hanging
+until such a record is replayed on the standby.
+   
+  
  
 

diff --git a/src/backend/access/transam/xlogfuncs.c 
b/src/backend/access/transam/xlogfuncs.c
index c07daa874f..481e9a47da 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -38,6 +38,7 @@
 #include "utils/pg_lsn.h"
 

Re: Minimal logical decoding on standbys

2023-04-04 Thread Amit Kapila
On Tue, Apr 4, 2023 at 6:05 PM Drouvot, Bertrand
 wrote:
>
> On 4/4/23 1:43 PM, Amit Kapila wrote:
> > On Tue, Apr 4, 2023 at 3:14 PM Drouvot, Bertrand
> >  wrote:
> >>
> >
> >
> > +static inline bool
> > +LogicalReplicationSlotXidsConflict(ReplicationSlot *s, TransactionId xid)
> > +{
> > + TransactionId slot_xmin;
> > + TransactionId slot_catalog_xmin;
> > +
> > + slot_xmin = s->data.xmin;
> > + slot_catalog_xmin = s->data.catalog_xmin;
> > +
> > + return (((TransactionIdIsValid(slot_xmin) &&
> > TransactionIdPrecedesOrEquals(slot_xmin, xid)) ||
> >
> > For logical slots, slot->data.xmin will always be an
> > InvalidTransactionId. It will only be set/updated for physical slots.
> > So, it is not clear to me why in this and other related functions, you
> > are referring to and or invalidating it.
> >
>
> I think you're right that invalidating/checking only on the catalog xmin is
> enough for logical slot (I'm not sure how I ended up taking the xmin into 
> account but
> that seems useless indeed).
>

I think we might want to consider slot's effective_xmin instead of
data.xmin as we use that to store xmin_horizon when we build the full
snapshot.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-04 Thread Drouvot, Bertrand

Hi,

On 4/4/23 1:43 PM, Amit Kapila wrote:

On Tue, Apr 4, 2023 at 3:14 PM Drouvot, Bertrand
 wrote:





+static inline bool
+LogicalReplicationSlotXidsConflict(ReplicationSlot *s, TransactionId xid)
+{
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+
+ return (((TransactionIdIsValid(slot_xmin) &&
TransactionIdPrecedesOrEquals(slot_xmin, xid)) ||

For logical slots, slot->data.xmin will always be an
InvalidTransactionId. It will only be set/updated for physical slots.
So, it is not clear to me why in this and other related functions, you
are referring to and or invalidating it.



I think you're right that invalidating/checking only on the catalog xmin is
enough for logical slot (I'm not sure how I ended up taking the xmin into 
account but
that seems useless indeed).

I'll submit a new version to deal with the catalog xmin only, thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-04 Thread Amit Kapila
On Tue, Apr 4, 2023 at 3:14 PM Drouvot, Bertrand
 wrote:
>


+static inline bool
+LogicalReplicationSlotXidsConflict(ReplicationSlot *s, TransactionId xid)
+{
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+
+ return (((TransactionIdIsValid(slot_xmin) &&
TransactionIdPrecedesOrEquals(slot_xmin, xid)) ||

For logical slots, slot->data.xmin will always be an
InvalidTransactionId. It will only be set/updated for physical slots.
So, it is not clear to me why in this and other related functions, you
are referring to and or invalidating it.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-04 Thread Alvaro Herrera
Hi,

On 2023-Apr-03, Andres Freund wrote:

> Hm? That's what the _'s do. We build strings in parts in other places too.

No, what _() does is mark each piece for translation separately.  But a
translation cannot be done on string pieces, and later have all the
pieces appended together to form a full sentence.  Let me show the
"!terminating" case as example and grab some translations for it from
src/backend/po/de.po:

"invalidating" -> "... wird ungültig gemacht" (?)

(if logical) " obsolete replication" -> " obsolete Replikation"

" slot \"%s\" because it conflicts with recovery" -> " Slot \"%s\", weil sie in 
Konflikt mit Wiederherstellung steht"

If you just concatenate all the translated phrases together, the
resulting string will make no sense; keep in mind the "obsolete
replication" part may or not may not be there.  And there's no way to
make that work: even if you found an ordering of the English parts that
allows you to translate each piece separately and have it make sense for
German, the same won't work for Spanish or Japanese.

You have to give the translator a complete phrase and let them turn into
a complete translated phrases.  Building from parts doesn't work.  We're
very good at avoiding string building; we have a couple of cases, but
they are *very* minor.

string 1 "invalidating slot \"%s\" because it conflicts with recovery"

string 2 "invalidating obsolete replication slot \"%s\" because it conflicts 
with recovery"

(I'm not clear on why did Bertrand omitted the word "replication" in the
case where the slot is not logical)

I think the errdetail() are okay, it's the errmsg() bits that are bogus.

And yes, well caught on having to use errmsg_internal and
errdetail_internal() to avoid double translation.

Cheers

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Minimal logical decoding on standbys

2023-04-04 Thread Drouvot, Bertrand

Hi,

On 4/4/23 9:48 AM, Masahiko Sawada wrote:

On Tue, Apr 4, 2023 at 10:55 AM Masahiko Sawada  wrote:



Regarding 0004 patch:

@@ -2626,6 +2626,12 @@ InitWalSenderSlot(void)
 walsnd->sync_standby_priority = 0;
 walsnd->latch = >procLatch;
 walsnd->replyTime = 0;
+
+   if (MyDatabaseId == InvalidOid)
+   walsnd->kind = REPLICATION_KIND_PHYSICAL;
+   else
+   walsnd->kind = REPLICATION_KIND_LOGICAL;
+

I think we might want to set the replication kind when processing the
START_REPLICATION command. The walsender using a logical replication
slot is not necessarily streaming (e.g. when COPYing table data).



Discussing with Bertrand off-list, it's wrong as the logical
replication slot creation also needs to read WAL records so a
walsender who is creating a logical replication slot needs to be woken
up. We can set it the replication kind when processing
START_REPLICATION and CREATE_REPLICATION_SLOT, but it seems better to
set it in one place. So I agree to set it in InitWalSenderSlot().



Thanks for the review and feedback!
Added a comment in 0004 in V58 just posted up-thread to explain the reason
why the walsnd->kind assignment is done InitWalSenderSlot().

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-04 Thread Drouvot, Bertrand

Hi,

On 4/4/23 7:57 AM, Amit Kapila wrote:

On Mon, Apr 3, 2023 at 8:51 PM Drouvot, Bertrand
 wrote:


I made it as simple as:

  /*
   * If the slot is already invalid or is a non conflicting slot, we 
don't
   * need to do anything.
   */
  islogical = xid ? true : false;

  if (SlotIsInvalid(s, islogical) || SlotIsNotConflicting(s, islogical, 
xid, ))

in V56 attached.



Here the variable 'islogical' doesn't seem to convey its actual
meaning because one can imagine that it indicates whether the slot is
logical which I don't think is the actual intent.


Good point. Just renamed it to 'check_on_xid' (as still needed outside of
the "CanInvalidateSlot" context) in V58 attached.


One idea to simplify
this is to introduce a single function CanInvalidateSlot() or
something like that and move the logic from both the functions
SlotIsInvalid() and SlotIsNotConflicting() into the new function.



Oh right, even better, thanks!
Done in V58 and now this is as simple as:

+   if (DoNotInvalidateSlot(s, xid, ))
{
/* then, we are not forcing for invalidation */


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom 7c61edd93b4df1efb9723ddae41c009ccbac9f59 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 14:08:11 +
Subject: [PATCH v58 6/6] Doc changes describing details about logical
 decoding.

Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/logicaldecoding.sgml | 22 ++
 1 file changed, 22 insertions(+)
 100.0% doc/src/sgml/

diff --git a/doc/src/sgml/logicaldecoding.sgml 
b/doc/src/sgml/logicaldecoding.sgml
index 4e912b4bd4..3da254ed1f 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -316,6 +316,28 @@ postgres=# select * from 
pg_logical_slot_get_changes('regression_slot', NULL, NU
  may consume changes from a slot at any given time.
 
 
+
+ A logical replication slot can also be created on a hot standby. To 
prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot between the 
primary
+ and the standby. Otherwise, hot_standby_feedback will work, but only 
while the
+ connection is alive (for example a node restart would break it). Existing
+ logical slots on standby also get invalidated if wal_level on primary is 
reduced to
+ less than 'logical'.
+
+
+
+ For a logical slot to be created, it builds a historic snapshot, for which
+ information of all the currently running transactions is essential. On
+ primary, this information is available, but on standby, this information
+ has to be obtained from primary. So, slot creation may wait for some
+ activity to happen on the primary. If the primary is idle, creating a
+ logical slot on standby may take a noticeable time. One option to speed it
+ is to call the pg_log_standby_snapshot on the 
primary.
+
+
 
  
   Replication slots persist across crashes and know nothing about the state
-- 
2.34.1

From 14f9f51aaacd8889ef1f9853534c9303fc89f59f Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 09:04:12 +
Subject: [PATCH v58 5/6] New TAP test for logical decoding on standby.

In addition to the new TAP test, this commit introduces a new 
pg_log_standby_snapshot()
function.

The idea is to be able to take a snapshot of running transactions and write this
to WAL without requesting for a (costly) checkpoint.

Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/func.sgml|  15 +
 src/backend/access/transam/xlogfuncs.c|  32 +
 src/backend/catalog/system_functions.sql  |   2 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  37 +
 src/test/recovery/meson.build |   1 +
 .../t/035_standby_logical_decoding.pl | 705 ++
 7 files changed, 795 insertions(+)
   3.1% src/backend/
   4.0% src/test/perl/PostgreSQL/Test/
  89.7% src/test/recovery/t/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 918a492234..939fb8019f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27034,6 +27034,21 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * 
ps.setting::int + :offset
 prepared with .

   
+  
+   
+
+ pg_log_standby_snapshot
+
+

Re: Minimal logical decoding on standbys

2023-04-04 Thread Masahiko Sawada
On Tue, Apr 4, 2023 at 10:55 AM Masahiko Sawada  wrote:
>
> On Tue, Apr 4, 2023 at 3:17 AM Drouvot, Bertrand
>  wrote:
> >
> > Hi,
> >
> > On 4/3/23 8:10 AM, Drouvot, Bertrand wrote:
> > > Hi,
> > >
> > > On 4/3/23 7:35 AM, Amit Kapila wrote:
> > >> On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis  wrote:
> > >>
> > >> Agreed, even Bertrand and myself discussed the same approach few
> > >> emails above. BTW, if we have this selective logic to wake
> > >> physical/logical walsenders and for standby's, we only wake logical
> > >> walsenders at the time of  ApplyWalRecord() then do we need the new
> > >> conditional variable enhancement being discussed, and if so, why?
> > >>
> > >
> > > Thank you both for this new idea and discussion. In that case I don't 
> > > think
> > > we need the new CV API and the use of a CV anymore. As just said 
> > > up-thread I'll submit
> > > a new proposal with this new approach.
> > >
> >
> > Please find enclosed V57 implementing the new approach in 0004.
>
> Regarding 0004 patch:
>
> @@ -2626,6 +2626,12 @@ InitWalSenderSlot(void)
> walsnd->sync_standby_priority = 0;
> walsnd->latch = >procLatch;
> walsnd->replyTime = 0;
> +
> +   if (MyDatabaseId == InvalidOid)
> +   walsnd->kind = REPLICATION_KIND_PHYSICAL;
> +   else
> +   walsnd->kind = REPLICATION_KIND_LOGICAL;
> +
>
> I think we might want to set the replication kind when processing the
> START_REPLICATION command. The walsender using a logical replication
> slot is not necessarily streaming (e.g. when COPYing table data).
>

Discussing with Bertrand off-list, it's wrong as the logical
replication slot creation also needs to read WAL records so a
walsender who is creating a logical replication slot needs to be woken
up. We can set it the replication kind when processing
START_REPLICATION and CREATE_REPLICATION_SLOT, but it seems better to
set it in one place. So I agree to set it in InitWalSenderSlot().

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-03 Thread Amit Kapila
On Mon, Apr 3, 2023 at 8:51 PM Drouvot, Bertrand
 wrote:
>
> On 4/2/23 10:10 PM, Andres Freund wrote:
> > Hi,
> >>  restart_lsn = s->data.restart_lsn;
> >> -
> >> -/*
> >> - * If the slot is already invalid or is fresh enough, we 
> >> don't need to
> >> - * do anything.
> >> - */
> >> -if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= 
> >> oldestLSN)
> >> +slot_xmin = s->data.xmin;
> >> +slot_catalog_xmin = s->data.catalog_xmin;
> >> +
> >> +/* the slot has been invalidated (logical decoding conflict 
> >> case) */
> >> +if ((xid && ((LogicalReplicationSlotIsInvalid(s)) ||
> >> +/* or the xid is valid and this is a non conflicting slot */
> >> + (TransactionIdIsValid(*xid) && 
> >> !(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, 
> >> *xid) ||
> >> +/* or the slot has been invalidated (obsolete LSN case) */
> >> +(!xid && (XLogRecPtrIsInvalid(restart_lsn) || 
> >> restart_lsn >= oldestLSN)))
> >>  {
> >
> > This still looks nearly unreadable. I suggest moving comments outside of the
> > if (), remove redundant parentheses, use a function to detect if the slot 
> > has
> > been invalidated.
> >
>
> I made it as simple as:
>
>  /*
>   * If the slot is already invalid or is a non conflicting slot, we 
> don't
>   * need to do anything.
>   */
>  islogical = xid ? true : false;
>
>  if (SlotIsInvalid(s, islogical) || SlotIsNotConflicting(s, 
> islogical, xid, ))
>
> in V56 attached.
>

Here the variable 'islogical' doesn't seem to convey its actual
meaning because one can imagine that it indicates whether the slot is
logical which I don't think is the actual intent. One idea to simplify
this is to introduce a single function CanInvalidateSlot() or
something like that and move the logic from both the functions
SlotIsInvalid() and SlotIsNotConflicting() into the new function.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-03 Thread Masahiko Sawada
On Tue, Apr 4, 2023 at 3:17 AM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 4/3/23 8:10 AM, Drouvot, Bertrand wrote:
> > Hi,
> >
> > On 4/3/23 7:35 AM, Amit Kapila wrote:
> >> On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis  wrote:
> >>
> >> Agreed, even Bertrand and myself discussed the same approach few
> >> emails above. BTW, if we have this selective logic to wake
> >> physical/logical walsenders and for standby's, we only wake logical
> >> walsenders at the time of  ApplyWalRecord() then do we need the new
> >> conditional variable enhancement being discussed, and if so, why?
> >>
> >
> > Thank you both for this new idea and discussion. In that case I don't think
> > we need the new CV API and the use of a CV anymore. As just said up-thread 
> > I'll submit
> > a new proposal with this new approach.
> >
>
> Please find enclosed V57 implementing the new approach in 0004.

Regarding 0004 patch:

@@ -2626,6 +2626,12 @@ InitWalSenderSlot(void)
walsnd->sync_standby_priority = 0;
walsnd->latch = >procLatch;
walsnd->replyTime = 0;
+
+   if (MyDatabaseId == InvalidOid)
+   walsnd->kind = REPLICATION_KIND_PHYSICAL;
+   else
+   walsnd->kind = REPLICATION_KIND_LOGICAL;
+

I think we might want to set the replication kind when processing the
START_REPLICATION command. The walsender using a logical replication
slot is not necessarily streaming (e.g. when COPYing table data).

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-03 Thread Andres Freund
Hi,

On 2023-04-03 17:34:52 +0200, Alvaro Herrera wrote:
> On 2023-Apr-03, Drouvot, Bertrand wrote:
> 
> > +/*
> > + * Report terminating or conflicting message.
> > + *
> > + * For both, logical conflict on standby and obsolete slot are handled.
> > + */
> > +static void
> > +ReportTerminationInvalidation(bool terminating, bool islogical, int pid,
> > + NameData slotname, 
> > TransactionId *xid,
> > + XLogRecPtr 
> > restart_lsn, XLogRecPtr oldestLSN)
> > +{
> 
> > +   if (terminating)
> > +   appendStringInfo(_msg, _("terminating process %d to release 
> > replication slot \"%s\""),
> > +pid,
> > +NameStr(slotname));
> > +   else
> > +   appendStringInfo(_msg, _("invalidating"));
> > +
> > +   if (islogical)
> > +   {
> > +   if (terminating)
> > +   appendStringInfo(_msg, _(" because it conflicts 
> > with recovery"));
> 
> You can't build the strings this way, because it's not possible to put
> the strings into the translation machinery.  You need to write full
> strings for each separate case instead, without appending other string
> parts later.

Hm? That's what the _'s do. We build strings in parts in other places too.

You do need to use errmsg_internal() later, to prevent that format string from
being translated as well.

I'm not say that this is exactly the right way, don't get me wrong.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-03 Thread Drouvot, Bertrand

Hi,

On 4/3/23 8:10 AM, Drouvot, Bertrand wrote:

Hi,

On 4/3/23 7:35 AM, Amit Kapila wrote:

On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis  wrote:

Agreed, even Bertrand and myself discussed the same approach few
emails above. BTW, if we have this selective logic to wake
physical/logical walsenders and for standby's, we only wake logical
walsenders at the time of  ApplyWalRecord() then do we need the new
conditional variable enhancement being discussed, and if so, why?



Thank you both for this new idea and discussion. In that case I don't think
we need the new CV API and the use of a CV anymore. As just said up-thread I'll 
submit
a new proposal with this new approach.



Please find enclosed V57 implementing the new approach in 0004. With the new 
approach in place
the TAP tests (0005) work like a charm (no delay and even after a promotion).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom 0d198484e008090a524562076326054be56935ca Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 14:08:11 +
Subject: [PATCH v57 6/6] Doc changes describing details about logical
 decoding.

Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/logicaldecoding.sgml | 22 ++
 1 file changed, 22 insertions(+)
 100.0% doc/src/sgml/

diff --git a/doc/src/sgml/logicaldecoding.sgml 
b/doc/src/sgml/logicaldecoding.sgml
index 4e912b4bd4..3da254ed1f 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -316,6 +316,28 @@ postgres=# select * from 
pg_logical_slot_get_changes('regression_slot', NULL, NU
  may consume changes from a slot at any given time.
 
 
+
+ A logical replication slot can also be created on a hot standby. To 
prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot between the 
primary
+ and the standby. Otherwise, hot_standby_feedback will work, but only 
while the
+ connection is alive (for example a node restart would break it). Existing
+ logical slots on standby also get invalidated if wal_level on primary is 
reduced to
+ less than 'logical'.
+
+
+
+ For a logical slot to be created, it builds a historic snapshot, for which
+ information of all the currently running transactions is essential. On
+ primary, this information is available, but on standby, this information
+ has to be obtained from primary. So, slot creation may wait for some
+ activity to happen on the primary. If the primary is idle, creating a
+ logical slot on standby may take a noticeable time. One option to speed it
+ is to call the pg_log_standby_snapshot on the 
primary.
+
+
 
  
   Replication slots persist across crashes and know nothing about the state
-- 
2.34.1

From 4e392c06e39f36c4185780a21f2b90c7c6a97de4 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 09:04:12 +
Subject: [PATCH v57 5/6] New TAP test for logical decoding on standby.

In addition to the new TAP test, this commit introduces a new 
pg_log_standby_snapshot()
function.

The idea is to be able to take a snapshot of running transactions and write this
to WAL without requesting for a (costly) checkpoint.

Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/func.sgml|  15 +
 src/backend/access/transam/xlogfuncs.c|  32 +
 src/backend/catalog/system_functions.sql  |   2 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  37 +
 src/test/recovery/meson.build |   1 +
 .../t/035_standby_logical_decoding.pl | 705 ++
 7 files changed, 795 insertions(+)
   3.1% src/backend/
   4.0% src/test/perl/PostgreSQL/Test/
  89.7% src/test/recovery/t/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 918a492234..939fb8019f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27034,6 +27034,21 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * 
ps.setting::int + :offset
 prepared with .

   
+  
+   
+
+ pg_log_standby_snapshot
+
+pg_log_standby_snapshot ()
+pg_lsn
+   
+   
+Take a snapshot of running transactions and write this to WAL without
+having to wait bgwriter or checkpointer to log one. This one is useful 
for
+logical decoding on standby for which logical slot creation is hanging
+until such a 

Re: Minimal logical decoding on standbys

2023-04-03 Thread Alvaro Herrera
On 2023-Apr-03, Drouvot, Bertrand wrote:

> +/*
> + * Report terminating or conflicting message.
> + *
> + * For both, logical conflict on standby and obsolete slot are handled.
> + */
> +static void
> +ReportTerminationInvalidation(bool terminating, bool islogical, int pid,
> +   NameData slotname, 
> TransactionId *xid,
> +   XLogRecPtr 
> restart_lsn, XLogRecPtr oldestLSN)
> +{

> + if (terminating)
> + appendStringInfo(_msg, _("terminating process %d to release 
> replication slot \"%s\""),
> +  pid,
> +  NameStr(slotname));
> + else
> + appendStringInfo(_msg, _("invalidating"));
> +
> + if (islogical)
> + {
> + if (terminating)
> + appendStringInfo(_msg, _(" because it conflicts 
> with recovery"));

You can't build the strings this way, because it's not possible to put
the strings into the translation machinery.  You need to write full
strings for each separate case instead, without appending other string
parts later.

Thanks

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)




Re: Minimal logical decoding on standbys

2023-04-03 Thread Drouvot, Bertrand

Hi,

On 4/2/23 10:10 PM, Andres Freund wrote:

Hi,

Btw, most of the patches have some things that pgindent will change (and some
that my editor will highlight). It wouldn't hurt to run pgindent for the later
patches...


done.



Pushed the WAL format change.



Thanks!



On 2023-04-02 10:27:45 +0200, Drouvot, Bertrand wrote:

5.3% doc/src/sgml/
6.2% src/backend/access/transam/
4.6% src/backend/replication/logical/
   55.6% src/backend/replication/
4.4% src/backend/storage/ipc/
6.9% src/backend/tcop/
5.3% src/backend/
3.8% src/include/catalog/
5.3% src/include/replication/


I think it might be worth trying to split this up a bit.



Okay. Split in 2 parts in V56 enclosed.

One part to handle logical slot conflicts on standby, and one part
to arrange for a new pg_stat_database_conflicts and pg_replication_slots field.




restart_lsn = s->data.restart_lsn;
-
-   /*
-* If the slot is already invalid or is fresh enough, we don't 
need to
-* do anything.
-*/
-   if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= 
oldestLSN)
+   slot_xmin = s->data.xmin;
+   slot_catalog_xmin = s->data.catalog_xmin;
+
+   /* the slot has been invalidated (logical decoding conflict 
case) */
+   if ((xid && ((LogicalReplicationSlotIsInvalid(s)) ||
+   /* or the xid is valid and this is a non conflicting slot */
+(TransactionIdIsValid(*xid) && 
!(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, *xid) ||
+   /* or the slot has been invalidated (obsolete LSN case) */
+   (!xid && (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn 
>= oldestLSN)))
{


This still looks nearly unreadable. I suggest moving comments outside of the
if (), remove redundant parentheses, use a function to detect if the slot has
been invalidated.



I made it as simple as:

/*
 * If the slot is already invalid or is a non conflicting slot, we don't
 * need to do anything.
 */
islogical = xid ? true : false;

if (SlotIsInvalid(s, islogical) || SlotIsNotConflicting(s, islogical, xid, 
))

in V56 attached.




@@ -1329,16 +1345,45 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, 
XLogRecPtr oldestLSN,
 */
if (last_signaled_pid != active_pid)
{
+   boolsend_signal = false;
+
+   initStringInfo(_msg);
+   initStringInfo(_detail);
+
+   appendStringInfo(_msg, "terminating process %d to release 
replication slot \"%s\"",
+active_pid,
+
NameStr(slotname));


For this to be translatable you need to use _("message").


Thanks!





+   if (xid)
+   {
+   appendStringInfo(_msg, " because it 
conflicts with recovery");
+   send_signal = true;
+
+   if (TransactionIdIsValid(*xid))
+   appendStringInfo(_detail, "The 
slot conflicted with xid horizon %u.", *xid);
+   else
+   appendStringInfo(_detail, 
"Logical decoding on standby requires wal_level to be at least logical on the primary 
server");
+   }
+   else
+   {
+   appendStringInfo(_detail, "The slot's 
restart_lsn %X/%X exceeds the limit by %llu bytes.",
+
LSN_FORMAT_ARGS(restart_lsn),
+
(unsigned long long) (oldestLSN - restart_lsn));
+   }
+
ereport(LOG,
-   errmsg("terminating process %d to release 
replication slot \"%s\"",
-  active_pid, 
NameStr(slotname)),
-   errdetail("The slot's restart_lsn 
%X/%X exceeds the limit by %llu bytes.",
- 
LSN_FORMAT_ARGS(restart_lsn),
- (unsigned 
long long) (oldestLSN - restart_lsn)),
-   errhint("You might need to increase 
max_slot_wal_keep_size."));
-
-   (void) 

Re: Minimal logical decoding on standbys

2023-04-03 Thread Amit Kapila
On Mon, Apr 3, 2023 at 4:39 AM Alvaro Herrera  wrote:
>
> > From 56a9559555918a99c202a0924f7b2ede9de4e75d Mon Sep 17 00:00:00 2001
> > From: bdrouvotAWS 
> > Date: Tue, 7 Feb 2023 08:59:47 +
> > Subject: [PATCH v52 3/6] Allow logical decoding on standby.
> >
> > Allow a logical slot to be created on standby. Restrict its usage
> > or its creation if wal_level on primary is less than logical.
> > During slot creation, it's restart_lsn is set to the last replayed
> > LSN. Effectively, a logical slot creation on standby waits for an
> > xl_running_xact record to arrive from primary.
>
> Hmm, not sure if it really applies here, but this sounds similar to
> issues with track_commit_timestamps: namely, if the primary has it
> enabled and you start a standby with it enabled, that's fine; but if the
> primary is later shut down (but the standby isn't) and then the primary
> restarted with a lesser value, then the standby would misbehave without
> any obvious errors.
>

IIUC, the patch deals it by invalidating logical slots while replaying
the XLOG_PARAMETER_CHANGE record on standby. Then later during
decoding, if it encounters XLOG_PARAMETER_CHANGE, and wal_level from
primary has been reduced, it will return an error. There is a race
condition here as explained in the patch as follows:

+ /*
+ * If wal_level on primary is reduced to less than logical, then we
+ * want to prevent existing logical slots from being used.
+ * Existing logical slots on standby get invalidated when this WAL
+ * record is replayed; and further, slot creation fails when the
+ * wal level is not sufficient; but all these operations are not
+ * synchronized, so a logical slot may creep in while the wal_level
+ * is being reduced. Hence this extra check.
+ */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));

Now, during this race condition, say not only does a logical slot
creep in but also one tries to decode WAL using the same then some
misbehavior is expected. I have not tried this so not sure if this is
really a problem but are you worried about something along those
lines?

>  If that is a real problem, then perhaps you can
> solve it by copying some of the logic from track_commit_timestamps,
> which took a large number of iterations to get right.
>

IIUC, track_commit_timestamps deactivates the CommitTs module (by
using state in the shared memory) when replaying the
XLOG_PARAMETER_CHANGE record. Then later using that state it gives an
error from the appropriate place in the CommitTs module. If my
understanding is correct then that appears to be a better design than
what the patch is currently doing. Also, the error message used in
error_commit_ts_disabled() seems to be better than the current one.

--
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-03 Thread Drouvot, Bertrand

Hi,

On 4/3/23 7:35 AM, Amit Kapila wrote:

On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis  wrote:


On Fri, 2023-03-31 at 02:50 -0700, Jeff Davis wrote:

But if the ConditionVariableEventSleep() API is added, then I think
we
should change the non-recovery case to use a CV as well for
consistency, and it would avoid the need for WalSndWakeup().


It seems like what we ultimately want is for WalSndWakeup() to
selectively wake up physical and/or logical walsenders depending on the
caller. For instance:

WalSndWakeup(bool physical, bool logical)

The callers:

   * On promotion, StartupXLog would call:
 - WalSndWakeup(true, true)
   * XLogFlush/XLogBackgroundFlush/XLogWalRcvFlush would call:
 - WalSndWakeup(true, !RecoveryInProgress())
   * ApplyWalRecord would call:
 - WalSndWakeup(switchedTLI, switchedTLI || RecoveryInProgress())

There seem to be two approaches to making that work:

1. Use two ConditionVariables, and WalSndWakeup would broadcast to one
or both depending on its arguments.

2. Have a "replicaiton_kind" variable in WalSnd (either set based on
MyDatabaseId==InvalidOid, or set at START_REPLICATION time) to indicate
whether it's a physical or logical walsender. WalSndWakeup would wake
up the right walsenders based on its arguments.

#2 seems simpler at least for now. Would that work?



Agreed, even Bertrand and myself discussed the same approach few
emails above. BTW, if we have this selective logic to wake
physical/logical walsenders and for standby's, we only wake logical
walsenders at the time of  ApplyWalRecord() then do we need the new
conditional variable enhancement being discussed, and if so, why?



Thank you both for this new idea and discussion. In that case I don't think
we need the new CV API and the use of a CV anymore. As just said up-thread I'll 
submit
a new proposal with this new approach.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-03 Thread Drouvot, Bertrand

Hi,

On 4/3/23 7:20 AM, Amit Kapila wrote:

On Mon, Apr 3, 2023 at 1:31 AM Jeff Davis  wrote:


On Sun, 2023-04-02 at 10:11 +0200, Drouvot, Bertrand wrote:

I was thinking that, if a new LogicalWalSndWakeup() replaces
"ConditionVariableBroadcast(>replayedCV);"
in ApplyWalRecord() then, it could be possible that some walsender(s)
are requested to wake up while they are actually doing decoding (but
I might be wrong).


I don't think that's a problem, right?



Agreed, I also don't see a problem because of the reason you mentioned
below that if the latch is already set, we won't do anything in
SetLatch.


Thanks for the feedback, I do agree too after Jeff's and your explanation.




We are concerned about wakeups when they happen repeatedly when there's
no work to do, or when the wakeup doesn't happen when it should (and we
need to wait for a timeout).



Currently, we wake up walsenders only after writing some WAL
records
at the time of flush, so won't it be better to wake up only after
applying some WAL records rather than after applying each record?


Yeah that would be better.


Why? If the walsender is asleep, and there's work to be done, why not
wake it up?



I think we can wake it up when there is work to be done even if the
work unit is smaller. The reason why I mentioned waking up the
walsender only after processing some records is to avoid the situation
where it may not need to wait again after decoding very few records.
But probably the logic in WalSndWaitForWal() will help us to exit
before starting to wait by checking the replay location.



Okay, I'll re-write the sub-patch related to the startup/walsender corner
case with this new approach.

Regards,


--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-02 Thread Amit Kapila
On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis  wrote:
>
> On Fri, 2023-03-31 at 02:50 -0700, Jeff Davis wrote:
> > But if the ConditionVariableEventSleep() API is added, then I think
> > we
> > should change the non-recovery case to use a CV as well for
> > consistency, and it would avoid the need for WalSndWakeup().
>
> It seems like what we ultimately want is for WalSndWakeup() to
> selectively wake up physical and/or logical walsenders depending on the
> caller. For instance:
>
>WalSndWakeup(bool physical, bool logical)
>
> The callers:
>
>   * On promotion, StartupXLog would call:
> - WalSndWakeup(true, true)
>   * XLogFlush/XLogBackgroundFlush/XLogWalRcvFlush would call:
> - WalSndWakeup(true, !RecoveryInProgress())
>   * ApplyWalRecord would call:
> - WalSndWakeup(switchedTLI, switchedTLI || RecoveryInProgress())
>
> There seem to be two approaches to making that work:
>
> 1. Use two ConditionVariables, and WalSndWakeup would broadcast to one
> or both depending on its arguments.
>
> 2. Have a "replicaiton_kind" variable in WalSnd (either set based on
> MyDatabaseId==InvalidOid, or set at START_REPLICATION time) to indicate
> whether it's a physical or logical walsender. WalSndWakeup would wake
> up the right walsenders based on its arguments.
>
> #2 seems simpler at least for now. Would that work?
>

Agreed, even Bertrand and myself discussed the same approach few
emails above. BTW, if we have this selective logic to wake
physical/logical walsenders and for standby's, we only wake logical
walsenders at the time of  ApplyWalRecord() then do we need the new
conditional variable enhancement being discussed, and if so, why?

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-02 Thread Amit Kapila
On Mon, Apr 3, 2023 at 1:31 AM Jeff Davis  wrote:
>
> On Sun, 2023-04-02 at 10:11 +0200, Drouvot, Bertrand wrote:
> > I was thinking that, if a new LogicalWalSndWakeup() replaces
> > "ConditionVariableBroadcast(>replayedCV);"
> > in ApplyWalRecord() then, it could be possible that some walsender(s)
> > are requested to wake up while they are actually doing decoding (but
> > I might be wrong).
>
> I don't think that's a problem, right?
>

Agreed, I also don't see a problem because of the reason you mentioned
below that if the latch is already set, we won't do anything in
SetLatch.

> We are concerned about wakeups when they happen repeatedly when there's
> no work to do, or when the wakeup doesn't happen when it should (and we
> need to wait for a timeout).
>
> > >
> > > Currently, we wake up walsenders only after writing some WAL
> > > records
> > > at the time of flush, so won't it be better to wake up only after
> > > applying some WAL records rather than after applying each record?
> >
> > Yeah that would be better.
>
> Why? If the walsender is asleep, and there's work to be done, why not
> wake it up?
>

I think we can wake it up when there is work to be done even if the
work unit is smaller. The reason why I mentioned waking up the
walsender only after processing some records is to avoid the situation
where it may not need to wait again after decoding very few records.
But probably the logic in WalSndWaitForWal() will help us to exit
before starting to wait by checking the replay location.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-02 Thread Alvaro Herrera
> From 56a9559555918a99c202a0924f7b2ede9de4e75d Mon Sep 17 00:00:00 2001
> From: bdrouvotAWS 
> Date: Tue, 7 Feb 2023 08:59:47 +
> Subject: [PATCH v52 3/6] Allow logical decoding on standby.
> 
> Allow a logical slot to be created on standby. Restrict its usage
> or its creation if wal_level on primary is less than logical.
> During slot creation, it's restart_lsn is set to the last replayed
> LSN. Effectively, a logical slot creation on standby waits for an
> xl_running_xact record to arrive from primary.

Hmm, not sure if it really applies here, but this sounds similar to
issues with track_commit_timestamps: namely, if the primary has it
enabled and you start a standby with it enabled, that's fine; but if the
primary is later shut down (but the standby isn't) and then the primary
restarted with a lesser value, then the standby would misbehave without
any obvious errors.  If that is a real problem, then perhaps you can
solve it by copying some of the logic from track_commit_timestamps,
which took a large number of iterations to get right.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)




Re: Minimal logical decoding on standbys

2023-04-02 Thread Jeff Davis
On Sun, 2023-04-02 at 15:29 -0700, Andres Freund wrote:
> I agree that the *wait* has to go through condition_variable.c, but
> it doesn't
> seem right that creation of the WES needs to go through
> condition_variable.c.

The kind of WES required by a CV is an implementation detail, so I was
concerned about making too many assumptions across different APIs.

But what I ended up with is arguably not better, so perhaps I should do
it your way and then just have some comments about what assumptions are
being made?

> The only thing that ConditionVariableEventSleep() seems to require is
> that the
> WES is waiting for MyLatch. You don't even need a separate WES for
> that, the
> already existing WES should suffice.

By "already existing" WES, I assume you mean FeBeWaitSet? Yes, that
mostly matches, but it uses WL_POSTMASTER_DEATH instead of
WL_EXIT_ON_PM_DEATH, so I'd need to handle PM death in
condition_variable.c. That's trivial to do, though.

Regards,
Jeff Davis





Re: Minimal logical decoding on standbys

2023-04-02 Thread Jeff Davis
On Fri, 2023-03-31 at 02:50 -0700, Jeff Davis wrote:
> But if the ConditionVariableEventSleep() API is added, then I think
> we
> should change the non-recovery case to use a CV as well for
> consistency, and it would avoid the need for WalSndWakeup().

It seems like what we ultimately want is for WalSndWakeup() to
selectively wake up physical and/or logical walsenders depending on the
caller. For instance:

   WalSndWakeup(bool physical, bool logical)

The callers:

  * On promotion, StartupXLog would call:
- WalSndWakeup(true, true)
  * XLogFlush/XLogBackgroundFlush/XLogWalRcvFlush would call:
- WalSndWakeup(true, !RecoveryInProgress())
  * ApplyWalRecord would call:
- WalSndWakeup(switchedTLI, switchedTLI || RecoveryInProgress())

There seem to be two approaches to making that work:

1. Use two ConditionVariables, and WalSndWakeup would broadcast to one
or both depending on its arguments.

2. Have a "replicaiton_kind" variable in WalSnd (either set based on
MyDatabaseId==InvalidOid, or set at START_REPLICATION time) to indicate
whether it's a physical or logical walsender. WalSndWakeup would wake
up the right walsenders based on its arguments.

#2 seems simpler at least for now. Would that work?

Regards,
Jeff Davis





Re: Minimal logical decoding on standbys

2023-04-02 Thread Andres Freund
Hi,

On 2023-04-02 15:15:44 -0700, Jeff Davis wrote:
> On Sun, 2023-04-02 at 14:35 -0700, Andres Freund wrote:
> > Why not offer a function to add a CV to a WES? It seems somehow odd
> > to require
> > going through condition_variable.c to create a WES.
> 
> I agree that it's a bit odd, but remember that after waiting on a CV's
> latch, it needs to re-insert itself into the CV's wait list.
> 
> A WaitEventSetWait() can't do that, unless we move the details of re-
> adding to the wait list into latch.c. I considered that, but latch.c
> already implements the APIs for WaitEventSet and Latch, so it felt
> complex to also make it responsible for ConditionVariable.

I agree that the *wait* has to go through condition_variable.c, but it doesn't
seem right that creation of the WES needs to go through condition_variable.c.

The only thing that ConditionVariableEventSleep() seems to require is that the
WES is waiting for MyLatch. You don't even need a separate WES for that, the
already existing WES should suffice.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-02 Thread Jeff Davis
On Sun, 2023-04-02 at 14:35 -0700, Andres Freund wrote:
> Why not offer a function to add a CV to a WES? It seems somehow odd
> to require
> going through condition_variable.c to create a WES.

I agree that it's a bit odd, but remember that after waiting on a CV's
latch, it needs to re-insert itself into the CV's wait list.

A WaitEventSetWait() can't do that, unless we move the details of re-
adding to the wait list into latch.c. I considered that, but latch.c
already implements the APIs for WaitEventSet and Latch, so it felt
complex to also make it responsible for ConditionVariable.

I'm open to suggestion.

Regards,
Jeff Davis





Re: Minimal logical decoding on standbys

2023-04-02 Thread Andres Freund
Hi,

On 2023-03-31 02:44:33 -0700, Jeff Davis wrote:
> From 2f05cab9012950ae9290fccbb6366d50fc01553e Mon Sep 17 00:00:00 2001
> From: Jeff Davis 
> Date: Wed, 1 Mar 2023 20:02:42 -0800
> Subject: [PATCH v2] Introduce ConditionVariableEventSleep().
> 
> The new API takes a WaitEventSet which can include socket events. The
> WaitEventSet must have been created by
> ConditionVariableWaitSetCreate(), another new function, so that it
> includes the wait events necessary for a condition variable.

Why not offer a function to add a CV to a WES? It seems somehow odd to require
going through condition_variable.c to create a WES.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-02 Thread Jeff Davis
On Fri, 2023-03-31 at 02:44 -0700, Jeff Davis wrote:
> Thank you, done. I think the nearby line was also wrong, returning
> true
> when there was no timeout. I combined the lines and got rid of the
> early return so it can check the list and timeout condition like
> normal. Attached.

On second (third?) thought, I think I was right the first time. It
passes the flag WL_EXIT_ON_PM_DEATH (included in the
ConditionVariableWaitSet), so a WL_POSTMASTER_DEATH event should not be
returned.

Also, I think the early return is correct. The current code in
ConditionVariableTimedSleep() still checks the wait list even if
WaitLatch() returns WL_TIMEOUT (it ignores the return), but I don't see
why it can't early return true. For a socket event in
ConditionVariableEventSleep() I think it should early return false.

Regards,
Jeff Davis





Re: Minimal logical decoding on standbys

2023-04-02 Thread Andres Freund
Hi,

Btw, most of the patches have some things that pgindent will change (and some
that my editor will highlight). It wouldn't hurt to run pgindent for the later
patches...

Pushed the WAL format change.


On 2023-04-02 10:27:45 +0200, Drouvot, Bertrand wrote:
> During WAL replay on standby, when slot conflict is identified,
> invalidate such slots. Also do the same thing if wal_level on the primary 
> server
> is reduced to below logical and there are existing logical slots
> on standby. Introduce a new ProcSignalReason value for slot
> conflict recovery. Arrange for a new pg_stat_database_conflicts field:
> confl_active_logicalslot.
> 
> Add a new field "conflicting" in pg_replication_slots.
> 
> Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
> Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
> Mello,
> Bharath Rupireddy
> ---
>  doc/src/sgml/monitoring.sgml  |  11 +
>  doc/src/sgml/system-views.sgml|  10 +
>  src/backend/access/gist/gistxlog.c|   2 +
>  src/backend/access/hash/hash_xlog.c   |   1 +
>  src/backend/access/heap/heapam.c  |   3 +
>  src/backend/access/nbtree/nbtxlog.c   |   2 +
>  src/backend/access/spgist/spgxlog.c   |   1 +
>  src/backend/access/transam/xlog.c |  20 +-
>  src/backend/catalog/system_views.sql  |   6 +-
>  .../replication/logical/logicalfuncs.c|  13 +-
>  src/backend/replication/slot.c| 189 ++
>  src/backend/replication/slotfuncs.c   |  16 +-
>  src/backend/replication/walsender.c   |   7 +
>  src/backend/storage/ipc/procsignal.c  |   3 +
>  src/backend/storage/ipc/standby.c |  13 +-
>  src/backend/tcop/postgres.c   |  28 +++
>  src/backend/utils/activity/pgstat_database.c  |   4 +
>  src/backend/utils/adt/pgstatfuncs.c   |   3 +
>  src/include/catalog/pg_proc.dat   |  11 +-
>  src/include/pgstat.h  |   1 +
>  src/include/replication/slot.h|  14 +-
>  src/include/storage/procsignal.h  |   1 +
>  src/include/storage/standby.h |   2 +
>  src/test/regress/expected/rules.out   |   8 +-
>  24 files changed, 308 insertions(+), 61 deletions(-)
>5.3% doc/src/sgml/
>6.2% src/backend/access/transam/
>4.6% src/backend/replication/logical/
>   55.6% src/backend/replication/
>4.4% src/backend/storage/ipc/
>6.9% src/backend/tcop/
>5.3% src/backend/
>3.8% src/include/catalog/
>5.3% src/include/replication/

I think it might be worth trying to split this up a bit.


>   restart_lsn = s->data.restart_lsn;
> -
> - /*
> -  * If the slot is already invalid or is fresh enough, we don't 
> need to
> -  * do anything.
> -  */
> - if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= 
> oldestLSN)
> + slot_xmin = s->data.xmin;
> + slot_catalog_xmin = s->data.catalog_xmin;
> +
> + /* the slot has been invalidated (logical decoding conflict 
> case) */
> + if ((xid && ((LogicalReplicationSlotIsInvalid(s)) ||
> + /* or the xid is valid and this is a non conflicting slot */
> +  (TransactionIdIsValid(*xid) && 
> !(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, *xid) 
> ||
> + /* or the slot has been invalidated (obsolete LSN case) */
> + (!xid && (XLogRecPtrIsInvalid(restart_lsn) || 
> restart_lsn >= oldestLSN)))
>   {

This still looks nearly unreadable. I suggest moving comments outside of the
if (), remove redundant parentheses, use a function to detect if the slot has
been invalidated.


> @@ -1329,16 +1345,45 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, 
> XLogRecPtr oldestLSN,
>*/
>   if (last_signaled_pid != active_pid)
>   {
> + boolsend_signal = false;
> +
> + initStringInfo(_msg);
> + initStringInfo(_detail);
> +
> + appendStringInfo(_msg, "terminating process 
> %d to release replication slot \"%s\"",
> +  active_pid,
> +  
> NameStr(slotname));

For this to be translatable you need to use _("message").


> + if (xid)
> + {
> + appendStringInfo(_msg, " because it 
> conflicts with recovery");
> + send_signal = true;
> +
> + if (TransactionIdIsValid(*xid))
> + 

Re: Minimal logical decoding on standbys

2023-04-02 Thread Jeff Davis
On Sun, 2023-04-02 at 10:11 +0200, Drouvot, Bertrand wrote:
> I was thinking that, if a new LogicalWalSndWakeup() replaces
> "ConditionVariableBroadcast(>replayedCV);"
> in ApplyWalRecord() then, it could be possible that some walsender(s)
> are requested to wake up while they are actually doing decoding (but
> I might be wrong).

I don't think that's a problem, right?

We are concerned about wakeups when they happen repeatedly when there's
no work to do, or when the wakeup doesn't happen when it should (and we
need to wait for a timeout).

> > 
> > Currently, we wake up walsenders only after writing some WAL
> > records
> > at the time of flush, so won't it be better to wake up only after
> > applying some WAL records rather than after applying each record?
> 
> Yeah that would be better.

Why? If the walsender is asleep, and there's work to be done, why not
wake it up?

If it's already doing work, and the latch gets repeatedly set, that
doesn't look like a problem either. The comment on SetLatch() says:

  /*
   * Sets a latch and wakes up anyone waiting on it.
   *
   * This is cheap if the latch is already set, otherwise not so much.

Regards,
Jeff Davis









Re: Minimal logical decoding on standbys

2023-04-02 Thread Drouvot, Bertrand

Hi,

On 4/1/23 6:50 AM, Amit Kapila wrote:

On Fri, Mar 31, 2023 at 7:14 PM Drouvot, Bertrand
 wrote:


That sounds like a good idea. We could imagine creating a LogicalWalSndWakeup()
doing the Walsender(s) triage based on a new variable (as you suggest).

But, it looks to me that we:

- would need to go through the list of all the walsenders to do the triage
- could wake up some logical walsender(s) unnecessary



Why it could wake up unnecessarily?


I was thinking that, if a new LogicalWalSndWakeup() replaces 
"ConditionVariableBroadcast(>replayedCV);"
in ApplyWalRecord() then, it could be possible that some walsender(s)
are requested to wake up while they are actually doing decoding (but I might be 
wrong).




This extra work would occur during each replay.

while with the CV, only the ones in the CV wait queue would be waked up.



Currently, we wake up walsenders only after writing some WAL records
at the time of flush, so won't it be better to wake up only after
applying some WAL records rather than after applying each record?


Yeah that would be better.

Do you have any idea about how (and where) we could define the "some WAL records 
replayed"?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-01 Thread Andres Freund
Hi,

On 2023-03-31 12:45:51 +0200, Drouvot, Bertrand wrote:
> On 3/31/23 6:33 AM, Andres Freund wrote:
> > Hi,
> > 
> > On 2023-03-30 18:23:41 +0200, Drouvot, Bertrand wrote:
> > > On 3/30/23 9:04 AM, Andres Freund wrote:
> > > > I think this commit is ready to go. Unless somebody thinks differently, 
> > > > I
> > > > think I might push it tomorrow.
> > > 
> > > Great! Once done, I'll submit a new patch so that GlobalVisTestFor() can 
> > > make
> > > use of the heap relation in vacuumRedirectAndPlaceholder() (which will be 
> > > possible
> > > once 0001 is committed).
> > 
> > Unfortunately I did find an issue doing a pre-commit review of the patch.
> > 
> > The patch adds VISIBILITYMAP_IS_CATALOG_REL to xl_heap_visible.flags - but 
> > it
> > does not remove the bit before calling visibilitymap_set().
> > 
> > This ends up corrupting the visibilitymap, because the we'll set a bit for
> > another page.
> > 
> 
> Oh I see, I did not think about that (not enough experience in the VM area).
> Nice catch and thanks for pointing out!

I pushed a commit just adding an assertion that only valid bits are passed in.


> > I'm also thinking of splitting the patch into two. One patch to pass down 
> > the
> > heap relation into the new places, and another for the rest.
> 
> I think that makes sense. I don't know how far you've work on the split but 
> please
> find attached V54 doing such a split + implementing your 
> VISIBILITYMAP_XLOG_VALID_BITS
> suggestion.

I pushed the pass-the-relation part.  I removed an include of catalog.h that
was in the patch - I suspect it might have slipped in there from a later patch
in the series...

I was a bit bothered by using 'heap' instead of 'table' in so many places
(eventually we imo should standardize on the latter), but looking around the
changed places, heap was used for things like buffers etc. So I left it at
heap.

Glad we split 0001 - the rest is a lot easier to review.

Greetings,

Andres Freund




  1   2   3   4   >