RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-17 Thread Hayato Kuroda (Fujitsu)
Dear Thomas, Alexander,

> Thanks for reporting. Yes, it has been already reported by me [1], and the 
> server
> log was provided by Andrew [2]. The issue was that a file creation was failed
> because the same one was unlink()'d just before but it was in
> STATUS_DELETE_PENDING
> status. Kindly Alexander proposed a fix [3] and it looks good to me, but
> confirmations by senior and windows-friendly developers are needed to move
> forward.
> (at first we thought the issue was solved by updating, but it was not correct)
> 
> I know that you have developed there region, so I'm very happy if you check 
> the
> forked thread.

I forgot to say an important point. The issue was not introduced by the feature.
It just actualized a possible failure, only for Windows environment.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-17 Thread Hayato Kuroda (Fujitsu)
Dear Thomas, Alexander,

> 17.12.2023 07:02, Thomas Munro wrote:
> > FYI fairywren failed in this test:
> >
> >
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2023-1
> 2-16%2022%3A03%3A06
> >
> > ===8<===
> > Restoring database schemas in the new cluster
> > *failure*
> >
> > Consult the last few lines of
> >
> "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgr
> ade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgra
> de_output.d/20231216T221418.035/log/pg_upgrade_dump_1.log"
> > for
> > the probable cause of the failure.
> > Failure, exiting
> > [22:14:34.598](22.801s) not ok 10 - run of pg_upgrade of old cluster
> > [22:14:34.600](0.001s) #   Failed test 'run of pg_upgrade of old cluster'
> > #   at
> C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/
> 003_logical_slots.pl
> > line 177.
> > ===8<===
> >
> > Without that log it might be hard to figure out what went wrong though :-/
> >
> 
> Yes, but most probably it's the same failure as
>  

Thanks for reporting. Yes, it has been already reported by me [1], and the 
server
log was provided by Andrew [2]. The issue was that a file creation was failed
because the same one was unlink()'d just before but it was in 
STATUS_DELETE_PENDING
status. Kindly Alexander proposed a fix [3] and it looks good to me, but
confirmations by senior and windows-friendly developers are needed to move 
forward.
(at first we thought the issue was solved by updating, but it was not correct)

I know that you have developed there region, so I'm very happy if you check the
forked thread.

[1]: 
https://www.postgresql.org/message-id/flat/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: 
https://www.postgresql.org/message-id/TYAPR01MB5866A4E7342088E91362BEF0F5BBA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: 
https://www.postgresql.org/message-id/976479cf-dd66-ca19-f40c-5640e30700cb%40gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-16 Thread Alexander Lakhin

17.12.2023 07:02, Thomas Munro wrote:

FYI fairywren failed in this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2023-12-16%2022%3A03%3A06

===8<===
Restoring database schemas in the new cluster
*failure*

Consult the last few lines of
"C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231216T221418.035/log/pg_upgrade_dump_1.log"
for
the probable cause of the failure.
Failure, exiting
[22:14:34.598](22.801s) not ok 10 - run of pg_upgrade of old cluster
[22:14:34.600](0.001s) #   Failed test 'run of pg_upgrade of old cluster'
#   at 
C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl
line 177.
===8<===

Without that log it might be hard to figure out what went wrong though :-/



Yes, but most probably it's the same failure as
https://www.postgresql.org/message-id/flat/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best regards,
Alexander




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-16 Thread Thomas Munro
FYI fairywren failed in this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2023-12-16%2022%3A03%3A06

===8<===
Restoring database schemas in the new cluster
*failure*

Consult the last few lines of
"C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231216T221418.035/log/pg_upgrade_dump_1.log"
for
the probable cause of the failure.
Failure, exiting
[22:14:34.598](22.801s) not ok 10 - run of pg_upgrade of old cluster
[22:14:34.600](0.001s) #   Failed test 'run of pg_upgrade of old cluster'
#   at 
C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl
line 177.
===8<===

Without that log it might be hard to figure out what went wrong though :-/




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-06 Thread Amit Kapila
On Wed, Dec 6, 2023 at 10:02 AM Amit Kapila  wrote:
>
> On Wed, Dec 6, 2023 at 9:40 AM vignesh C  wrote:
> >
> > On Tue, 5 Dec 2023 at 11:11, Hayato Kuroda (Fujitsu)
> >  wrote:
> > >
> > > Dear Sawada-san, hackers,
> > >
> > > Based on comments I made a fix. PSA the patch.
> > >
> >
> > Thanks for the patch, the changes look good to me.
> >
>
> Thanks, I have added a comment and updated the commit message. I'll
> push this tomorrow unless there are more comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-05 Thread Amit Kapila
On Wed, Dec 6, 2023 at 9:40 AM vignesh C  wrote:
>
> On Tue, 5 Dec 2023 at 11:11, Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Dear Sawada-san, hackers,
> >
> > Based on comments I made a fix. PSA the patch.
> >
>
> Thanks for the patch, the changes look good to me.
>

Thanks, I have added a comment and updated the commit message. I'll
push this tomorrow unless there are more comments.

-- 
With Regards,
Amit Kapila.


v2-0001-Fix-issues-in-binary_upgrade_logical_slot_has_cau.patch
Description: Binary data


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-05 Thread vignesh C
On Tue, 5 Dec 2023 at 11:11, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Sawada-san, hackers,
>
> Based on comments I made a fix. PSA the patch.
>

Thanks for the patch, the changes look good to me.

Regards,
Vignesh




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-04 Thread Hayato Kuroda (Fujitsu)
Dear Sawada-san, hackers,

Based on comments I made a fix. PSA the patch.

> 
> Is there any reason why this function can be executed only in binary
> upgrade mode? It seems to me that other functions in
> pg_upgrade_support.c must be called only in binary upgrade mode
> because it does some hacky changes internally. On the other hand,
> binary_upgrade_logical_slot_has_caught_up() just calls
> LogicalReplicationSlotHasPendingWal(), which doesn't change anything
> internally. If we make this function usable in normal mode, the user
> would be able to  check each slot's upgradability without pg_upgrade
> --check command (or without stopping the server if the user can ensure
> no more meaningful WAL records are generated).

I kept the function to be upgrade only because subsequent operations might 
generate
WALs. See [1].

> Also, the function checks if the user has the REPLICATION privilege
> but I think that only superuser can connect to the server in binary
> upgrade mode in the first place.

CheckSlotPermissions() was replaced to Assert().

> The following error message doesn't match the function name:
> 
> /* We must check before dereferencing the argument */
> if (PG_ARGISNULL(0))
> elog(ERROR, "null argument to
> binary_upgrade_validate_wal_records is not allowed");

Per below comment, this elog(ERROR) was not needed anymore. Removed.

> { oid => '8046', descr => 'for use by pg_upgrade',
>   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
>   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
>   proargtypes => 'name',
>   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
> 
> The function is not a strict function but we check in the function if
> the passed argument is not null. I think it would be clearer to make
> it a strict function.

Per conclusion [2], I changed the function to the strict one. As shown in below,
binary_upgrade_logical_slot_has_caught_up() returned NULL when the input was 
NULL.

```
postgres=# SELECT * FROM pg_create_logical_replication_slot('slot', 
'test_decoding');
 slot_name |lsn
---+---
 slot  | 0/152E7E0
(1 row)

postgres=# SELECT * FROM binary_upgrade_logical_slot_has_caught_up(NULL);
 binary_upgrade_logical_slot_has_caught_up 
---
 
(1 row)
```

> LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
> guess it's more suitable to be in slotfunc.s where similar functions
> such as pg_logical_replication_slot_advance() is also defined.

Committers had different opinions about it, so I kept current style [3].

[1]: 
https://www.postgresql.org/message-id/CALj2ACW7H-kAHia%3DvCbmdWDueGA_3pQfyzARfAQX0aGzHY57Zw%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/CAA4eK1LzK0NvMkWAY6RJ6yN%2BYYUgMg1f%3DmNOGV8CPXLT43FHMw%40mail.gmail.com
[3]: 
https://www.postgresql.org/message-id/CAD21AoDkyyC%3Dwa2%3D1Ruo_L8g16xf_W5Xyhp-%3D3j9urT916b9gA%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



followup_for_upgrade.patch
Description: followup_for_upgrade.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-04 Thread Amit Kapila
On Mon, Dec 4, 2023 at 11:59 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
>
> I found another failure related with the commit [1]. This is caused by missing
> wait on the test code. Amit helped me for this analysis and fix.
>

Pushed!

-- 
With Regards,
Amit Kapila.




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-03 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

I found another failure related with the commit [1]. This is caused by missing
wait on the test code. Amit helped me for this analysis and fix.

# Analysis of the failure

The failure is that restored slot is two_phase = false, whereas the slot is
created as two_phase = true. This is because pg_upgrade was executed before all
tables are in ready state.

# How to fix

I think the test is not good. According to other subscription tests related with
2PC, they additionally wait until subtwophasestate becomes 'e'. It should be
added as well. PSA the patch.

[1]: 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual=2023-12-01%2016%3A59%3A30

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



add_wait.patch
Description: add_wait.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-12-01 Thread Masahiko Sawada
On Thu, Nov 30, 2023 at 6:49 PM Amit Kapila  wrote:
>
> On Wed, Nov 29, 2023 at 7:33 AM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > > > Actually, we do not expect that it won't input NULL. IIUC all of slots 
> > > > have
> > > > slot_name, and subquery uses its name. But will it be kept forever? I 
> > > > think we
> > > > can avoid any risk.
> > > >
> > > > > I've not tested it yet but even if it returns NULL, perhaps
> > > > > get_old_cluster_logical_slot_infos() would still set curr->caught_up
> > > > > to false, no?
> > > >
> > > > Hmm. I checked the C99 specification [1] of strcmp, but it does not 
> > > > define the
> > > > case when the NULL is input. So it depends implementation.
> > >
> > > I think PQgetvalue() returns an empty string if the result value is null.
> > >
> >
> > Oh, you are right... I found below paragraph from [1].
> >
> > > An empty string is returned if the field value is null. See PQgetisnull 
> > > to distinguish
> > > null values from empty-string values.
> >
> > So I agree what you said - current code can accept NULL.
> > But still not sure the error message is really good or not.
> > If we regard an empty string as false, the slot which has empty name will 
> > be reported like:
> > "The slot \"\" has not consumed the WAL yet" in 
> > check_old_cluster_for_valid_slots().
> > Isn't it inappropriate?
> >
>
> I see your point that giving a better message (which would tell the
> actual problem) to the user in this case also has a value. OTOH, as
> you said, this case won't happen in practical scenarios, so I am fine
> either way with a slight tilt toward retaining a better error message
> (aka the current way). Sawada-San/Bharath, do you have any suggestions
> on this?

TBH I'm not sure the error message is much helpful for users more than
the message "The slot \"\" has not consumed the WAL yet" in practice.
In either case, the messages just tell the user the slot name passed
to the function was not appropriate. Rather, I'm a bit concerned that
we create a precedent that we make a function non-strict to produce an
error message only for unrealistic cases. Please point out if we
already have such precedents. Other functions in pg_upgrade_support.c
such as binary_upgrade_set_next_pg_tablespace_oid() are not called if
the argument is NULL since it's a strict function. But if null was
passed in (where should not happen in practice), pg_upgrade would fail
with an error message or would finish while leaving the cluster in an
inconsistent state, I've not tested. Why do we want to care about the
argument being NULL only in
binary_upgrade_logical_slot_has_caught_up()?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-30 Thread Amit Kapila
On Wed, Nov 29, 2023 at 7:33 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> > > Actually, we do not expect that it won't input NULL. IIUC all of slots 
> > > have
> > > slot_name, and subquery uses its name. But will it be kept forever? I 
> > > think we
> > > can avoid any risk.
> > >
> > > > I've not tested it yet but even if it returns NULL, perhaps
> > > > get_old_cluster_logical_slot_infos() would still set curr->caught_up
> > > > to false, no?
> > >
> > > Hmm. I checked the C99 specification [1] of strcmp, but it does not 
> > > define the
> > > case when the NULL is input. So it depends implementation.
> >
> > I think PQgetvalue() returns an empty string if the result value is null.
> >
>
> Oh, you are right... I found below paragraph from [1].
>
> > An empty string is returned if the field value is null. See PQgetisnull to 
> > distinguish
> > null values from empty-string values.
>
> So I agree what you said - current code can accept NULL.
> But still not sure the error message is really good or not.
> If we regard an empty string as false, the slot which has empty name will be 
> reported like:
> "The slot \"\" has not consumed the WAL yet" in 
> check_old_cluster_for_valid_slots().
> Isn't it inappropriate?
>

I see your point that giving a better message (which would tell the
actual problem) to the user in this case also has a value. OTOH, as
you said, this case won't happen in practical scenarios, so I am fine
either way with a slight tilt toward retaining a better error message
(aka the current way). Sawada-San/Bharath, do you have any suggestions
on this?

-- 
With Regards,
Amit Kapila.




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-29 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> Sorry, my analysis was not complete. On looking closely, I think the
> reason is that we are allowed to upgrade the slot iff there is no
> pending WAL to be processed. 

Yes, the guard will strongly protect from data loss, but I do not take care in 
the test.

> The test first disables the subscription
> to avoid unnecessary LOGs on the subscriber and then stops the
> publisher node.

Right. Unnecessary ERROR would be appeared if we do not disable.

> It is quite possible that just before the shutdown of
> the server, autovacuum generates some WAL record that needs to be
> processed, 

Yeah, pg_upgrade does not ensure that autovacuum is not running *before* the
upgrade.

> so you propose just disabling the autovacuum for this test.

Absolutely correct.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-29 Thread Amit Kapila
On Thu, Nov 30, 2023 at 8:40 AM Amit Kapila  wrote:
>
> On Wed, Nov 29, 2023 at 2:56 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > > > >
> > > > > Pushed!
> > > >
> > > > Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> > > > the last patch committed. Is there further work that needs to be
> > > > re-attached and/or rebased?
> > > >
> > >
> > > No. I have marked it as committed.
> > >
> >
> > I found another failure related with the commit [1]. I think it is caused 
> > by the
> > autovacuum. I want to propose a patch which disables the feature for old 
> > publisher.
> >
> > More detail, please see below.
> >
> > # Analysis of the failure
> >
> > Summary: this failure occurs when the autovacuum starts after the 
> > subscription
> > is disabled but before doing pg_upgrade.
> >
> > According to the regress file, it unexpectedly failed the pg_upgrade [2]. 
> > There are
> > no possibilities for slots are invalidated, so some WALs seemed to be 
> > generated
> > after disabling the subscriber.
> >
> > Also, server log caused by oldpub said that autovacuum worker was 
> > terminated when
> > it stopped. This was occurred after walsender released the logical slots. 
> > WAL records
> > caused by autovacuum workers could not be consumed by the slots, so that 
> > upgrading
> > function returned false.
> >
> > # How to reproduce
> >
> > I made a small file for reproducing the failure. Please see reproduce.txt. 
> > This contains
> > changes for launching autovacuum worker very often and for ensuring actual 
> > works are
> > done. After applying it, I could reproduce the same failure every time.
> >
> > # How to fix
> >
> > I think it is sufficient to fix only the test code.
> > The easiest way is to disable the autovacuum on old publisher. PSA the 
> > patch file.
> >
>
> Agreed, for now, we should change the test as you proposed. I'll take
> care of that. However, I wonder, if we should also ensure that
> autovacuum or any other worker is shut down before walsender processes
> the last set of WAL before shutdown. We can analyze more on this and
> probably start a separate thread to discuss this point.
>

Sorry, my analysis was not complete. On looking closely, I think the
reason is that we are allowed to upgrade the slot iff there is no
pending WAL to be processed. The test first disables the subscription
to avoid unnecessary LOGs on the subscriber and then stops the
publisher node. It is quite possible that just before the shutdown of
the server, autovacuum generates some WAL record that needs to be
processed, so you propose just disabling the autovacuum for this test.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-29 Thread Amit Kapila
On Wed, Nov 29, 2023 at 2:56 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > > >
> > > > Pushed!
> > >
> > > Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> > > the last patch committed. Is there further work that needs to be
> > > re-attached and/or rebased?
> > >
> >
> > No. I have marked it as committed.
> >
>
> I found another failure related with the commit [1]. I think it is caused by 
> the
> autovacuum. I want to propose a patch which disables the feature for old 
> publisher.
>
> More detail, please see below.
>
> # Analysis of the failure
>
> Summary: this failure occurs when the autovacuum starts after the subscription
> is disabled but before doing pg_upgrade.
>
> According to the regress file, it unexpectedly failed the pg_upgrade [2]. 
> There are
> no possibilities for slots are invalidated, so some WALs seemed to be 
> generated
> after disabling the subscriber.
>
> Also, server log caused by oldpub said that autovacuum worker was terminated 
> when
> it stopped. This was occurred after walsender released the logical slots. WAL 
> records
> caused by autovacuum workers could not be consumed by the slots, so that 
> upgrading
> function returned false.
>
> # How to reproduce
>
> I made a small file for reproducing the failure. Please see reproduce.txt. 
> This contains
> changes for launching autovacuum worker very often and for ensuring actual 
> works are
> done. After applying it, I could reproduce the same failure every time.
>
> # How to fix
>
> I think it is sufficient to fix only the test code.
> The easiest way is to disable the autovacuum on old publisher. PSA the patch 
> file.
>

Agreed, for now, we should change the test as you proposed. I'll take
care of that. However, I wonder, if we should also ensure that
autovacuum or any other worker is shut down before walsender processes
the last set of WAL before shutdown. We can analyze more on this and
probably start a separate thread to discuss this point.


-- 
With Regards,
Amit Kapila.




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-29 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

> > >
> > > Pushed!
> >
> > Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> > the last patch committed. Is there further work that needs to be
> > re-attached and/or rebased?
> >
> 
> No. I have marked it as committed.
>

I found another failure related with the commit [1]. I think it is caused by the
autovacuum. I want to propose a patch which disables the feature for old 
publisher.

More detail, please see below.

# Analysis of the failure

Summary: this failure occurs when the autovacuum starts after the subscription
is disabled but before doing pg_upgrade.

According to the regress file, it unexpectedly failed the pg_upgrade [2]. There 
are
no possibilities for slots are invalidated, so some WALs seemed to be generated
after disabling the subscriber.

Also, server log caused by oldpub said that autovacuum worker was terminated 
when
it stopped. This was occurred after walsender released the logical slots. WAL 
records
caused by autovacuum workers could not be consumed by the slots, so that 
upgrading
function returned false.

# How to reproduce

I made a small file for reproducing the failure. Please see reproduce.txt. This 
contains
changes for launching autovacuum worker very often and for ensuring actual 
works are
done. After applying it, I could reproduce the same failure every time.

# How to fix

I think it is sufficient to fix only the test code.
The easiest way is to disable the autovacuum on old publisher. PSA the patch 
file.

How do you think?


[1]: 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2023-11-27%2020%3A52%3A10
[2]:
```
...
Checking for contrib/isn with bigint-passing mismatch ok
Checking for valid logical replication slots  fatal

Your installation contains logical replication slots that can't be upgraded.
You can remove invalid slots and/or consume the pending WAL for other slots,
and then restart the upgrade.
A list of the problematic slots is in the file:

/home/bf/bf-build/skink-master/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231127T220024.480/invalid_logical_slots.txt
Failure, exiting
[22:01:20.362](86.645s) not ok 10 - run of pg_upgrade of old cluster
...
```
[3]:
```
...
2023-11-27 22:00:23.546 UTC [3567962][walsender][4/0:0] LOG:  released logical 
replication slot "regress_sub"
2023-11-27 22:00:23.549 UTC [3559042][postmaster][:0] LOG:  received fast 
shutdown request
2023-11-27 22:00:23.552 UTC [3559042][postmaster][:0] LOG:  aborting any active 
transactions
*2023-11-27 22:00:23.663 UTC [3568793][autovacuum worker][5/3:738] FATAL:  
terminating autovacuum process due to administrator command*
2023-11-27 22:00:23.775 UTC [3559042][postmaster][:0] LOG:  background worker 
"logical replication launcher" (PID 3560674) exited with exit code 1
...
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



disable_autovacuum.patch
Description: disable_autovacuum.patch
diff --git a/src/backend/postmaster/autovacuum.c 
b/src/backend/postmaster/autovacuum.c
index 86a3b3d8be..406c588a1d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -662,7 +662,7 @@ AutoVacLauncherMain(int argc, char *argv[])
 */
(void) WaitLatch(MyLatch,
 WL_LATCH_SET | WL_TIMEOUT | 
WL_EXIT_ON_PM_DEATH,
-(nap.tv_sec * 1000L) + 
(nap.tv_usec / 1000L),
+100L,
 WAIT_EVENT_AUTOVACUUM_MAIN);
 
ResetLatch(MyLatch);
@@ -769,6 +769,9 @@ AutoVacLauncherMain(int argc, char *argv[])
}
LWLockRelease(AutovacuumLock);  /* either shared or exclusive */
 
+   /* force launch */
+   can_launch = true;
+
/* if we can't do anything, just go back to sleep */
if (!can_launch)
continue;
@@ -1267,38 +1270,6 @@ do_start_worker(void)
if (!tmp->adw_entry)
continue;
 
-   /*
-* Also, skip a database that appears on the database list as 
having
-* been processed recently (less than autovacuum_naptime 
seconds ago).
-* We do this so that we don't select a database which we just
-* selected, but that pgstat hasn't gotten around to updating 
the last
-* autovacuum time yet.
-*/
-   skipit = false;
-
-   dlist_reverse_foreach(iter, )
-   {
-   avl_dbase  *dbp = dlist_container(avl_dbase, adl_node, 
iter.cur);
-
-   if (dbp->adl_datid == tmp->adw_datid)
-   {
-   /*
-* Skip this database if its next_worker 

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-28 Thread Hayato Kuroda (Fujitsu)
Dear Sawada-san,

> > Actually, we do not expect that it won't input NULL. IIUC all of slots have
> > slot_name, and subquery uses its name. But will it be kept forever? I think 
> > we
> > can avoid any risk.
> >
> > > I've not tested it yet but even if it returns NULL, perhaps
> > > get_old_cluster_logical_slot_infos() would still set curr->caught_up
> > > to false, no?
> >
> > Hmm. I checked the C99 specification [1] of strcmp, but it does not define 
> > the
> > case when the NULL is input. So it depends implementation.
> 
> I think PQgetvalue() returns an empty string if the result value is null.
>

Oh, you are right... I found below paragraph from [1].

> An empty string is returned if the field value is null. See PQgetisnull to 
> distinguish
> null values from empty-string values.

So I agree what you said - current code can accept NULL.
But still not sure the error message is really good or not.
If we regard an empty string as false, the slot which has empty name will be 
reported like:
"The slot \"\" has not consumed the WAL yet" in 
check_old_cluster_for_valid_slots().
Isn't it inappropriate?

(Note again - currently we do not find such a case, so it may be overkill)

[1]: https://www.postgresql.org/docs/devel/libpq-exec.html#LIBPQ-PQGETVALUE

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-28 Thread Masahiko Sawada
On Tue, Nov 28, 2023 at 10:58 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Sawada-san,
>
> > On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
> >  wrote:
> > >
> > >
> > > Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
> > > un-strict to keep a caller function simpler.
> > >
> > > Currently get_old_cluster_logical_slot_infos() executes a query and it 
> > > contains
> > > binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we
> > assumed
> > > either true or false is returned.
> > >
> > > But if proisstrict is changed true, we must handle the case when NULL is
> > returned.
> > > It is small but backseat operation.
> >
> > Which cases are you concerned pg_upgrade could pass NULL to
> > binary_upgrade_logical_slot_has_caught_up()?
>
> Actually, we do not expect that it won't input NULL. IIUC all of slots have
> slot_name, and subquery uses its name. But will it be kept forever? I think we
> can avoid any risk.
>
> > I've not tested it yet but even if it returns NULL, perhaps
> > get_old_cluster_logical_slot_infos() would still set curr->caught_up
> > to false, no?
>
> Hmm. I checked the C99 specification [1] of strcmp, but it does not define the
> case when the NULL is input. So it depends implementation.

I think PQgetvalue() returns an empty string if the result value is null.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-28 Thread Hayato Kuroda (Fujitsu)
Dear Sawada-san,

> On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> >
> > Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
> > un-strict to keep a caller function simpler.
> >
> > Currently get_old_cluster_logical_slot_infos() executes a query and it 
> > contains
> > binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we
> assumed
> > either true or false is returned.
> >
> > But if proisstrict is changed true, we must handle the case when NULL is
> returned.
> > It is small but backseat operation.
> 
> Which cases are you concerned pg_upgrade could pass NULL to
> binary_upgrade_logical_slot_has_caught_up()?

Actually, we do not expect that it won't input NULL. IIUC all of slots have
slot_name, and subquery uses its name. But will it be kept forever? I think we
can avoid any risk.

> I've not tested it yet but even if it returns NULL, perhaps
> get_old_cluster_logical_slot_infos() would still set curr->caught_up
> to false, no?

Hmm. I checked the C99 specification [1] of strcmp, but it does not define the
case when the NULL is input. So it depends implementation.

[1]: https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-28 Thread Masahiko Sawada
On Tue, Nov 28, 2023 at 6:50 PM Amit Kapila  wrote:
>
> On Tue, Nov 28, 2023 at 1:32 PM Bharath Rupireddy
>  wrote:
> >
> > On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada  
> > wrote:
> > >
> > > One month has already been passed since this main patch got committed
> > > but reading this change, I have some questions on new
> > > binary_upgrade_logical_slot_has_caught_up() function:
> > >
> > > Is there any reason why this function can be executed only in binary
> > > upgrade mode? It seems to me that other functions in
> > > pg_upgrade_support.c must be called only in binary upgrade mode
> > > because it does some hacky changes internally. On the other hand,
> > > binary_upgrade_logical_slot_has_caught_up() just calls
> > > LogicalReplicationSlotHasPendingWal(), which doesn't change anything
> > > internally. If we make this function usable in normal mode, the user
> > > would be able to  check each slot's upgradability without pg_upgrade
> > > --check command (or without stopping the server if the user can ensure
> > > no more meaningful WAL records are generated).
> >
> > It may happen that such a user-facing function tells there's no
> > unconsumed WAL, but later on the WAL gets generated during pg_upgrade.
> > Therefore, the information the function gives turns out to be
> > incorrect. I don't see a real-world use-case for such a function right
> > now. If there's one, it's not a big change to turn it into a
> > user-facing function.
> >
>
> Yeah, as of now, I don't see a use case for it and in fact, it could
> lead to unpredictable results. Immediately after calling the function,
> there could be more activity on the server which could make the
> results incorrect. I think to check the slot's upgradeability, one can
> rely on the results of the pg_upgrade --check functionality.

Fair point.

This function is already a user-executable function as it's in
pg_catalog but is restricted to be executed only in binary upgrade
even though it doesn't change anything internally. So it wasn't clear
to me why we put such a restriction.

>
> > > ---
> > > Also, the function checks if the user has the REPLICATION privilege
> > > but I think that only superuser can connect to the server in binary
> > > upgrade mode in the first place.
> >
> > If that were true, I don't see a problem in having
> > CheckSlotPermissions() there, in fact it can act as an assertion.
> >
>
> I think we can change it to assertion or may elog(ERROR, ...) with a
> comment as to why we don't expect this can happen.

+1 for an assertion, to match other checks in the function.

>
> > > ---
> > > The following error message doesn't match the function name:
> > >
> > > /* We must check before dereferencing the argument */
> > > if (PG_ARGISNULL(0))
> > > elog(ERROR, "null argument to
> > > binary_upgrade_validate_wal_records is not allowed");
> > >
>
> This should be fixed.
>
> > > ---
> > > { oid => '8046', descr => 'for use by pg_upgrade',
> > >   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 
> > > 'f',
> > >   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
> > >   proargtypes => 'name',
> > >   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
> > >
> > > The function is not a strict function but we check in the function if
> > > the passed argument is not null. I think it would be clearer to make
> > > it a strict function.
> >
> > I think it has been done that way similar to
> > binary_upgrade_create_empty_extension().

binary_upgrade_create_empty_extension() needs to be a non-strict
function since it needs to accept NULL in some arguments such as
extConfig. On the other hand,
binary_upgrade_logical_slot_has_caught_up() doesn't handle NULL and
it's conventional to make such a function a strict function.

> >
> > > ---
> > > LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
> > > guess it's more suitable to be in slotfunc.s where similar functions
> > > such as pg_logical_replication_slot_advance() is also defined.
> >
> > Why not in logicalfuncs.c?
> >
>
> I am not sure if either of those is better than logical.c. IIRC, I
> thought it was okay to keep in logical.c as others primarily deal with
> exposed SQL functions and I felt it somewhat matches with the intent
> of logical.c ("The goal is to encapsulate most of the internal
> complexity for consumers of logical decoding, so they can create and
> consume a changestream with a low amount of code..").

I see your point. To me it looks that the functions in logical.c are
APIs and internal functions to manage logical decoding context and
replication slot (e.g., restart_lsn). On the other hand,
LogicalReplicationSlotHasPendingWal() seems to be a user of the
logical decoding. But anyway, it seems that three hackers have
different opinions. So we can keep it unless someone has a good reason
to change it.

On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
 wrote:
>
>
> Yeah, we followed 

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-28 Thread Hayato Kuroda (Fujitsu)
Dear Bharath, Sawada-san,

Welcome back!

> >
> > ---
> > { oid => '8046', descr => 'for use by pg_upgrade',
> >   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 
> > 'f',
> >   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
> >   proargtypes => 'name',
> >   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
> >
> > The function is not a strict function but we check in the function if
> > the passed argument is not null. I think it would be clearer to make
> > it a strict function.
> 
> I think it has been done that way similar to
> binary_upgrade_create_empty_extension().

Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
un-strict to keep a caller function simpler.

Currently get_old_cluster_logical_slot_infos() executes a query and it contains
binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we assumed
either true or false is returned.
 
But if proisstrict is changed true, we must handle the case when NULL is 
returned.
It is small but backseat operation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-28 Thread Amit Kapila
On Tue, Nov 28, 2023 at 1:32 PM Bharath Rupireddy
 wrote:
>
> On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada  
> wrote:
> >
> > One month has already been passed since this main patch got committed
> > but reading this change, I have some questions on new
> > binary_upgrade_logical_slot_has_caught_up() function:
> >
> > Is there any reason why this function can be executed only in binary
> > upgrade mode? It seems to me that other functions in
> > pg_upgrade_support.c must be called only in binary upgrade mode
> > because it does some hacky changes internally. On the other hand,
> > binary_upgrade_logical_slot_has_caught_up() just calls
> > LogicalReplicationSlotHasPendingWal(), which doesn't change anything
> > internally. If we make this function usable in normal mode, the user
> > would be able to  check each slot's upgradability without pg_upgrade
> > --check command (or without stopping the server if the user can ensure
> > no more meaningful WAL records are generated).
>
> It may happen that such a user-facing function tells there's no
> unconsumed WAL, but later on the WAL gets generated during pg_upgrade.
> Therefore, the information the function gives turns out to be
> incorrect. I don't see a real-world use-case for such a function right
> now. If there's one, it's not a big change to turn it into a
> user-facing function.
>

Yeah, as of now, I don't see a use case for it and in fact, it could
lead to unpredictable results. Immediately after calling the function,
there could be more activity on the server which could make the
results incorrect. I think to check the slot's upgradeability, one can
rely on the results of the pg_upgrade --check functionality.

> > ---
> > Also, the function checks if the user has the REPLICATION privilege
> > but I think that only superuser can connect to the server in binary
> > upgrade mode in the first place.
>
> If that were true, I don't see a problem in having
> CheckSlotPermissions() there, in fact it can act as an assertion.
>

I think we can change it to assertion or may elog(ERROR, ...) with a
comment as to why we don't expect this can happen.

> > ---
> > The following error message doesn't match the function name:
> >
> > /* We must check before dereferencing the argument */
> > if (PG_ARGISNULL(0))
> > elog(ERROR, "null argument to
> > binary_upgrade_validate_wal_records is not allowed");
> >

This should be fixed.

> > ---
> > { oid => '8046', descr => 'for use by pg_upgrade',
> >   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 
> > 'f',
> >   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
> >   proargtypes => 'name',
> >   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
> >
> > The function is not a strict function but we check in the function if
> > the passed argument is not null. I think it would be clearer to make
> > it a strict function.
>
> I think it has been done that way similar to
> binary_upgrade_create_empty_extension().
>
> > ---
> > LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
> > guess it's more suitable to be in slotfunc.s where similar functions
> > such as pg_logical_replication_slot_advance() is also defined.
>
> Why not in logicalfuncs.c?
>

I am not sure if either of those is better than logical.c. IIRC, I
thought it was okay to keep in logical.c as others primarily deal with
exposed SQL functions and I felt it somewhat matches with the intent
of logical.c ("The goal is to encapsulate most of the internal
complexity for consumers of logical decoding, so they can create and
consume a changestream with a low amount of code..").

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-28 Thread Bharath Rupireddy
On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada  wrote:
>
> One month has already been passed since this main patch got committed
> but reading this change, I have some questions on new
> binary_upgrade_logical_slot_has_caught_up() function:
>
> Is there any reason why this function can be executed only in binary
> upgrade mode? It seems to me that other functions in
> pg_upgrade_support.c must be called only in binary upgrade mode
> because it does some hacky changes internally. On the other hand,
> binary_upgrade_logical_slot_has_caught_up() just calls
> LogicalReplicationSlotHasPendingWal(), which doesn't change anything
> internally. If we make this function usable in normal mode, the user
> would be able to  check each slot's upgradability without pg_upgrade
> --check command (or without stopping the server if the user can ensure
> no more meaningful WAL records are generated).

It may happen that such a user-facing function tells there's no
unconsumed WAL, but later on the WAL gets generated during pg_upgrade.
Therefore, the information the function gives turns out to be
incorrect. I don't see a real-world use-case for such a function right
now. If there's one, it's not a big change to turn it into a
user-facing function.

> ---
> Also, the function checks if the user has the REPLICATION privilege
> but I think that only superuser can connect to the server in binary
> upgrade mode in the first place.

If that were true, I don't see a problem in having
CheckSlotPermissions() there, in fact it can act as an assertion.

> ---
> The following error message doesn't match the function name:
>
> /* We must check before dereferencing the argument */
> if (PG_ARGISNULL(0))
> elog(ERROR, "null argument to
> binary_upgrade_validate_wal_records is not allowed");
>
> ---
> { oid => '8046', descr => 'for use by pg_upgrade',
>   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
>   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
>   proargtypes => 'name',
>   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
>
> The function is not a strict function but we check in the function if
> the passed argument is not null. I think it would be clearer to make
> it a strict function.

I think it has been done that way similar to
binary_upgrade_create_empty_extension().

> ---
> LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
> guess it's more suitable to be in slotfunc.s where similar functions
> such as pg_logical_replication_slot_advance() is also defined.

Why not in logicalfuncs.c?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-27 Thread Masahiko Sawada
On Thu, Nov 9, 2023 at 7:07 PM Amit Kapila  wrote:
>
> On Wed, Nov 8, 2023 at 11:05 PM vignesh C  wrote:
> >
> > On Wed, 8 Nov 2023 at 08:43, vignesh C  wrote:
> >
> > Here is a small improvisation where num_slots need not be initialized
> > as it will be used only after assigning the result now. The attached
> > patch has the changes for the same.
> >
>
> Pushed!
>

Thank you for your work on this feature!

One month has already been passed since this main patch got committed
but reading this change, I have some questions on new
binary_upgrade_logical_slot_has_caught_up() function:

Is there any reason why this function can be executed only in binary
upgrade mode? It seems to me that other functions in
pg_upgrade_support.c must be called only in binary upgrade mode
because it does some hacky changes internally. On the other hand,
binary_upgrade_logical_slot_has_caught_up() just calls
LogicalReplicationSlotHasPendingWal(), which doesn't change anything
internally. If we make this function usable in normal mode, the user
would be able to  check each slot's upgradability without pg_upgrade
--check command (or without stopping the server if the user can ensure
no more meaningful WAL records are generated).

---
Also, the function checks if the user has the REPLICATION privilege
but I think that only superuser can connect to the server in binary
upgrade mode in the first place.

---
The following error message doesn't match the function name:

/* We must check before dereferencing the argument */
if (PG_ARGISNULL(0))
elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

---
{ oid => '8046', descr => 'for use by pg_upgrade',
  proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
  proargtypes => 'name',
  prosrc => 'binary_upgrade_logical_slot_has_caught_up' },

The function is not a strict function but we check in the function if
the passed argument is not null. I think it would be clearer to make
it a strict function.

---
LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
guess it's more suitable to be in slotfunc.s where similar functions
such as pg_logical_replication_slot_advance() is also defined.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-22 Thread Amit Kapila
On Wed, Nov 22, 2023 at 1:30 PM John Naylor  wrote:
>
> On Thu, Nov 9, 2023 at 5:07 PM Amit Kapila  wrote:
> >
> > On Wed, Nov 8, 2023 at 11:05 PM vignesh C  wrote:
> > >
> > > On Wed, 8 Nov 2023 at 08:43, vignesh C  wrote:
> > >
> > > Here is a small improvisation where num_slots need not be initialized
> > > as it will be used only after assigning the result now. The attached
> > > patch has the changes for the same.
> > >
> >
> > Pushed!
>
> Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> the last patch committed. Is there further work that needs to be
> re-attached and/or rebased?
>

No. I have marked it as committed.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-22 Thread John Naylor
On Thu, Nov 9, 2023 at 5:07 PM Amit Kapila  wrote:
>
> On Wed, Nov 8, 2023 at 11:05 PM vignesh C  wrote:
> >
> > On Wed, 8 Nov 2023 at 08:43, vignesh C  wrote:
> >
> > Here is a small improvisation where num_slots need not be initialized
> > as it will be used only after assigning the result now. The attached
> > patch has the changes for the same.
> >
>
> Pushed!

Hi all, the CF entry for this is marked RfC, and CI is trying to apply
the last patch committed. Is there further work that needs to be
re-attached and/or rebased?




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-09 Thread Amit Kapila
On Wed, Nov 8, 2023 at 11:05 PM vignesh C  wrote:
>
> On Wed, 8 Nov 2023 at 08:43, vignesh C  wrote:
>
> Here is a small improvisation where num_slots need not be initialized
> as it will be used only after assigning the result now. The attached
> patch has the changes for the same.
>

Pushed!

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-08 Thread vignesh C
On Wed, 8 Nov 2023 at 08:43, vignesh C  wrote:
>
> On Tue, 7 Nov 2023 at 13:25, Amit Kapila  wrote:
> >
> > On Tue, Nov 7, 2023 at 10:01 AM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 
> > >  wrote:
> > > >
> > > > Dear hackers,
> > > >
> > > > PSA the patch to solve the issue [1].
> > > >
> > > > Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
> > > > generated in the source directory, even when the VPATH/meson build.
> > > > This can avoid by changing the directory explicitly.
> > > >
> > > > [1]:
> > > > https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b
> > > > 52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf
> > >
> > > Thanks for the patch, I have confirmed that the files won't be generated
> > > in source directory after applying the patch.
> > >
> > > After running: "meson test -C build/ --suite pg_upgrade",
> > > The files are in the test directory:
> > > ./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh
> > >
> >
> > Thanks for the patch and verification. Pushed the fix.
>
> While verifying upgrade of subscriber patch, I found one issue with
> upgrade in verbose mode.
> I was able to reproduce this issue by performing a upgrade with a
> verbose option.
>
> The trace for the same is given below:
> Program received signal SIGSEGV, Segmentation fault.
> __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
> 126../sysdeps/x86_64/multiarch/strlen-vec.S: No such file or 
> directory.
> (gdb) bt
> #0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
> #1  0x5556f572 in dopr (target=0x7fffbb90,
> format=0x5557859e "\", plugin: \"%s\", two_phase: %s",
> args=0x7fffdc40) at snprintf.c:444
> #2  0x5556ed95 in pg_vsnprintf (str=0x7fffbc10 "slot_name:
> \"ication slots within the database:", count=8192, fmt=0x55578590
> "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
> args=0x7fffdc40) at snprintf.c:195
> #3  0x555667e3 in pg_log_v (type=PG_VERBOSE,
> fmt=0x55578590 "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
> ap=0x7fffdc40) at util.c:184
> #4  0x55566b38 in pg_log (type=PG_VERBOSE, fmt=0x55578590
> "slot_name: \"%s\", plugin: \"%s\", two_phase: %s") at util.c:264
> #5  0x55561a06 in print_slot_infos (slot_arr=0x55595ed0)
> at info.c:813
> #6  0x5556186e in print_db_infos (db_arr=0x55587518
> ) at info.c:782
> #7  0x555606da in get_db_rel_and_slot_infos
> (cluster=0x555874a0 , live_check=false) at info.c:308
> #8  0x839a in check_new_cluster () at check.c:215
> #9  0x55563010 in main (argc=13, argv=0x7fffdf08) at
> pg_upgrade.c:136
>
> This issue occurs because we are accessing uninitialized slot array 
> information.
>
> We could fix it by a couple of ways: a) Initialize the whole of
> dbinfos by using pg_malloc0 instead of pg_malloc which will ensure
> that the slot information is set to 0. b) Setting only slot
> information. Attached patch has the changes for both the approaches.
> Thoughts?

Here is a small improvisation where num_slots need not be initialized
as it will be used only after assigning the result now. The attached
patch has the changes for the same.

Regards,
Vignesh
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 7f21d26fd2..4878aa22bf 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -408,7 +408,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -636,15 +636,11 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
 	PGconn	   *conn;
 	PGresult   *res;
 	LogicalSlotInfo *slotinfos = NULL;
-	int			num_slots = 0;
+	int			num_slots;
 
 	/* Logical slots can be migrated since PG17. */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
-	{
-		dbinfo->slot_arr.slots = slotinfos;
-		dbinfo->slot_arr.nslots = num_slots;
 		return;
-	}
 
 	conn = connectToServer(_cluster, dbinfo->db_name);
 


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-08 Thread Amit Kapila
On Wed, Nov 8, 2023 at 8:44 AM vignesh C  wrote:
>
> While verifying upgrade of subscriber patch, I found one issue with
> upgrade in verbose mode.
> I was able to reproduce this issue by performing a upgrade with a
> verbose option.
>
> The trace for the same is given below:
> Program received signal SIGSEGV, Segmentation fault.
> __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
> 126../sysdeps/x86_64/multiarch/strlen-vec.S: No such file or 
> directory.
> (gdb) bt
> #0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
> #1  0x5556f572 in dopr (target=0x7fffbb90,
> format=0x5557859e "\", plugin: \"%s\", two_phase: %s",
> args=0x7fffdc40) at snprintf.c:444
> #2  0x5556ed95 in pg_vsnprintf (str=0x7fffbc10 "slot_name:
> \"ication slots within the database:", count=8192, fmt=0x55578590
> "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
> args=0x7fffdc40) at snprintf.c:195
> #3  0x555667e3 in pg_log_v (type=PG_VERBOSE,
> fmt=0x55578590 "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
> ap=0x7fffdc40) at util.c:184
> #4  0x55566b38 in pg_log (type=PG_VERBOSE, fmt=0x55578590
> "slot_name: \"%s\", plugin: \"%s\", two_phase: %s") at util.c:264
> #5  0x55561a06 in print_slot_infos (slot_arr=0x55595ed0)
> at info.c:813
> #6  0x5556186e in print_db_infos (db_arr=0x55587518
> ) at info.c:782
> #7  0x555606da in get_db_rel_and_slot_infos
> (cluster=0x555874a0 , live_check=false) at info.c:308
> #8  0x839a in check_new_cluster () at check.c:215
> #9  0x55563010 in main (argc=13, argv=0x7fffdf08) at
> pg_upgrade.c:136
>
> This issue occurs because we are accessing uninitialized slot array 
> information.
>
> We could fix it by a couple of ways: a) Initialize the whole of
> dbinfos by using pg_malloc0 instead of pg_malloc which will ensure
> that the slot information is set to 0.
>

I would prefer this fix instead of initializing the slot array at
multiple places. I'll push this tomorrow unless someone thinks
otherwise.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-07 Thread Amit Kapila
On Wed, Nov 8, 2023 at 8:44 AM vignesh C  wrote:
>
> While verifying upgrade of subscriber patch, I found one issue with
> upgrade in verbose mode.
> I was able to reproduce this issue by performing a upgrade with a
> verbose option.
>
> The trace for the same is given below:
> Program received signal SIGSEGV, Segmentation fault.
> __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
> 126../sysdeps/x86_64/multiarch/strlen-vec.S: No such file or 
> directory.
> (gdb) bt
> #0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
> #1  0x5556f572 in dopr (target=0x7fffbb90,
> format=0x5557859e "\", plugin: \"%s\", two_phase: %s",
> args=0x7fffdc40) at snprintf.c:444
> #2  0x5556ed95 in pg_vsnprintf (str=0x7fffbc10 "slot_name:
> \"ication slots within the database:", count=8192, fmt=0x55578590
> "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
> args=0x7fffdc40) at snprintf.c:195
> #3  0x555667e3 in pg_log_v (type=PG_VERBOSE,
> fmt=0x55578590 "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
> ap=0x7fffdc40) at util.c:184
> #4  0x55566b38 in pg_log (type=PG_VERBOSE, fmt=0x55578590
> "slot_name: \"%s\", plugin: \"%s\", two_phase: %s") at util.c:264
> #5  0x55561a06 in print_slot_infos (slot_arr=0x55595ed0)
> at info.c:813
> #6  0x5556186e in print_db_infos (db_arr=0x55587518
> ) at info.c:782
> #7  0x555606da in get_db_rel_and_slot_infos
> (cluster=0x555874a0 , live_check=false) at info.c:308
> #8  0x839a in check_new_cluster () at check.c:215
> #9  0x55563010 in main (argc=13, argv=0x7fffdf08) at
> pg_upgrade.c:136
>
> This issue occurs because we are accessing uninitialized slot array 
> information.
>

Thanks for the report. I'll review your proposed fix.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-07 Thread vignesh C
On Tue, 7 Nov 2023 at 13:25, Amit Kapila  wrote:
>
> On Tue, Nov 7, 2023 at 10:01 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 
> >  wrote:
> > >
> > > Dear hackers,
> > >
> > > PSA the patch to solve the issue [1].
> > >
> > > Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
> > > generated in the source directory, even when the VPATH/meson build.
> > > This can avoid by changing the directory explicitly.
> > >
> > > [1]:
> > > https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b
> > > 52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf
> >
> > Thanks for the patch, I have confirmed that the files won't be generated
> > in source directory after applying the patch.
> >
> > After running: "meson test -C build/ --suite pg_upgrade",
> > The files are in the test directory:
> > ./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh
> >
>
> Thanks for the patch and verification. Pushed the fix.

While verifying upgrade of subscriber patch, I found one issue with
upgrade in verbose mode.
I was able to reproduce this issue by performing a upgrade with a
verbose option.

The trace for the same is given below:
Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
126../sysdeps/x86_64/multiarch/strlen-vec.S: No such file or directory.
(gdb) bt
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
#1  0x5556f572 in dopr (target=0x7fffbb90,
format=0x5557859e "\", plugin: \"%s\", two_phase: %s",
args=0x7fffdc40) at snprintf.c:444
#2  0x5556ed95 in pg_vsnprintf (str=0x7fffbc10 "slot_name:
\"ication slots within the database:", count=8192, fmt=0x55578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
args=0x7fffdc40) at snprintf.c:195
#3  0x555667e3 in pg_log_v (type=PG_VERBOSE,
fmt=0x55578590 "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
ap=0x7fffdc40) at util.c:184
#4  0x55566b38 in pg_log (type=PG_VERBOSE, fmt=0x55578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s") at util.c:264
#5  0x55561a06 in print_slot_infos (slot_arr=0x55595ed0)
at info.c:813
#6  0x5556186e in print_db_infos (db_arr=0x55587518
) at info.c:782
#7  0x555606da in get_db_rel_and_slot_infos
(cluster=0x555874a0 , live_check=false) at info.c:308
#8  0x839a in check_new_cluster () at check.c:215
#9  0x55563010 in main (argc=13, argv=0x7fffdf08) at
pg_upgrade.c:136

This issue occurs because we are accessing uninitialized slot array information.

We could fix it by a couple of ways: a) Initialize the whole of
dbinfos by using pg_malloc0 instead of pg_malloc which will ensure
that the slot information is set to 0. b) Setting only slot
information. Attached patch has the changes for both the approaches.
Thoughts?

Regards,
Vignesh
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 7f21d26fd2..d2a1815fef 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -408,7 +408,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -640,11 +640,7 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
 
 	/* Logical slots can be migrated since PG17. */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
-	{
-		dbinfo->slot_arr.slots = slotinfos;
-		dbinfo->slot_arr.nslots = num_slots;
 		return;
-	}
 
 	conn = connectToServer(_cluster, dbinfo->db_name);
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 7f21d26fd2..21a0b0551a 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -297,6 +297,11 @@ get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 		 */
 		if (cluster == _cluster)
 			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+		else
+		{
+			pDbInfo->slot_arr.slots = NULL;
+			pDbInfo->slot_arr.nslots = 0;
+		}
 	}
 
 	if (cluster == _cluster)


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-06 Thread Amit Kapila
On Tue, Nov 7, 2023 at 10:01 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 
>  wrote:
> >
> > Dear hackers,
> >
> > PSA the patch to solve the issue [1].
> >
> > Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
> > generated in the source directory, even when the VPATH/meson build.
> > This can avoid by changing the directory explicitly.
> >
> > [1]:
> > https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b
> > 52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf
>
> Thanks for the patch, I have confirmed that the files won't be generated
> in source directory after applying the patch.
>
> After running: "meson test -C build/ --suite pg_upgrade",
> The files are in the test directory:
> ./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh
>

Thanks for the patch and verification. Pushed the fix.

-- 
With Regards,
Amit Kapila.




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-06 Thread Zhijie Hou (Fujitsu)
On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> 
> Dear hackers,
> 
> PSA the patch to solve the issue [1].
> 
> Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
> generated in the source directory, even when the VPATH/meson build.
> This can avoid by changing the directory explicitly.
> 
> [1]:
> https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b
> 52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf

Thanks for the patch, I have confirmed that the files won't be generated
in source directory after applying the patch.

After running: "meson test -C build/ --suite pg_upgrade",
The files are in the test directory:
./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh

Best regards,
Hou zj




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-06 Thread Peter Smith
On Tue, Nov 7, 2023 at 3:14 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
>
> PSA the patch to solve the issue [1].
>
> Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
> generated in the source directory, even when the VPATH/meson build.
> This can avoid by changing the directory explicitly.
>

Hi Kuroda-san,

Thanks for the patch.

I reproduced the bug, then after applying your patch, I confirmed the
problem is fixed. I used the VPATH build

~~~

BEFORE
t/001_basic.pl .. ok
t/002_pg_upgrade.pl . ok
t/003_logical_slots.pl .. ok
All tests successful.
Files=3, Tests=39, 128 wallclock secs ( 0.05 usr  0.01 sys + 12.90
cusr  7.43 csys = 20.39 CPU)
Result: PASS

OBSERVE THE BUG
Look in the source folder and notice the file that should not be there.

[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/oss_postgres_misc/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls *.sh
delete_old_cluster.sh

~~~

AFTER
# +++ tap check in src/bin/pg_upgrade +++
t/001_basic.pl .. ok
t/002_pg_upgrade.pl . ok
t/003_logical_slots.pl .. ok
All tests successful.
Files=3, Tests=39, 128 wallclock secs ( 0.06 usr  0.01 sys + 13.02
cusr  7.28 csys = 20.37 CPU)
Result: PASS

CONFIRM THE FIX
Check the offending file is no longer in the src folder

[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/oss_postgres_misc/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls *.sh
ls: cannot access *.sh: No such file or directory

Instead, it is found in the VPATH folder
[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/vpath_dir/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls tmp_check/
delete_old_cluster.sh  log  results

==
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-06 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

PSA the patch to solve the issue [1].

Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
generated in the source directory, even when the VPATH/meson build.
This can avoid by changing the directory explicitly.

[1]: 
https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



change_dir.patch
Description: change_dir.patch


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-27 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

I found several machines on BF got angry (e.g. [1]), because of missing update 
meson.build. Sorry for that.
PSA the patch to fix it.

[1]: 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual=2023-10-27%2006%3A08%3A31

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



fix_meson.patch
Description: fix_meson.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Amit Kapila
On Fri, Oct 27, 2023 at 11:24 AM Bharath Rupireddy
 wrote:
>
> On Fri, Oct 27, 2023 at 10:10 AM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Here is a patch for fixing to 003_logical_slots. Also, I got a comment off 
> > list so that it was included.
> >
> > ```
> > -# Setup a pg_upgrade command. This will be used anywhere.
> > +# Setup a common pg_upgrade command to be used by all the test cases
> > ```
>
> The patch LGTM.
>

Thanks, I'll push it in some time.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Bharath Rupireddy
On Fri, Oct 27, 2023 at 10:10 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Here is a patch for fixing to 003_logical_slots. Also, I got a comment off 
> list so that it was included.
>
> ```
> -# Setup a pg_upgrade command. This will be used anywhere.
> +# Setup a common pg_upgrade command to be used by all the test cases
> ```

The patch LGTM.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Bharath Rupireddy
On Fri, Oct 27, 2023 at 11:09 AM Amit Kapila  wrote:
>
> On Fri, Oct 27, 2023 at 10:43 AM Michael Paquier  wrote:
> >
> > -"invalid_logical_replication_slots.txt");
> > +"invalid_logical_slots.txt");
> >
> > Or you could do something even shorter, with "invalid_slots.txt".
> >
>
> I also thought of it but if we want to keep it that way, we should
> slightly adjust the messages like: "The slot \"%s\" is invalid" to
> include slot_type. This will contain only logical slots, so the
> current one probably seems okay.

+1 for invalid_logical_slots.txt as file name (which can fix Windows
path name issue) and contents as-is "The slot \"%s\" is invalid\n" and
"The slot \"%s\" has not consumed the WAL yet\n".

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

> Or you could do something even shorter, with "invalid_slots.txt".

I think current one seems better, because we only support logical replication
slots for now. We can extend as you said when we support physical slot as well.
Also, proposed length is sufficient for fairywren [1].

[1]: 
https://www.postgresql.org/message-id/CALj2ACVc-WSx_fvfynt-G3j8rjhNTMZ8DHu2wiKgCEiV9EO86g%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Amit Kapila
On Fri, Oct 27, 2023 at 10:43 AM Michael Paquier  wrote:
>
> On Fri, Oct 27, 2023 at 04:40:43AM +, Hayato Kuroda (Fujitsu) wrote:
> > Yeah, Bharath has already reported, I agreed that the reason was [1].
> >
> > ```
> > In the Windows API (with some exceptions discussed in the following 
> > paragraphs),
> > the maximum length for a path is MAX_PATH, which is defined as 260 
> > characters.
> > ```
>
> -"invalid_logical_replication_slots.txt");
> +"invalid_logical_slots.txt");
>
> Or you could do something even shorter, with "invalid_slots.txt".
>

I also thought of it but if we want to keep it that way, we should
slightly adjust the messages like: "The slot \"%s\" is invalid" to
include slot_type. This will contain only logical slots, so the
current one probably seems okay.


--
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Michael Paquier
On Fri, Oct 27, 2023 at 04:40:43AM +, Hayato Kuroda (Fujitsu) wrote:
> Yeah, Bharath has already reported, I agreed that the reason was [1]. 
> 
> ```
> In the Windows API (with some exceptions discussed in the following 
> paragraphs),
> the maximum length for a path is MAX_PATH, which is defined as 260 characters.
> ```

-"invalid_logical_replication_slots.txt");
+"invalid_logical_slots.txt");

Or you could do something even shorter, with "invalid_slots.txt".
--
Michael


signature.asc
Description: PGP signature


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Hayato Kuroda (Fujitsu)
Dear Bharath, Amit, Peter,

Thank you for discussing! A patch can be available in [1].

> > > > +1 for
> s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
> > > > and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.
> >
> > +1. The proposed file name sounds reasonable.
> >
> > Agreed. So, how about 003_upgrade_logical_slots.pl or simply
> > 003_upgrade_slots.pl?
> >
> > Why not simply oldpub/sub/newpub or old_pub/sub/new_pub?
> 
> +1 for invalid_logical_slots.txt, 003_upgrade_logical_slots.pl and
> oldpub/sub/newpub. With these changes, the path name is brought down
> to ~220 chars. These names look good to me iff other things in the
> path name aren't dynamic crossing MAX_PATH limit (260 chars).
> 
> C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgra
> de/003_upgrade_logical_slots/data/t_003_upgrade_logical_slots_newpub_data/
> pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_slots.txt

Replaced to invalid_logical_slots.txt, 003_logical_slots.pl, and 
oldpub/sub/newpub.
Regarding the test finename, some client app (e.g., pg_ctl) does not have a 
prefix,
and some others (e.g., pg_dump) have. Either way seems acceptable.
Hence I chose to remove the header.

```
$ ls pg_ctl/t/
001_start_stop.pl  002_status.pl  003_promote.pl  004_logrotate.pl

$ ls pg_dump/t/
001_basic.pl  002_pg_dump.pl  003_pg_dump_with_server.pl  
004_pg_dump_parallel.pl  010_dump_connstr.pl
```

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB5870A6A8FBB23554EDE8F5F3F5DCA%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

> The BF animal fairywren[1] failed when testing
> 003_upgrade_logical_replication_slots.pl.

Good catch!

> 
> The reason could be the length of this path(262) exceed the windows path
> limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
> reducing the path somehow.

Yeah, Bharath has already reported, I agreed that the reason was [1]. 

```
In the Windows API (with some exceptions discussed in the following paragraphs),
the maximum length for a path is MAX_PATH, which is defined as 260 characters.
```

> In this case, I think one approach is to reduce the file and testname to
> xxx_logical_slots instead of xxx_logical_replication_slots. But we will 
> analyze
> more
> and share fix soon.
>

Here is a patch for fixing to 003_logical_slots. Also, I got a comment off list 
so that it was included.

```
-# Setup a pg_upgrade command. This will be used anywhere.
+# Setup a common pg_upgrade command to be used by all the test cases
```

[1]: 
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



0001-Shorten-some-files.patch
Description: 0001-Shorten-some-files.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Bharath Rupireddy
On Fri, Oct 27, 2023 at 8:06 AM Amit Kapila  wrote:
>
> > > +1 for 
> > > s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
> > > and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.
>
> +1. The proposed file name sounds reasonable.
>
> Agreed. So, how about 003_upgrade_logical_slots.pl or simply
> 003_upgrade_slots.pl?
>
> Why not simply oldpub/sub/newpub or old_pub/sub/new_pub?

+1 for invalid_logical_slots.txt, 003_upgrade_logical_slots.pl and
oldpub/sub/newpub. With these changes, the path name is brought down
to ~220 chars. These names look good to me iff other things in the
path name aren't dynamic crossing MAX_PATH limit (260 chars).

C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_slots/data/t_003_upgrade_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_slots.txt

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Amit Kapila
On Fri, Oct 27, 2023 at 3:28 AM Peter Smith  wrote:
>
> On Fri, Oct 27, 2023 at 2:26 AM Bharath Rupireddy
>  wrote:
> >
> > On Thu, Oct 26, 2023 at 8:11 PM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > The BF animal fairywren[1] failed when testing
> > > 003_upgrade_logical_replication_slots.pl.
> > >
> > > From the log, I can see pg_upgrade failed to open the
> > > invalid_logical_replication_slots.txt:
> > >
> > > # Checking for valid logical replication slots
> > > # could not open file 
> > > "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt":
> > >  No such file or directory
> > > # Failure, exiting
> > >
> > > The reason could be the length of this path(262) exceed the windows path
> > > limit(260 IIRC). If so, I recall we fixed similar things before 
> > > (e213de8e7) by
> > > reducing the path somehow.
> >
> > Nice catch. Windows docs say that the file/directory path name can't
> > exceed MAX_PATH, which is defined as 260 characters. However, one must
> > opt-in to enable longer path names -
> > https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
> > and 
> > https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later.
> >
> > > In this case, I think one approach is to reduce the file and testname to
> > > xxx_logical_slots instead of xxx_logical_replication_slots. But we will 
> > > analyze more
> > > and share fix soon.
> > >
> > > [1] 
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2023-10-26%2009%3A04%3A54
> >
> > +1 for 
> > s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
> > and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.

+1. The proposed file name sounds reasonable.

> > In fact, we've used "logical slots" instead of "logical replication
> > slots" in the docs to be generic. By looking at the generated
> > directory path name, I think we can use shorter node names - instead
> > of old_publisher, new_publisher, subscriber - either use node1 (for
> > old publisher), node2 (for subscriber), node3 (for new publisher) or
> > use alpha (for old publisher), bravo (for subscriber), charlie (for
> > new publisher) or such shorter names. We don't have to be that
> > descriptive and long in node names, one can look at the test file to
> > know which one is what.
> >
>
> Some more ideas for shortening the filename:
>
> 1. "003_upgrade_logical_replication_slots.pl" -- IMO the word
> "upgrade" is redundant in that filename (earlier patches never had
> this). The test file lives under "pg_upgrade/t" so I felt that
> upgrading is already implied.
>

Agreed. So, how about 003_upgrade_logical_slots.pl or simply
003_upgrade_slots.pl?

> 2. If the node names will be shortened they should still retain *some*
> meaning if possible:
> old_publisher/subscriber/new_publisher --> node1/node2/node3 (means
> nothing without studying the tests)
> old_publisher/subscriber/new_publisher --> alpha/bravo/charlie (means
> nothing without studying the tests)
> How about:
> old_publisher/subscriber/new_publisher --> node_p1/node_s/node_p2
> or similar...
>

Why not simply oldpub/sub/newpub or old_pub/sub/new_pub?

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Peter Smith
On Fri, Oct 27, 2023 at 2:26 AM Bharath Rupireddy
 wrote:
>
> On Thu, Oct 26, 2023 at 8:11 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > The BF animal fairywren[1] failed when testing
> > 003_upgrade_logical_replication_slots.pl.
> >
> > From the log, I can see pg_upgrade failed to open the
> > invalid_logical_replication_slots.txt:
> >
> > # Checking for valid logical replication slots
> > # could not open file 
> > "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt":
> >  No such file or directory
> > # Failure, exiting
> >
> > The reason could be the length of this path(262) exceed the windows path
> > limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) 
> > by
> > reducing the path somehow.
>
> Nice catch. Windows docs say that the file/directory path name can't
> exceed MAX_PATH, which is defined as 260 characters. However, one must
> opt-in to enable longer path names -
> https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
> and 
> https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later.
>
> > In this case, I think one approach is to reduce the file and testname to
> > xxx_logical_slots instead of xxx_logical_replication_slots. But we will 
> > analyze more
> > and share fix soon.
> >
> > [1] 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2023-10-26%2009%3A04%3A54
>
> +1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
> and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.
> In fact, we've used "logical slots" instead of "logical replication
> slots" in the docs to be generic. By looking at the generated
> directory path name, I think we can use shorter node names - instead
> of old_publisher, new_publisher, subscriber - either use node1 (for
> old publisher), node2 (for subscriber), node3 (for new publisher) or
> use alpha (for old publisher), bravo (for subscriber), charlie (for
> new publisher) or such shorter names. We don't have to be that
> descriptive and long in node names, one can look at the test file to
> know which one is what.
>

Some more ideas for shortening the filename:

1. "003_upgrade_logical_replication_slots.pl" -- IMO the word
"upgrade" is redundant in that filename (earlier patches never had
this). The test file lives under "pg_upgrade/t" so I felt that
upgrading is already implied.

2. If the node names will be shortened they should still retain *some*
meaning if possible:
old_publisher/subscriber/new_publisher --> node1/node2/node3 (means
nothing without studying the tests)
old_publisher/subscriber/new_publisher --> alpha/bravo/charlie (means
nothing without studying the tests)
How about:
old_publisher/subscriber/new_publisher --> node_p1/node_s/node_p2
or similar...

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Bharath Rupireddy
On Thu, Oct 26, 2023 at 8:11 PM Zhijie Hou (Fujitsu)
 wrote:
>
> The BF animal fairywren[1] failed when testing
> 003_upgrade_logical_replication_slots.pl.
>
> From the log, I can see pg_upgrade failed to open the
> invalid_logical_replication_slots.txt:
>
> # Checking for valid logical replication slots
> # could not open file 
> "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt":
>  No such file or directory
> # Failure, exiting
>
> The reason could be the length of this path(262) exceed the windows path
> limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
> reducing the path somehow.

Nice catch. Windows docs say that the file/directory path name can't
exceed MAX_PATH, which is defined as 260 characters. However, one must
opt-in to enable longer path names -
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
and 
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later.

> In this case, I think one approach is to reduce the file and testname to
> xxx_logical_slots instead of xxx_logical_replication_slots. But we will 
> analyze more
> and share fix soon.
>
> [1] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2023-10-26%2009%3A04%3A54

+1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.
In fact, we've used "logical slots" instead of "logical replication
slots" in the docs to be generic. By looking at the generated
directory path name, I think we can use shorter node names - instead
of old_publisher, new_publisher, subscriber - either use node1 (for
old publisher), node2 (for subscriber), node3 (for new publisher) or
use alpha (for old publisher), bravo (for subscriber), charlie (for
new publisher) or such shorter names. We don't have to be that
descriptive and long in node names, one can look at the test file to
know which one is what.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-26 Thread Zhijie Hou (Fujitsu)

Hi,

The BF animal fairywren[1] failed when testing
003_upgrade_logical_replication_slots.pl.

From the log, I can see pg_upgrade failed to open the
invalid_logical_replication_slots.txt:

# Checking for valid logical replication slots  
# could not open file 
"C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt":
 No such file or directory
# Failure, exiting

The reason could be the length of this path(262) exceed the windows path
limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
reducing the path somehow.

In this case, I think one approach is to reduce the file and testname to
xxx_logical_slots instead of xxx_logical_replication_slots. But we will analyze 
more
and share fix soon.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2023-10-26%2009%3A04%3A54

Best Regards,
Hou zj


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-25 Thread Bharath Rupireddy
On Wed, Oct 25, 2023 at 1:50 PM Amit Kapila  wrote:
>
> It would be better to gauge its value separately and add it once the
> main patch is committed.
> There should be a way to avoid this but we can decide it afterwards. I
> don't want to hold the main patch for this point. What do you think?

+1 to go with the main patch first. We also have another thing to take
care of - pg_upgrade option to not migrate logical slots.

> > How about using "logical slots" in place of "logical replication
> > slots" to be more generic? We agreed and changed the function name to
> >
>
> Yeah, I am fine with that and I can take care of it before committing
> unless there is more to change.

+1. I have no other comments.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-25 Thread Amit Kapila
On Wed, Oct 25, 2023 at 1:39 PM Bharath Rupireddy
 wrote:
>
> On Wed, Oct 25, 2023 at 11:39 AM Amit Kapila  wrote:
> >
> > On Tue, Oct 24, 2023 at 1:20 PM Bharath Rupireddy
> >  wrote:
> > >
> > >
> > > I spent some time on the v57 patch and it looks good to me - tests are
> > > passing, no complaints from pgindent and pgperltidy. I turned the CF
> > > entry https://commitfest.postgresql.org/45/4273/ to RfC.
> > >
> >
> > Thanks, the patch looks mostly good to me but I am not convinced of
> > keeping the tests across versions in this form. I don't think they are
> > tested in BF, only one can manually create a setup to test. Shall we
> > remove it for now and then consider it separately?
>
> I think we can retain the test_upgrade_from_pre_PG17 because it is not
> only possible to trigger it manually but also one can write a CI
> workflow to trigger it.
>

It would be better to gauge its value separately and add it once the
main patch is committed. I am slightly unhappy even with the hack used
for pre-version testing in previous patch which is as follows:
+# XXX: Older PG version had different rules for the inter-dependency of
+# 'max_wal_senders' and 'max_connections', so assign values which will work for
+# all PG versions. If Cluster.pm is fixed this code is not needed.
+$old_publisher->append_conf(
+ 'postgresql.conf', qq[
+max_wal_senders = 5
+max_connections = 10
+]);

There should be a way to avoid this but we can decide it afterwards. I
don't want to hold the main patch for this point. What do you think?

> > Apart from that, I have made minor modifications in the docs to adjust
> > the order of various prerequisites.
>
> +
> + pg_upgrade attempts to migrate logical
> + replication slots. This helps avoid the need for manually defining the
> + same replication slots on the new publisher. Migration of logical
> + replication slots is only supported when the old cluster is version 17.0
> + or later. Logical replication slots on clusters before version 17.0 will
> + silently be ignored.
> +
>
> +   The new cluster must not have permanent logical replication slots, 
> i.e.,
>
> How about using "logical slots" in place of "logical replication
> slots" to be more generic? We agreed and changed the function name to
>

Yeah, I am fine with that and I can take care of it before committing
unless there is more to change.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-25 Thread Bharath Rupireddy
On Wed, Oct 25, 2023 at 11:39 AM Amit Kapila  wrote:
>
> On Tue, Oct 24, 2023 at 1:20 PM Bharath Rupireddy
>  wrote:
> >
> >
> > I spent some time on the v57 patch and it looks good to me - tests are
> > passing, no complaints from pgindent and pgperltidy. I turned the CF
> > entry https://commitfest.postgresql.org/45/4273/ to RfC.
> >
>
> Thanks, the patch looks mostly good to me but I am not convinced of
> keeping the tests across versions in this form. I don't think they are
> tested in BF, only one can manually create a setup to test. Shall we
> remove it for now and then consider it separately?

I think we can retain the test_upgrade_from_pre_PG17 because it is not
only possible to trigger it manually but also one can write a CI
workflow to trigger it.

> Apart from that, I have made minor modifications in the docs to adjust
> the order of various prerequisites.

+
+ pg_upgrade attempts to migrate logical
+ replication slots. This helps avoid the need for manually defining the
+ same replication slots on the new publisher. Migration of logical
+ replication slots is only supported when the old cluster is version 17.0
+ or later. Logical replication slots on clusters before version 17.0 will
+ silently be ignored.
+

+   The new cluster must not have permanent logical replication slots, i.e.,

How about using "logical slots" in place of "logical replication
slots" to be more generic? We agreed and changed the function name to

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-25 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Based on your advice, I revised the patch again. 

> >
> > I spent some time on the v57 patch and it looks good to me - tests are
> > passing, no complaints from pgindent and pgperltidy. I turned the CF
> > entry https://commitfest.postgresql.org/45/4273/ to RfC.
> >
> 
> Thanks, the patch looks mostly good to me but I am not convinced of
> keeping the tests across versions in this form. I don't think they are
> tested in BF, only one can manually create a setup to test.

I analyzed and agreed that current BF client does not use TAP test framework
for cross-version checks.

> Shall we
> remove it for now and then consider it separately?

OK, some parts for cross-checks were removed.

> Apart from that, I have made minor modifications in the docs to adjust
> the order of various prerequisites.

Thanks, included.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v59-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v59-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-25 Thread Amit Kapila
On Tue, Oct 24, 2023 at 1:20 PM Bharath Rupireddy
 wrote:
>
>
> I spent some time on the v57 patch and it looks good to me - tests are
> passing, no complaints from pgindent and pgperltidy. I turned the CF
> entry https://commitfest.postgresql.org/45/4273/ to RfC.
>

Thanks, the patch looks mostly good to me but I am not convinced of
keeping the tests across versions in this form. I don't think they are
tested in BF, only one can manually create a setup to test. Shall we
remove it for now and then consider it separately?

Apart from that, I have made minor modifications in the docs to adjust
the order of various prerequisites.

-- 
With Regards,
Amit Kapila.


v58-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description: Binary data


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-24 Thread Bharath Rupireddy
On Tue, Oct 24, 2023 at 11:32 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> > If we don't want to keep it generic then we should use something like
> > 'contains_decodable_change'. 'is_change_decodable' could have suited
> > here if we were checking a particular change.
>
> I kept the name for now. How does Bharath think?

No more bikeshedding from my side. +1 for processing_required as-is.

> > > 6. An optimization in count_old_cluster_logical_slots(void): Turn
> > > slot_count to a function static variable so that the for loop isn't
> > > required every time because the slot count is prepared in
> > > get_old_cluster_logical_slot_infos only once and won't change later
> > > on. Do you see any problem with the following? This saves a few CPU
> > > cycles when there are large number of replication slots.
> > > {
> > > static int slot_count = 0;
> > > static bool first_time = true;
> > >
> > > if (first_time)
> > > {
> > > for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> > > slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
> > >
> > > first_time = false;
> > > }
> > >
> > > return slot_count;
> > > }
> > >
> >
> > This may not be a problem but this is also not a function that will be
> > used frequently. I am not sure if adding such code optimizations is
> > worth it.
>
> Not addressed.

count_old_cluster_logical_slots is being called 3 times during
pg_upgrade and every time counting number of slots for all the
databases seems redundant IMV especially given the fact that the slot
count is computed once at the beginning and never changes. When the
replication slots on the cluster are on the higher side, every time
counting *may* prove costly. And, the use of static variables isn't a
huge change requiring a different set of infra or as such, it's a
simple pattern.

Having said above, if others don't see a merit in it, I'm okay to
withdraw my comment.

> Below commands are an example of the test.
>
> ```
> # test PG9.5 -> patched HEAD
> $ oldinstall=/home/hayato/older/pg95 make check 
> PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl'

Oh, I get it. Thanks.

> Also, based on a comment [2], the upgrade function was renamed to
> 'binary_upgrade_logical_slot_has_caught_up'.

+1.

I spent some time on the v57 patch and it looks good to me - tests are
passing, no complaints from pgindent and pgperltidy. I turned the CF
entry https://commitfest.postgresql.org/45/4273/ to RfC.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-24 Thread Hayato Kuroda (Fujitsu)
Dear Bharath, Amit,

Thanks for reviewing! PSA new version.
I addressed comments which have not been claimed.

> On Mon, Oct 23, 2023 at 2:00 PM Bharath Rupireddy
>  wrote:
> >
> > On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu)
> >  wrote:
> > >
> > > Thank you for reviewing! PSA new version.
> >
> > > > 6. A nit: how about is_decodable_txn or is_decodable_change or some
> > > > other instead of just a plain name processing_required?
> > > > +/* Do we need to process any change in 'fast_forward' mode? */
> > > > +boolprocessing_required;
> > >
> > > I preferred current one. Because not only decodable txn, non-txn change 
> > > and
> > > empty transactions also be processed.
> >
> > Right. It's not the txn, but the change. processing_required seems too
> > generic IMV. A nit: is_change_decodable or something?
> >
> 
> If we don't want to keep it generic then we should use something like
> 'contains_decodable_change'. 'is_change_decodable' could have suited
> here if we were checking a particular change.

I kept the name for now. How does Bharath think?

> > Thanks for the patch. Here are few comments on v56 patch:
> >
> > 1.
> > + *
> > + * Although this function is currently used only during pg_upgrade, there 
> > are
> > + * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
> >
> > This comment isn't required IMV, because anyone looking at the code
> > and callsites can understand it.

Removed.

> > 2. A nit: IMV "This is a special purpose ..." statement seems redundant.
> > + *
> > + * This is a special purpose function to ensure that the given slot can be
> > + * upgraded without data loss.
> >
> > How about
> >
> > Verify that the given replication slot has consumed all the WAL changes.
> > If there's any decodable WAL record after the slot's
> > confirmed_flush_lsn, the slot's consumer will lose that data after the
> > slot is upgraded.
> > Returns true if there are no decodable WAL records after the
> > confirmed_flush_lsn. Otherwise false.
> >
> 
> Personally, I find the current comment succinct and clear.

I kept current one.

> > 3.
> > +if (PG_ARGISNULL(0))
> > +elog(ERROR, "null argument to
> > binary_upgrade_validate_wal_records is not allowed");
> >
> > I can see the above style is referenced from
> > binary_upgrade_create_empty_extension, but IMV the following looks
> > better and latest (ereport is new style than elog)
> >
> > ereport(ERROR,
> > (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >  errmsg("replication slot name cannot be null")));
> >
> 
> Do you have any theory for making elog to ereport? I am not completely
> sure but as this and related function is used internally, so using
> elog seems reasonable. Also, I find keeping it consistent with the
> existing error message is also reasonable. We can change both later
> together if we get a broader agreement.

I kept current style. elog() was used here because I regarded it as
"cannot happen" error. According to the doc [1], elog() is still used
for the purpose.

> > 4. The following comment seems frivolous, the code tells it all.
> > Please remove the comment.
> > +
> > +/* No need to check this slot, seek to new one */
> > +continue;

Removed.

> > 5. A typo - s/gets/Gets
> > + * gets the LogicalSlotInfos for all the logical replication slots of the

Replaced.

> > 6. An optimization in count_old_cluster_logical_slots(void): Turn
> > slot_count to a function static variable so that the for loop isn't
> > required every time because the slot count is prepared in
> > get_old_cluster_logical_slot_infos only once and won't change later
> > on. Do you see any problem with the following? This saves a few CPU
> > cycles when there are large number of replication slots.
> > {
> > static int slot_count = 0;
> > static bool first_time = true;
> >
> > if (first_time)
> > {
> > for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> > slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
> >
> > first_time = false;
> > }
> >
> > return slot_count;
> > }
> >
> 
> This may not be a problem but this is also not a function that will be
> used frequently. I am not sure if adding such code optimizations is
> worth it.

Not addressed.

> > 7. A typo: s/slotname/slot name. "slot name" looks better in user
> > visible messages.
> > +pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\",
> two_phase: %s",
> >
> 
> If we want to follow other parameters then we can even use slot_name.

Changed to slot_name.

Below part is replies for remained comments:

>8.
>+else
>+{
>+test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
>+@pg_upgrade_cmd);
>+}
>Will this ever be tested in current TAP test framework? I mean, will
>the TAP test framework allow testing upgrades from one PG version to
>another PG version?

Yes, the TAP tester allow to do 

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-23 Thread Amit Kapila
On Sat, Oct 21, 2023 at 5:41 AM Bharath Rupireddy
 wrote:
>
> On Fri, Oct 20, 2023 at 8:51 PM Zhijie Hou (Fujitsu)
>  wrote:

>
> 9. IMO, binary_upgrade_logical_replication_slot_has_caught_up seems
> better, meaningful and consistent despite a bit long than just
> binary_upgrade_slot_has_caught_up.
>

I think logical_replication is specific to our pub-sub model but we
can have manually created slots as well. So, it would be better to
name it as binary_upgrade_logical_slot_has_caught_up().

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-23 Thread Amit Kapila
On Mon, Oct 23, 2023 at 2:00 PM Bharath Rupireddy
 wrote:
>
> On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Thank you for reviewing! PSA new version.
>
> > > 6. A nit: how about is_decodable_txn or is_decodable_change or some
> > > other instead of just a plain name processing_required?
> > > +/* Do we need to process any change in 'fast_forward' mode? */
> > > +boolprocessing_required;
> >
> > I preferred current one. Because not only decodable txn, non-txn change and
> > empty transactions also be processed.
>
> Right. It's not the txn, but the change. processing_required seems too
> generic IMV. A nit: is_change_decodable or something?
>

If we don't want to keep it generic then we should use something like
'contains_decodable_change'. 'is_change_decodable' could have suited
here if we were checking a particular change.

> Thanks for the patch. Here are few comments on v56 patch:
>
> 1.
> + *
> + * Although this function is currently used only during pg_upgrade, there are
> + * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
>
> This comment isn't required IMV, because anyone looking at the code
> and callsites can understand it.
>
> 2. A nit: IMV "This is a special purpose ..." statement seems redundant.
> + *
> + * This is a special purpose function to ensure that the given slot can be
> + * upgraded without data loss.
>
> How about
>
> Verify that the given replication slot has consumed all the WAL changes.
> If there's any decodable WAL record after the slot's
> confirmed_flush_lsn, the slot's consumer will lose that data after the
> slot is upgraded.
> Returns true if there are no decodable WAL records after the
> confirmed_flush_lsn. Otherwise false.
>

Personally, I find the current comment succinct and clear.

> 3.
> +if (PG_ARGISNULL(0))
> +elog(ERROR, "null argument to
> binary_upgrade_validate_wal_records is not allowed");
>
> I can see the above style is referenced from
> binary_upgrade_create_empty_extension, but IMV the following looks
> better and latest (ereport is new style than elog)
>
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>  errmsg("replication slot name cannot be null")));
>

Do you have any theory for making elog to ereport? I am not completely
sure but as this and related function is used internally, so using
elog seems reasonable. Also, I find keeping it consistent with the
existing error message is also reasonable. We can change both later
together if we get a broader agreement.

> 4. The following comment seems frivolous, the code tells it all.
> Please remove the comment.
> +
> +/* No need to check this slot, seek to new one */
> +continue;
>
> 5. A typo - s/gets/Gets
> + * gets the LogicalSlotInfos for all the logical replication slots of the
>
> 6. An optimization in count_old_cluster_logical_slots(void): Turn
> slot_count to a function static variable so that the for loop isn't
> required every time because the slot count is prepared in
> get_old_cluster_logical_slot_infos only once and won't change later
> on. Do you see any problem with the following? This saves a few CPU
> cycles when there are large number of replication slots.
> {
> static int slot_count = 0;
> static bool first_time = true;
>
> if (first_time)
> {
> for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
>
> first_time = false;
> }
>
> return slot_count;
> }
>

This may not be a problem but this is also not a function that will be
used frequently. I am not sure if adding such code optimizations is
worth it.

> 7. A typo: s/slotname/slot name. "slot name" looks better in user
> visible messages.
> +pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
>

If we want to follow other parameters then we can even use slot_name.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-23 Thread Bharath Rupireddy
On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Thank you for reviewing! PSA new version.

> > 6. A nit: how about is_decodable_txn or is_decodable_change or some
> > other instead of just a plain name processing_required?
> > +/* Do we need to process any change in 'fast_forward' mode? */
> > +boolprocessing_required;
>
> I preferred current one. Because not only decodable txn, non-txn change and
> empty transactions also be processed.

Right. It's not the txn, but the change. processing_required seems too
generic IMV. A nit: is_change_decodable or something?

Thanks for the patch. Here are few comments on v56 patch:

1.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.

This comment isn't required IMV, because anyone looking at the code
and callsites can understand it.

2. A nit: IMV "This is a special purpose ..." statement seems redundant.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.

How about

Verify that the given replication slot has consumed all the WAL changes.
If there's any decodable WAL record after the slot's
confirmed_flush_lsn, the slot's consumer will lose that data after the
slot is upgraded.
Returns true if there are no decodable WAL records after the
confirmed_flush_lsn. Otherwise false.

3.
+if (PG_ARGISNULL(0))
+elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

I can see the above style is referenced from
binary_upgrade_create_empty_extension, but IMV the following looks
better and latest (ereport is new style than elog)

ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg("replication slot name cannot be null")));

4. The following comment seems frivolous, the code tells it all.
Please remove the comment.
+
+/* No need to check this slot, seek to new one */
+continue;

5. A typo - s/gets/Gets
+ * gets the LogicalSlotInfos for all the logical replication slots of the

6. An optimization in count_old_cluster_logical_slots(void): Turn
slot_count to a function static variable so that the for loop isn't
required every time because the slot count is prepared in
get_old_cluster_logical_slot_infos only once and won't change later
on. Do you see any problem with the following? This saves a few CPU
cycles when there are large number of replication slots.
{
static int slot_count = 0;
static bool first_time = true;

if (first_time)
{
for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;

first_time = false;
}

return slot_count;
}

7. A typo: s/slotname/slot name. "slot name" looks better in user
visible messages.
+pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",

8.
+else
+{
+test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
+@pg_upgrade_cmd);
+}
Will this ever be tested in current TAP test framework? I mean, will
the TAP test framework allow testing upgrades from one PG version to
another PG version?

9. A nit: Can single quotes around variable names in the comments be
removed just to be consistent?
+ * We also skip decoding in 'fast_forward' mode. This check must be last
+/* Do we need to process any change in 'fast_forward' mode? */

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-22 Thread Hayato Kuroda (Fujitsu)
Dear Bharath,

Thank you for reviewing! PSA new version.

> 1. A nit:
> +
> +/*
> + * We also skip decoding in 'fast_forward' mode. In passing set the
> + * 'processing_required' flag to indicate, were it not for this mode,
> + * processing *would* have been required.
> + */
> How about "We also skip decoding in fast_forward mode. In passing set
> the processing_required flag to indicate that if it were not for
> fast_forward mode, processing would have been required."?

Fixed.

> 2. Don't we need InvalidateSystemCaches() after FreeDecodingContext()?
> 
> +/* Clean up */
> +FreeDecodingContext(ctx);

Right. Older system caches should be thrown away here for upcoming pg_dump.

> 3. Don't we need to put CreateDecodingContext in PG_TRY-PG_CATCH with
> InvalidateSystemCaches() in PG_CATCH block? I think we need to clear
> all timetravel entries with InvalidateSystemCaches(), no?

Added.

> 4. The following assertion better be an error? Or we ensure that
> binary_upgrade_slot_has_caught_up isn't called for an invalidated slot
> at all?
> +
> +/* Slots must be valid as otherwise we won't be able to scan the WAL */
> +Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);

I kept the Assert() because pg_upgrade won't call this function for invalidated
slots.

> 5. This better be an error instead of returning false? IMO, null value
> for slot name is an error.
> +/* Quick exit if the input is NULL */
> +if (PG_ARGISNULL(0))
> +PG_RETURN_BOOL(false);

Hmm, OK, changed to elog(ERROR).
If current style is kept and NULL were to input, an empty string may be reported
as slotname in invalid_logical_replication_slots.txt. It is quite strange. Note
again that it won't be expected.

> 6. A nit: how about is_decodable_txn or is_decodable_change or some
> other instead of just a plain name processing_required?
> +/* Do we need to process any change in 'fast_forward' mode? */
> +boolprocessing_required;

I preferred current one. Because not only decodable txn, non-txn change and
empty transactions also be processed.

> 7. Can the following pg_fatal message be consistent and start with
> lowercase letter something like "expected 0 logical replication slots
> "?
> +pg_fatal("Expected 0 logical replication slots but found %d.",
> + nslots_on_new);

Note that the Upper/Lower case rule has been broken in this file. Lower case was
used here because I regarded this sentence as hint message. Please see previous
posts [1] [2].


> 8. s/problem/problematic - "A list of problematic slots is in the file:\n"
> + "A list of the problem slots is in the file:\n"

Fixed.

> 9. IMO, binary_upgrade_logical_replication_slot_has_caught_up seems
> better, meaningful and consistent despite a bit long than just
> binary_upgrade_slot_has_caught_up.

Fixed.

> 10. How about an assert that the passed-in replication slot is logical
> in binary_upgrade_slot_has_caught_up?

Fixed.

> 11. How about adding CheckLogicalDecodingRequirements too in
> binary_upgrade_slot_has_caught_up after CheckSlotPermissions just in
> case?

Not added. CheckLogicalDecodingRequirements() ensures that WALs can be decodable
and the changes can be applied, but both of them are not needed for fast_forward
mode. Also, pre-existing function pg_logical_replication_slot_advance() does not
call it.

> 12. Not necessary but adding ReplicationSlotValidateName(slot_name,
> ERROR); for the passed-in slotname in
> binary_upgrade_slot_has_caught_up may be a good idea, at least in
> assert builds to help with input validations.

Not added because ReplicationSlotAcquire() can report even if invalid name is
added. Also, pre-existing function pg_logical_replication_slot_advance() does 
not
call it.

> 13. Can the functionality of LogicalReplicationSlotHasPendingWal be
> moved to binary_upgrade_slot_has_caught_up and get rid of a separate
> function LogicalReplicationSlotHasPendingWal? Or is it that the
> function exists in logical.c to avoid extra dependencies between
> logical.c and pg_upgrade_support.c?

I kept current style. I think upgrade functions should be short so that actual
tasks should be done in other place. SetAttrMissing() is called only from an
upgrading function, so we do not have a policy to avoid deviding function.
Also, LogicalDecodingProcessRecord() is called from only files in 
src/backend/replication,
so we can keep them.

> 14. I think it's better to check if the old cluster contains the
> necessary function binary_upgrade_slot_has_caught_up instead of just
> relying on major version.
> +/* Logical slots can be migrated since PG17. */
> +if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> +return;

I kept current style because I could not find a merit for the approach. If the
patch is committed PG17.X surely has 
binary_upgrade_logical_replication_slot_has_caught_up().
Also, other upgrading function are not checked from the 

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-20 Thread Bharath Rupireddy
On Fri, Oct 20, 2023 at 8:51 PM Zhijie Hou (Fujitsu)
 wrote:
>
> Attach the new version patch.

Thanks. Here are some comments on v55 patch:

1. A nit:
+
+/*
+ * We also skip decoding in 'fast_forward' mode. In passing set the
+ * 'processing_required' flag to indicate, were it not for this mode,
+ * processing *would* have been required.
+ */
How about "We also skip decoding in fast_forward mode. In passing set
the processing_required flag to indicate that if it were not for
fast_forward mode, processing would have been required."?

2. Don't we need InvalidateSystemCaches() after FreeDecodingContext()?

+/* Clean up */
+FreeDecodingContext(ctx);

3. Don't we need to put CreateDecodingContext in PG_TRY-PG_CATCH with
InvalidateSystemCaches() in PG_CATCH block? I think we need to clear
all timetravel entries with InvalidateSystemCaches(), no?

4. The following assertion better be an error? Or we ensure that
binary_upgrade_slot_has_caught_up isn't called for an invalidated slot
at all?
+
+/* Slots must be valid as otherwise we won't be able to scan the WAL */
+Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);

5. This better be an error instead of returning false? IMO, null value
for slot name is an error.
+/* Quick exit if the input is NULL */
+if (PG_ARGISNULL(0))
+PG_RETURN_BOOL(false);

6. A nit: how about is_decodable_txn or is_decodable_change or some
other instead of just a plain name processing_required?
+/* Do we need to process any change in 'fast_forward' mode? */
+boolprocessing_required;

7. Can the following pg_fatal message be consistent and start with
lowercase letter something like "expected 0 logical replication slots
"?
+pg_fatal("Expected 0 logical replication slots but found %d.",
+ nslots_on_new);

8. s/problem/problematic - "A list of problematic slots is in the file:\n"
+ "A list of the problem slots is in the file:\n"

9. IMO, binary_upgrade_logical_replication_slot_has_caught_up seems
better, meaningful and consistent despite a bit long than just
binary_upgrade_slot_has_caught_up.

10. How about an assert that the passed-in replication slot is logical
in binary_upgrade_slot_has_caught_up?

11. How about adding CheckLogicalDecodingRequirements too in
binary_upgrade_slot_has_caught_up after CheckSlotPermissions just in
case?

12. Not necessary but adding ReplicationSlotValidateName(slot_name,
ERROR); for the passed-in slotname in
binary_upgrade_slot_has_caught_up may be a good idea, at least in
assert builds to help with input validations.

13. Can the functionality of LogicalReplicationSlotHasPendingWal be
moved to binary_upgrade_slot_has_caught_up and get rid of a separate
function LogicalReplicationSlotHasPendingWal? Or is it that the
function exists in logical.c to avoid extra dependencies between
logical.c and pg_upgrade_support.c?

14. I think it's better to check if the old cluster contains the
necessary function binary_upgrade_slot_has_caught_up instead of just
relying on major version.
+/* Logical slots can be migrated since PG17. */
+if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+return;

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-20 Thread Zhijie Hou (Fujitsu)
On Friday, October 20, 2023 11:24 AM vignesh C  wrote:
> 
> On Thu, 19 Oct 2023 at 16:16, Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Dear Vignesh,
> >
> > Thanks for revieing! New patch can be available in [1].
> >
> > > Few comments:
> > > 1) Even if we comment 3rd point "Emit a non-transactional message",
> > > test_slot2 still appears in the
> > > invalid_logical_replication_slots.txt
> > > file. There is something wrong here.
> > > +   # 2. Advance the slot test_slot2 up to the current WAL location,
> but
> > > +   #test_slot1 still has unconsumed WAL records.
> > > +   $old_publisher->safe_psql('postgres',
> > > +   "SELECT pg_replication_slot_advance('test_slot2',
> > > + NULL);");
> > > +
> > > +   # 3. Emit a non-transactional message. test_slot2 detects
> > > + the message
> > > so
> > > +   #that this slot will be also reported by upcoming
> pg_upgrade.
> > > +   $old_publisher->safe_psql('postgres',
> > > +   "SELECT count(*) FROM
> > > + pg_logical_emit_message('false',
> > > 'prefix', 'This is a non-transactional message');"
> > > +   );
> >
> > The comment was updated based on others. How do you think?
> 
> I mean if we comment or remove this statement like in the attached patch, the
> test is still passing with 'The slot "test_slot2" has not consumed the WAL 
> yet', in
> this case should the test_slot2 be still invalid as we have called
> pg_replication_slot_advance for test_slot2.

It's because we pass NULL to pg_replication_slot_advance(). We should pass 
pg_current_wal_lsn() instead. I have fixed it in V55 version.

Best Regards,
Hou zj


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-20 Thread Zhijie Hou (Fujitsu)
On Friday, October 20, 2023 9:50 AM Peter Smith  wrote:
> 
> Here are some review comments for v54-0001

Thanks for the review.

> 
> ==
> src/backend/replication/slot.c
> 
> 1.
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade) {
> + ereport(ERROR, errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("replication slots must not be invalidated during the
> + upgrade"), errhint("\"max_slot_wal_keep_size\" must not be set to -1
> + during the
> upgrade"));
> + }
> 
> This new error is replacing the old code:
> + Assert(max_slot_wal_keep_size_mb == -1);
> 
> Is that errhint correct? Shouldn't it say "must" instead of "must not"?

Fixed.

> 
> ==
> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> 
> 2. General formating
> 
> Some of the "]);" formatting and indenting for the multiple SQL commands is
> inconsistent.
> 
> For example,
> 
> + $old_publisher->safe_psql(
> + 'postgres', qq[
> + SELECT pg_create_logical_replication_slot('test_slot1',
> + 'test_decoding'); SELECT
> + pg_create_logical_replication_slot('test_slot2', 'test_decoding'); ]
> + );
> 
> versus
> 
> + $old_publisher->safe_psql(
> + 'postgres', qq[
> + CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a; SELECT
> + pg_replication_slot_advance('test_slot2', NULL); SELECT count(*) FROM
> + pg_logical_emit_message('false', 'prefix',
> 'This is a non-transactional message');
> + ]);
> 

Fixed.

> ~~~
> 
> 3.
> +# Set up some settings for the old cluster, so that we can ensures that
> +initdb # will be done.
> +my @initdb_params = ();
> +push @initdb_params, ('--encoding', 'UTF-8'); push @initdb_params,
> +('--locale', 'C'); $node_params{extra} = \@initdb_params;
> +
> +$old_publisher->init(%node_params);
> 
> Why would initdb not be done if these were not set? I didn't understand the
> comment.
> 
> /so that we can ensures/to ensure/

The node->init() will use a previously initialized cluster if no parameter was
specified, but that cluster could be of wrong version when doing cross-version
test, so we set something to let the initdb happen.

I added some explanation in the comment.

> ~~~
> 
> 4.
> +# XXX: For PG9.6 and prior, the TAP Cluster.pm assigns
> +'max_wal_senders' and # 'max_connections' to the same value (10). But
> +these versions considered # max_wal_senders as a subset of
> +max_connections, so setting the same value # will fail. This adjustment
> +will not be needed when packages for older #versions are defined.
> +if ($old_publisher->pg_version->major <= 9.6) {
> +$old_publisher->append_conf(  'postgresql.conf', qq[  max_wal_senders =
> +5  max_connections = 10  ]); }
> 
> 4a.
> IMO remove the complicated comment trying to explain the problem and just
> to unconditionally set the values you want.
> 
> SUGGESTION#1
> # Older PG version had different rules for the inter-dependency of
> 'max_wal_senders' and 'max_connections', # so assign values which will work
> for all PG versions.
> $old_publisher->append_conf(
>   'postgresql.conf', qq[
>   max_wal_senders = 5
>   max_connections = 10
>   ]);
> 
> ~~

As Kuroda-san mentioned, we may fix Cluster.pm later, so I kept the XXX comment
but simplify it based on your suggestion.

Attach the new version patch.

Best Regards,
Hou zj


v55-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v55-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread vignesh C
On Thu, 19 Oct 2023 at 16:16, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Vignesh,
>
> Thanks for revieing! New patch can be available in [1].
>
> > Few comments:
> > 1) Even if we comment 3rd point "Emit a non-transactional message",
> > test_slot2 still appears in the invalid_logical_replication_slots.txt
> > file. There is something wrong here.
> > +   # 2. Advance the slot test_slot2 up to the current WAL location, but
> > +   #test_slot1 still has unconsumed WAL records.
> > +   $old_publisher->safe_psql('postgres',
> > +   "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +   # 3. Emit a non-transactional message. test_slot2 detects the 
> > message
> > so
> > +   #that this slot will be also reported by upcoming 
> > pg_upgrade.
> > +   $old_publisher->safe_psql('postgres',
> > +   "SELECT count(*) FROM pg_logical_emit_message('false',
> > 'prefix', 'This is a non-transactional message');"
> > +   );
>
> The comment was updated based on others. How do you think?

I mean if we comment or remove this statement like in the attached
patch, the test is still passing with 'The slot "test_slot2" has not
consumed the WAL yet', in this case should the test_slot2 be still
invalid as we have called pg_replication_slot_advance for test_slot2.

Regards,
Vignesh
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
index c6be7d5d9e..49720cc66d 100644
--- a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -74,7 +74,6 @@ sub test_upgrade_from_PG17_and_later
 		'postgres', qq[
 			CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 			SELECT pg_replication_slot_advance('test_slot2', NULL);
-			SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
 		]);
 	$old_publisher->stop;
 


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread vignesh C
On Thu, 19 Oct 2023 at 16:14, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Vignesh,
>
> Thanks for reviewing! New patch can be available in [1].
>
> >
> > Few comments:
> > 1) We will be able to override the value of max_slot_wal_keep_size by
> > using --new-options like '--new-options  "-c
> > max_slot_wal_keep_size=val"':
> > +   /*
> > +* Use max_slot_wal_keep_size as -1 to prevent the WAL removal by 
> > the
> > +* checkpointer process.  If WALs required by logical replication 
> > slots
> > +* are removed, the slots are unusable.  This setting prevents the
> > +* invalidation of slots during the upgrade. We set this option when
> > +* cluster is PG17 or later because logical replication slots
> > can only be
> > +* migrated since then. Besides, max_slot_wal_keep_size is
> > added in PG13.
> > +*/
> > +   if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
> > +   appendPQExpBufferStr(, " -c
> > max_slot_wal_keep_size=-1");
> >
> > Should there be a check to throw an error if this option is specified
> > or do we need some documentation that this option should not be
> > specified?
>
> Hmm, I don't think we have to add checks. Other settings, like 
> synchronous_commit
> and fsync, can be also overwritten, but pg_upgrade has never checked. 
> Therefore,
> it's user's responsibility to not set max_slot_wal_keep_size to a dangerous
> value.
>
> > 2) Because we are able to override max_slot_wal_keep_size there is a
> > chance of slot getting invalidated and Assert being hit:
> > +   /*
> > +* The logical replication slots shouldn't be invalidated as
> > +* max_slot_wal_keep_size GUC is set to -1 during the
> > upgrade.
> > +*
> > +* The following is just a sanity check.
> > +*/
> > +   if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> > +   {
> > +   Assert(max_slot_wal_keep_size_mb == -1);
> > +   elog(ERROR, "replication slots must not be
> > invalidated during the upgrade");
> > +   }
>
> Hmm, so how about removing an assert and changing the error message more
> appropriate? I still think it seldom occurs.

As this scenario can occur by overriding max_slot_wal_keep_size, it is
better to remove the Assert.

Regards,
Vignesh




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread Peter Smith
Here are some review comments for v54-0001

==
src/backend/replication/slot.c

1.
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("replication slots must not be invalidated during the upgrade"),
+ errhint("\"max_slot_wal_keep_size\" must not be set to -1 during the
upgrade"));
+ }

This new error is replacing the old code:
+ Assert(max_slot_wal_keep_size_mb == -1);

Is that errhint correct? Shouldn't it say "must" instead of "must not"?

==
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

2. General formating

Some of the "]);" formatting and indenting for the multiple SQL
commands is inconsistent.

For example,

+ $old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+ SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+ ]
+ );

versus

+ $old_publisher->safe_psql(
+ 'postgres', qq[
+ CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+ SELECT pg_replication_slot_advance('test_slot2', NULL);
+ SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');
+ ]);

~~~

3.
+# Set up some settings for the old cluster, so that we can ensures that initdb
+# will be done.
+my @initdb_params = ();
+push @initdb_params, ('--encoding', 'UTF-8');
+push @initdb_params, ('--locale', 'C');
+$node_params{extra} = \@initdb_params;
+
+$old_publisher->init(%node_params);

Why would initdb not be done if these were not set? I didn't
understand the comment.

/so that we can ensures/to ensure/

~~~

4.
+# XXX: For PG9.6 and prior, the TAP Cluster.pm assigns 'max_wal_senders' and
+# 'max_connections' to the same value (10). But these versions considered
+# max_wal_senders as a subset of max_connections, so setting the same value
+# will fail. This adjustment will not be needed when packages for older
+#versions are defined.
+if ($old_publisher->pg_version->major <= 9.6)
+{
+ $old_publisher->append_conf(
+ 'postgresql.conf', qq[
+ max_wal_senders = 5
+ max_connections = 10
+ ]);
+}

4a.
IMO remove the complicated comment trying to explain the problem and
just to unconditionally set the values you want.

SUGGESTION#1
# Older PG version had different rules for the inter-dependency of
'max_wal_senders' and 'max_connections',
# so assign values which will work for all PG versions.
$old_publisher->append_conf(
  'postgresql.conf', qq[
  max_wal_senders = 5
  max_connections = 10
  ]);

~~

4b.
If you really want to put special code here then I think the comment
needs to be more descriptive like below. IMO this suggestion is
overkill, #4a above is much simpler.

SUGGESTION#2
# Versions prior to PG12 considered max_walsenders as a subset
max_connections, so setting the same value will fail.
#
# The TAP Cluster.pm assigns default 'max_wal_senders' and
'max_connections' as follows:
# PG_11:  'max_wal_senders=5' and 'max_connections=10'
# PG_10:  'max_wal_senders=5' and 'max_connections=10'
# Everything else: 'max_wal_senders=10' and 'max_connections=10'
#
# The following code is needed to make adjustments for versions not
already being handled by Cluster.pm.

~

4c.
Alternatively, make necessary adjustments in the Cluster.pm to set
appropriate defaults for all older versions. Then probably you can
remove all this code entirely.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

> Thanks for reviewing! PSA new version.

Hmm. The cfbot got angry, whereas it can pass on my machine.
It seems that the ordering in invalid_logical_replication_slots.txt is not 
fixed.

A change for checking the content was reverted. It could pass on my CI.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v54-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v54-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thanks for revieing! New patch can be available in [1].

> Few comments:
> 1) Even if we comment 3rd point "Emit a non-transactional message",
> test_slot2 still appears in the invalid_logical_replication_slots.txt
> file. There is something wrong here.
> +   # 2. Advance the slot test_slot2 up to the current WAL location, but
> +   #test_slot1 still has unconsumed WAL records.
> +   $old_publisher->safe_psql('postgres',
> +   "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> +
> +   # 3. Emit a non-transactional message. test_slot2 detects the message
> so
> +   #that this slot will be also reported by upcoming pg_upgrade.
> +   $old_publisher->safe_psql('postgres',
> +   "SELECT count(*) FROM pg_logical_emit_message('false',
> 'prefix', 'This is a non-transactional message');"
> +   );

The comment was updated based on others. How do you think?

> 2) If the test fails here, it is difficult to debug as the
> pg_upgrade_output.d directory was removed, so better to keep the
> directory as it is this case:
> +   # Check the file content. Both slots should be reporting that they 
> have
> +   # unconsumed WAL records.
> +   like(
> +   slurp_file($slots_filename),
> +   qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
> +   'the previous test failed due to unconsumed WALs');
> +   like(
> +   slurp_file($slots_filename),
> +   qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
> +   'the previous test failed due to unconsumed WALs');
> +
> +   # Clean up
> +   rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");

Right. Current style just follows the 002 test. I removed rmtree().

> 3) The below could be changed:
> +   # Check the file content. Both slots should be reporting that they 
> have
> +   # unconsumed WAL records.
> +   like(
> +   slurp_file($slots_filename),
> +   qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
> +   'the previous test failed due to unconsumed WALs');
> +   like(
> +   slurp_file($slots_filename),
> +   qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
> +   'the previous test failed due to unconsumed WALs');
> 
> to:
> my $result = slurp_file($slots_filename);
> is( $result, qq(The slot "test_slot1" has not consumed the WAL yet
> The slot "test_slot2" has not consumed the WAL yet
> ),
> 'the previous test failed due to unconsumed WALs');
>

Replaced, but the formatting seems not good. I wanted to hear opinions from 
others.

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread Hayato Kuroda (Fujitsu)
Dear Shlok,

> 
> I have tested the above scenario. We are able to override the
> max_slot_wal_keep_size by using  '--new-options  "-c
> max_slot_wal_keep_size=val"'. And also with some insert statements
> during pg_upgrade, old WAL file were deleted and logical replication
> slots were invalidated. Since the slots were invalidated replication
> was not happening after the upgrade.

Yeah, theoretically it could be overwritten, but I still think we do not have to
guard. Also, connections must not be established during the upgrade [1].
I improved the ereport() message in the new patch[2]. How do you think?

[1]: https://www.postgresql.org/message-id/ZNZ4AxUMIrnMgRbo%40momjian.us
[2]: 
https://www.postgresql.org/message-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

Thanks for reviewing! New patch can be available in [1].

> Thanks for updating the patch, here are few comments for the test.
> 
> 1.
> 
> >
> # The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections'
> to
> # the same value (10) but PG12 and prior considered max_walsenders as a subset
> # of max_connections, so setting the same value will fail.
> if ($old_publisher->pg_version->major < 12)
> {
>   $old_publisher->append_conf(
>   'postgresql.conf', qq[
>   max_wal_senders = 5
>   max_connections = 10
>   ]);
> >
> 
> I think we already set max_wal_senders to 5 in init() function(in Cluster.pm),
> so is this necessary ? And 002_pg_upgrade.pl doesn't seems set this.

I thought you mentioned about Cluster::V_11::init(). I analyzed based on that 
and
found a fault. Could you please check [1]?

> 2.
> 
>   SELECT pg_create_logical_replication_slot('test_slot1',
> 'test_decoding', false, true);
>   SELECT pg_create_logical_replication_slot('test_slot2',
> 'test_decoding', false, true);
> 
> I think we don't need to set the last two parameters here as we don't check
> these info in the tests.

Removed.

> 3.
> 
> # Set extra params if cross-version checks are required. This is needed to
> # avoid using previously initdb'd cluster
> if (defined($ENV{oldinstall}))
> {
>   my @initdb_params = ();
>   push @initdb_params, ('--encoding', 'UTF-8');
>   push @initdb_params, ('--locale', 'C');
> 
> I am not sure I understand the comment, would it be possible provide a bit 
> more
> explanation about the purpose of this setting ? And I see 002_pg_upgrade 
> always
> have these setting even if oldinstall is not defined, so shall we follow the
> same ?

Fixed.
Actually settings are not needed for new cluster, but seems better to follow 
002.

> 4.
> 
> + command_ok(
> + [
> + 'pg_upgrade', '--no-sync',
> + '-d', $old_publisher->data_dir,
> + '-D', $new_publisher->data_dir,
> + '-b', $oldbindir,
> + '-B', $newbindir,
> + '-s', $new_publisher->host,
> + '-p', $old_publisher->port,
> + '-P', $new_publisher->port,
> + $mode,
> + ],
> 
> I think all the pg_upgrade commands in the test are the same, so we can save 
> the
> cmd
> in a variable and pass them to command_xx(). I think it can save some effort 
> to
> check the difference of each command and can also reduce some codes.

Fixed.

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thanks for reviewing! New patch can be available in [1].

> 
> Few comments:
> 1) We will be able to override the value of max_slot_wal_keep_size by
> using --new-options like '--new-options  "-c
> max_slot_wal_keep_size=val"':
> +   /*
> +* Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
> +* checkpointer process.  If WALs required by logical replication 
> slots
> +* are removed, the slots are unusable.  This setting prevents the
> +* invalidation of slots during the upgrade. We set this option when
> +* cluster is PG17 or later because logical replication slots
> can only be
> +* migrated since then. Besides, max_slot_wal_keep_size is
> added in PG13.
> +*/
> +   if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
> +   appendPQExpBufferStr(, " -c
> max_slot_wal_keep_size=-1");
> 
> Should there be a check to throw an error if this option is specified
> or do we need some documentation that this option should not be
> specified?

Hmm, I don't think we have to add checks. Other settings, like 
synchronous_commit
and fsync, can be also overwritten, but pg_upgrade has never checked. Therefore,
it's user's responsibility to not set max_slot_wal_keep_size to a dangerous
value.

> 2) Because we are able to override max_slot_wal_keep_size there is a
> chance of slot getting invalidated and Assert being hit:
> +   /*
> +* The logical replication slots shouldn't be invalidated as
> +* max_slot_wal_keep_size GUC is set to -1 during the
> upgrade.
> +*
> +* The following is just a sanity check.
> +*/
> +   if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> +   {
> +   Assert(max_slot_wal_keep_size_mb == -1);
> +   elog(ERROR, "replication slots must not be
> invalidated during the upgrade");
> +   }

Hmm, so how about removing an assert and changing the error message more
appropriate? I still think it seldom occurs.

> 3) File 003_logical_replication_slots.pl is now changed to
> 003_upgrade_logical_replication_slots.pl, it should be change here too
> accordingly:
> index 5834513add..815d1a7ca1 100644
> --- a/src/bin/pg_upgrade/Makefile
> +++ b/src/bin/pg_upgrade/Makefile
> @@ -3,6 +3,9 @@
>  PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
>  PGAPPICON = win32
> 
> +# required for 003_logical_replication_slots.pl
> +EXTRA_INSTALL=contrib/test_decoding
> +

Fixed.

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for reviewing! PSA new version.

> ==
> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> 
> 1.
> + # 2. max_replication_slots is set to smaller than the number of slots (2)
> + # present on the old cluster
> 
> SUGGESTION
> 2. Set 'max_replication_slots' to be less than the number of slots (2)
> present on the old cluster.

Fixed.

> 2.
> + # Set max_replication_slots to the same value as the number of slots. Both
> + # of slots will be used for subsequent tests.
> 
> SUGGESTION
> Set 'max_replication_slots' to match the number of slots (2) present
> on the old cluster.
> Both slots will be used for subsequent tests.

Fixed.

> 
> 3.
> + # 3. Emit a non-transactional message. test_slot2 detects the message so
> + # that this slot will be also reported by upcoming pg_upgrade.
> + $old_publisher->safe_psql('postgres',
> + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> 'This is a non-transactional message');"
> + );
> 
> SUGGESTION
> 3. Emit a non-transactional message. This will cause test_slot2 to
> detect the unconsumed WAL record.

Fixed.

> 
> 4.
> + # Preparations for the subsequent test:
> + # 1. Generate extra WAL records. At this point neither test_slot1 nor
> + # test_slot2 has consumed them.
> + $old_publisher->start;
> + $old_publisher->safe_psql('postgres',
> + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> +
> + # 2. Advance the slot test_slot2 up to the current WAL location, but
> + # test_slot1 still has unconsumed WAL records.
> + $old_publisher->safe_psql('postgres',
> + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> +
> + # 3. Emit a non-transactional message. test_slot2 detects the message so
> + # that this slot will be also reported by upcoming pg_upgrade.
> + $old_publisher->safe_psql('postgres',
> + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> 'This is a non-transactional message');"
> + );
> +
> + $old_publisher->stop;
> 
> All of the above are sequentially executed on the
> old_publisher->safe_psql, so consider if it is worth combining them
> all in a single call (keeping the comments 1,2,3 separate still)
> 
> For example,
> 
> $old_publisher->start;
> $old_publisher->safe_psql('postgres', qq[
>   CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
>   SELECT pg_replication_slot_advance('test_slot2', NULL);
>   SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> 'This is a non-transactional message');
> ]);
> $old_publisher->stop;

Fixed.

> 
> 5.
> + # Clean up
> + $subscriber->stop();
> + $new_publisher->stop();
> 
> Should this also drop the 'test_slot1' and 'test_slot2'?

'test_slot1' and 'test_slot2' have already been removed while preparing in
"Successful upgrade" case. Also, I don't think objects have to be removed at the
end. It is tested by other parts, and it may make the test more difficult to
debug, if there are some failures.

> 6.
> +# Verify that logical replication slots cannot be migrated.  This function
> +# will be executed when the old cluster is PG16 and prior.
> +sub test_upgrade_from_pre_PG17
> +{
> + my ($old_publisher, $new_publisher, $mode) = @_;
> +
> + my $oldbindir = $old_publisher->config_data('--bindir');
> + my $newbindir = $new_publisher->config_data('--bindir');
> 
> SUGGESTION (let's not mention lots of different numbers; just refer to 17)
> This function will be executed when the old cluster version is prior to PG17.

Fixed.


> 7.
> + # Actual run, successful upgrade is expected
> + command_ok(
> + [
> + 'pg_upgrade', '--no-sync',
> + '-d', $old_publisher->data_dir,
> + '-D', $new_publisher->data_dir,
> + '-b', $oldbindir,
> + '-B', $newbindir,
> + '-s', $new_publisher->host,
> + '-p', $old_publisher->port,
> + '-P', $new_publisher->port,
> + $mode,
> + ],
> + 'run of pg_upgrade of old cluster');
> +
> + ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
> + "pg_upgrade_output.d/ removed after pg_upgrade success");
> 
> 7a.
> The comment is wrong?
> 
> SUGGESTION
> # pg_upgrade should NOT be successful

No, pg_uprade will success but no logical replication slots are migrated.
Comments docs were added.

> 7b.
> There is a blank line here before the ok() function, but in the other
> tests, there was none. Better to be consistent.

Removed.

> 8.
> + # Clean up
> + $new_publisher->stop();
> 
> Should this also drop the 'test_slot'?

I don't think so. Please see above.

> 
> 9.
> +# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections'
> to
> +# the same value (10) but PG12 and prior considered max_walsenders as a
> subset
> +# of max_connections, so setting the same value will fail.
> +if ($old_publisher->pg_version->major < 12)
> +{
> + $old_publisher->append_conf(
> + 'postgresql.conf', qq[
> + max_wal_senders = 5
> + max_connections = 10
> + ]);
> +}
> 
> If the comment is correct, then PG12 *and* prior, should be testing
> "<= 12", not "< 12". right?

I analyzed more and I was wrong - we 

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread Hayato Kuroda (Fujitsu)
Dear Shlok,

Thanks for testing the feature!

> 
> I tested a test scenario:
> I started a new publisher with 'max_replication_slots' parameter set
> to '1' and created a streaming replication with the new publisher as
> primary node.

Just to confirm what you did - you set up a physical replication and the
target of pg_upgrade was set to the primary, right?

I think we can assume that new cluster (target of pg_upgrade) is not used yet.
The documentation describes the usage [1] and it says that we must initialize
the cluster (at step 4) and then run the pg_upgrade (at step 10).

Therefore I don't think we should document anything about it.

[1]: 
https://www.postgresql.org/docs/devel/pgupgrade.html#:~:text=Initialize%20the%20new%20PostgreSQL%20cluster

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread Shlok Kyal
I tested a test scenario:
I started a new publisher with 'max_replication_slots' parameter set
to '1' and created a streaming replication with the new publisher as
primary node.
Then I did a pg_upgrade from old publisher to new publisher. The
upgrade failed with following error:

Restoring logical replication slots in the new cluster
SQL command failed
SELECT * FROM pg_catalog.pg_create_logical_replication_slot('test1',
'pgoutput', false, false);
ERROR:  all replication slots are in use
HINT:  Free one or increase max_replication_slots.

Failure, exiting

Should we document that the existing replication slots are taken in
consideration while setting 'max_replication_slots' value in the new
publisher?

Thanks
Shlok Kumar Kyal

On Wed, 18 Oct 2023 at 15:01, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Peter,
>
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.
>
> > Here are some review comments for v51-0001
> >
> > ==
> > src/bin/pg_upgrade/check.c
> >
> > 0.
> > +check_old_cluster_for_valid_slots(bool live_check)
> > +{
> > + char output_path[MAXPGPATH];
> > + FILE*script = NULL;
> > +
> > + prep_status("Checking for valid logical replication slots");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "invalid_logical_relication_slots.txt");
> >
> > 0a
> > typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> Fixed.
>
> > 0b.
> > Since the non-upgradable slots are not strictly "invalid", is this an
> > appropriate filename for the bad ones?
> >
> > But I don't have very good alternatives. Maybe:
> > - non_upgradable_logical_replication_slots.txt
> > - problem_logical_replication_slots.txt
>
> Per discussion [1], I kept current style.
>
> > src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> >
> > 1.
> > +# --
> > +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> > +#
> > +# There are two requirements for GUCs - wal_level and 
> > max_replication_slots,
> > +# but only max_replication_slots will be tested here. This is because to
> > +# reduce the execution time of the test.
> >
> >
> > SUGGESTION
> > # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> > #
> > # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> > # reduce the test execution time, only 'max_replication_slots' is tested 
> > here.
>
> First part was fixed. Second part was removed per [1].
>
> > 2.
> > +# Preparations for the subsequent test:
> > +# 1. Create two slots on the old cluster
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot1',
> > 'test_decoding', false, true);"
> > +);
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot2',
> > 'test_decoding', false, true);"
> > +);
> >
> >
> > Can't you combine those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 3.
> > +# Clean up
> > +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> > +# Set max_replication_slots to the same value as the number of slots. Both 
> > of
> > +# slots will be used for subsequent tests.
> > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 
> > 1");
> >
> > The code doesn't seem to match the comment - is this correct? The
> > old_publisher created 2 slots, so why are you setting new_publisher
> > "max_replication_slots = 1" again?
>
> Fixed to "max_replication_slots = 2" Note that previous test worked well 
> because
> GUC checking on new cluster is done after checking the status of slots.
>
> > 4.
> > +# Preparations for the subsequent test:
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> > +
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +# 3. Emit a non-transactional message. test_slot2 detects the message so 
> > that
> > +# this slot will be also reported by upcoming pg_upgrade.
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> > 'This is a non-transactional message');"
> > +);
> >
> >
> > I felt this test would be clearer if you emphasised the state of the
> > test_slot1 also. e.g.
> >
> > 4a.
> > BEFORE
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> >
> > SUGGESTION
> > # 1. Generate extra WAL records. At this point neither test_slot1 nor 
> > test_slot2
> > #has consumed them.
>
> Fixed.
>
> > 4b.
> > BEFORE

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread vignesh C
On Wed, 18 Oct 2023 at 14:55, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Peter,
>
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.
>
> > Here are some review comments for v51-0001
> >
> > ==
> > src/bin/pg_upgrade/check.c
> >
> > 0.
> > +check_old_cluster_for_valid_slots(bool live_check)
> > +{
> > + char output_path[MAXPGPATH];
> > + FILE*script = NULL;
> > +
> > + prep_status("Checking for valid logical replication slots");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "invalid_logical_relication_slots.txt");
> >
> > 0a
> > typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> Fixed.
>
> > 0b.
> > Since the non-upgradable slots are not strictly "invalid", is this an
> > appropriate filename for the bad ones?
> >
> > But I don't have very good alternatives. Maybe:
> > - non_upgradable_logical_replication_slots.txt
> > - problem_logical_replication_slots.txt
>
> Per discussion [1], I kept current style.
>
> > src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> >
> > 1.
> > +# --
> > +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> > +#
> > +# There are two requirements for GUCs - wal_level and 
> > max_replication_slots,
> > +# but only max_replication_slots will be tested here. This is because to
> > +# reduce the execution time of the test.
> >
> >
> > SUGGESTION
> > # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> > #
> > # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> > # reduce the test execution time, only 'max_replication_slots' is tested 
> > here.
>
> First part was fixed. Second part was removed per [1].
>
> > 2.
> > +# Preparations for the subsequent test:
> > +# 1. Create two slots on the old cluster
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot1',
> > 'test_decoding', false, true);"
> > +);
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot2',
> > 'test_decoding', false, true);"
> > +);
> >
> >
> > Can't you combine those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 3.
> > +# Clean up
> > +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> > +# Set max_replication_slots to the same value as the number of slots. Both 
> > of
> > +# slots will be used for subsequent tests.
> > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 
> > 1");
> >
> > The code doesn't seem to match the comment - is this correct? The
> > old_publisher created 2 slots, so why are you setting new_publisher
> > "max_replication_slots = 1" again?
>
> Fixed to "max_replication_slots = 2" Note that previous test worked well 
> because
> GUC checking on new cluster is done after checking the status of slots.
>
> > 4.
> > +# Preparations for the subsequent test:
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> > +
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +# 3. Emit a non-transactional message. test_slot2 detects the message so 
> > that
> > +# this slot will be also reported by upcoming pg_upgrade.
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> > 'This is a non-transactional message');"
> > +);
> >
> >
> > I felt this test would be clearer if you emphasised the state of the
> > test_slot1 also. e.g.
> >
> > 4a.
> > BEFORE
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> >
> > SUGGESTION
> > # 1. Generate extra WAL records. At this point neither test_slot1 nor 
> > test_slot2
> > #has consumed them.
>
> Fixed.
>
> > 4b.
> > BEFORE
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> >
> > SUGGESTION
> > # 2. Advance the slot test_slot2 up to the current WAL location, but 
> > test_slot2
> > #still has unconsumed WAL records.
>
> IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I 
> think
> "but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.
>
> > 5.
> > +# pg_upgrade will fail because the slot still has unconsumed WAL records
> > +command_checks_all(
> >
> > /because the slot still has/because there are slots still having/
>
> Fixed.
>
> > 6.
> > + [qr//],
> > + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> > +);
> >
> > /slot/slots/
>
> Fixed.
>
> > 7.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-19 Thread Shlok Kyal
> Few comments:
> 1) We will be able to override the value of max_slot_wal_keep_size by
> using --new-options like '--new-options  "-c
> max_slot_wal_keep_size=val"':
> +   /*
> +* Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
> +* checkpointer process.  If WALs required by logical replication 
> slots
> +* are removed, the slots are unusable.  This setting prevents the
> +* invalidation of slots during the upgrade. We set this option when
> +* cluster is PG17 or later because logical replication slots
> can only be
> +* migrated since then. Besides, max_slot_wal_keep_size is
> added in PG13.
> +*/
> +   if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
> +   appendPQExpBufferStr(, " -c
> max_slot_wal_keep_size=-1");
>
> Should there be a check to throw an error if this option is specified
> or do we need some documentation that this option should not be
> specified?

I have tested the above scenario. We are able to override the
max_slot_wal_keep_size by using  '--new-options  "-c
max_slot_wal_keep_size=val"'. And also with some insert statements
during pg_upgrade, old WAL file were deleted and logical replication
slots were invalidated. Since the slots were invalidated replication
was not happening after the upgrade.

Thanks,
Shlok Kumar Kyal




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread Zhijie Hou (Fujitsu)
On Wednesday, October 18, 2023 5:26 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> 
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.

Thanks for updating the patch, here are few comments for the test.

1.

>
# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections' to
# the same value (10) but PG12 and prior considered max_walsenders as a subset
# of max_connections, so setting the same value will fail.
if ($old_publisher->pg_version->major < 12)
{
$old_publisher->append_conf(
'postgresql.conf', qq[
max_wal_senders = 5
max_connections = 10
]);
>

I think we already set max_wal_senders to 5 in init() function(in Cluster.pm),
so is this necessary ? And 002_pg_upgrade.pl doesn't seems set this.

2.

SELECT pg_create_logical_replication_slot('test_slot1', 
'test_decoding', false, true);
SELECT pg_create_logical_replication_slot('test_slot2', 
'test_decoding', false, true);

I think we don't need to set the last two parameters here as we don't check
these info in the tests.

3.

# Set extra params if cross-version checks are required. This is needed to
# avoid using previously initdb'd cluster
if (defined($ENV{oldinstall}))
{
my @initdb_params = ();
push @initdb_params, ('--encoding', 'UTF-8');
push @initdb_params, ('--locale', 'C');

I am not sure I understand the comment, would it be possible provide a bit more
explanation about the purpose of this setting ? And I see 002_pg_upgrade always
have these setting even if oldinstall is not defined, so shall we follow the
same ?

4.

+   command_ok(
+   [
+   'pg_upgrade', '--no-sync',
+   '-d', $old_publisher->data_dir,
+   '-D', $new_publisher->data_dir,
+   '-b', $oldbindir,
+   '-B', $newbindir,
+   '-s', $new_publisher->host,
+   '-p', $old_publisher->port,
+   '-P', $new_publisher->port,
+   $mode,
+   ],

I think all the pg_upgrade commands in the test are the same, so we can save 
the cmd
in a variable and pass them to command_xx(). I think it can save some effort to
check the difference of each command and can also reduce some codes.

Best Regards,
Hou zj


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread vignesh C
On Wed, 18 Oct 2023 at 14:55, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Peter,
>
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.
>
> > Here are some review comments for v51-0001
> >
> > ==
> > src/bin/pg_upgrade/check.c
> >
> > 0.
> > +check_old_cluster_for_valid_slots(bool live_check)
> > +{
> > + char output_path[MAXPGPATH];
> > + FILE*script = NULL;
> > +
> > + prep_status("Checking for valid logical replication slots");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "invalid_logical_relication_slots.txt");
> >
> > 0a
> > typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> Fixed.
>
> > 0b.
> > Since the non-upgradable slots are not strictly "invalid", is this an
> > appropriate filename for the bad ones?
> >
> > But I don't have very good alternatives. Maybe:
> > - non_upgradable_logical_replication_slots.txt
> > - problem_logical_replication_slots.txt
>
> Per discussion [1], I kept current style.
>
> > src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> >
> > 1.
> > +# --
> > +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> > +#
> > +# There are two requirements for GUCs - wal_level and 
> > max_replication_slots,
> > +# but only max_replication_slots will be tested here. This is because to
> > +# reduce the execution time of the test.
> >
> >
> > SUGGESTION
> > # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> > #
> > # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> > # reduce the test execution time, only 'max_replication_slots' is tested 
> > here.
>
> First part was fixed. Second part was removed per [1].
>
> > 2.
> > +# Preparations for the subsequent test:
> > +# 1. Create two slots on the old cluster
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot1',
> > 'test_decoding', false, true);"
> > +);
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot2',
> > 'test_decoding', false, true);"
> > +);
> >
> >
> > Can't you combine those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 3.
> > +# Clean up
> > +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> > +# Set max_replication_slots to the same value as the number of slots. Both 
> > of
> > +# slots will be used for subsequent tests.
> > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 
> > 1");
> >
> > The code doesn't seem to match the comment - is this correct? The
> > old_publisher created 2 slots, so why are you setting new_publisher
> > "max_replication_slots = 1" again?
>
> Fixed to "max_replication_slots = 2" Note that previous test worked well 
> because
> GUC checking on new cluster is done after checking the status of slots.
>
> > 4.
> > +# Preparations for the subsequent test:
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> > +
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +# 3. Emit a non-transactional message. test_slot2 detects the message so 
> > that
> > +# this slot will be also reported by upcoming pg_upgrade.
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> > 'This is a non-transactional message');"
> > +);
> >
> >
> > I felt this test would be clearer if you emphasised the state of the
> > test_slot1 also. e.g.
> >
> > 4a.
> > BEFORE
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> >
> > SUGGESTION
> > # 1. Generate extra WAL records. At this point neither test_slot1 nor 
> > test_slot2
> > #has consumed them.
>
> Fixed.
>
> > 4b.
> > BEFORE
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> >
> > SUGGESTION
> > # 2. Advance the slot test_slot2 up to the current WAL location, but 
> > test_slot2
> > #still has unconsumed WAL records.
>
> IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I 
> think
> "but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.
>
> > 5.
> > +# pg_upgrade will fail because the slot still has unconsumed WAL records
> > +command_checks_all(
> >
> > /because the slot still has/because there are slots still having/
>
> Fixed.
>
> > 6.
> > + [qr//],
> > + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> > +);
> >
> > /slot/slots/
>
> Fixed.
>
> > 7.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread Peter Smith
Here are some review comments for v52-0001

==
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+ # 2. max_replication_slots is set to smaller than the number of slots (2)
+ # present on the old cluster

SUGGESTION
2. Set 'max_replication_slots' to be less than the number of slots (2)
present on the old cluster.

~~~

2.
+ # Set max_replication_slots to the same value as the number of slots. Both
+ # of slots will be used for subsequent tests.

SUGGESTION
Set 'max_replication_slots' to match the number of slots (2) present
on the old cluster.
Both slots will be used for subsequent tests.

~~~

3.
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+ );

SUGGESTION
3. Emit a non-transactional message. This will cause test_slot2 to
detect the unconsumed WAL record.

~~~

4.
+ # Preparations for the subsequent test:
+ # 1. Generate extra WAL records. At this point neither test_slot1 nor
+ # test_slot2 has consumed them.
+ $old_publisher->start;
+ $old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+ # 2. Advance the slot test_slot2 up to the current WAL location, but
+ # test_slot1 still has unconsumed WAL records.
+ $old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+ );
+
+ $old_publisher->stop;

All of the above are sequentially executed on the
old_publisher->safe_psql, so consider if it is worth combining them
all in a single call (keeping the comments 1,2,3 separate still)

For example,

$old_publisher->start;
$old_publisher->safe_psql('postgres', qq[
  CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
  SELECT pg_replication_slot_advance('test_slot2', NULL);
  SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');
]);
$old_publisher->stop;

~~~

5.
+ # Clean up
+ $subscriber->stop();
+ $new_publisher->stop();

Should this also drop the 'test_slot1' and 'test_slot2'?

~~~

6.
+# Verify that logical replication slots cannot be migrated.  This function
+# will be executed when the old cluster is PG16 and prior.
+sub test_upgrade_from_pre_PG17
+{
+ my ($old_publisher, $new_publisher, $mode) = @_;
+
+ my $oldbindir = $old_publisher->config_data('--bindir');
+ my $newbindir = $new_publisher->config_data('--bindir');

SUGGESTION (let's not mention lots of different numbers; just refer to 17)
This function will be executed when the old cluster version is prior to PG17.

~~

7.
+ # Actual run, successful upgrade is expected
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $oldbindir,
+ '-B', $newbindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old cluster');
+
+ ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ removed after pg_upgrade success");

7a.
The comment is wrong?

SUGGESTION
# pg_upgrade should NOT be successful

~

7b.
There is a blank line here before the ok() function, but in the other
tests, there was none. Better to be consistent.

~~~

8.
+ # Clean up
+ $new_publisher->stop();

Should this also drop the 'test_slot'?

~~~

9.
+# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections' to
+# the same value (10) but PG12 and prior considered max_walsenders as a subset
+# of max_connections, so setting the same value will fail.
+if ($old_publisher->pg_version->major < 12)
+{
+ $old_publisher->append_conf(
+ 'postgresql.conf', qq[
+ max_wal_senders = 5
+ max_connections = 10
+ ]);
+}

If the comment is correct, then PG12 *and* prior, should be testing
"<= 12", not "< 12". right?

~~~

10.
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots has been supported only since PG17.
+if ($old_publisher->pg_version->major >= 17)

This comment seems wrong IMO. I think we always running the latest
version of pg_upgrade so slot migration is always "supported" from now
on. IIUC you intended this comment to be saying something about the
old_publisher slots.

BEFORE
Upgrading logical replication slots has been supported only since PG17.

SUGGESTION
Upgrading logical replication slots from versions older than PG17 is
not supported.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! New patch is available in [1].

> ==
> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> 
> 1.
> +# Set max_wal_senders to a lower value if the old cluster is prior to PG12.
> +# Such clusters regard max_wal_senders as part of max_connections, but the
> +# current TAP tester sets these GUCs to the same value.
> +if ($old_publisher->pg_version < 12)
> +{
> + $old_publisher->append_conf('postgresql.conf', "max_wal_senders = 5");
> +}
> 
> 1a.
> I was initially unsure what the above comment meant -- thanks for the
> offline explanation.
> 
> SUGGESTION
> The TAP Cluster.pm assigns default 'max_wal_senders' and
> 'max_connections' to the same value (10) but PG12 and prior considered
> max_walsenders as a subset of max_connections, so setting the same
> value will fail.

Fixed.

> 1b.
> I also felt it is better to explicitly set both values in the < PG12
> configuration because otherwise, you are still assuming knowledge that
> the TAP default max_connections is 10.
> 
> SUGGESTION
> $old_publisher->append_conf('postgresql.conf', qq{
> max_wal_senders = 5
> max_connections = 10
> });

Fixed.

> 2.
> +# Switch workloads depend on the major version of the old cluster.  Upgrading
> +# logical replication slots has been supported since PG17.
> +if ($old_publisher->pg_version <= 16)
> +{
> + test_for_16_and_prior($old_publisher, $new_publisher, $mode);
> +}
> +else
> +{
> + test_for_17_and_later($old_publisher, $new_publisher, $mode);
> +}
> 
> IMO it is less confusing to have fewer version numbers floating around
> in comments and names and code. So instead of referring to 16 and 17,
> how about just referring to 17 everywhere?
> 
> For example
> 
> SUGGESTION
> # Test according to the major version of the old cluster.
> # Upgrading logical replication slots has been supported only since PG17.
> 
> if ($old_publisher->pg_version >= 17)
> {
>   test_upgrade_from_PG17_and_later($old_publisher, $new_publisher, $mode);
> }
> else
> {
>   test_upgrade_from_pre_PG17($old_publisher, $new_publisher, $mode);
> }

In HEAD code, the pg_version seems "17devel". The string seemed smaller than 17 
for Perl.
(i.e., "17devel" >= 17 means false)
For the purpose of comparing only the major version, pg_version->major was used.

Also, I removed the support for ~PG9.4. I cannot find descriptions, but 
according to [2],
Cluster.pm does not support such binaries.
(cluster_name is set when the server process is started, but the GUC has been 
added in PG9.5)

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB5870EBEBC89F5224F6B3788CF5D5A%40TYCPR01MB5870.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/YsUrUDrRhUbuU/6k%40paquier.xyz

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! PSA new version.
Note that 0001 and 0002 are combined into one patch.

> Here are some review comments for v51-0001
> 
> ==
> src/bin/pg_upgrade/check.c
> 
> 0.
> +check_old_cluster_for_valid_slots(bool live_check)
> +{
> + char output_path[MAXPGPATH];
> + FILE*script = NULL;
> +
> + prep_status("Checking for valid logical replication slots");
> +
> + snprintf(output_path, sizeof(output_path), "%s/%s",
> + log_opts.basedir,
> + "invalid_logical_relication_slots.txt");
> 
> 0a
> typo /invalid_logical_relication_slots/invalid_logical_replication_slots/

Fixed.

> 0b.
> Since the non-upgradable slots are not strictly "invalid", is this an
> appropriate filename for the bad ones?
> 
> But I don't have very good alternatives. Maybe:
> - non_upgradable_logical_replication_slots.txt
> - problem_logical_replication_slots.txt

Per discussion [1], I kept current style.

> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> 
> 1.
> +# --
> +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> +#
> +# There are two requirements for GUCs - wal_level and max_replication_slots,
> +# but only max_replication_slots will be tested here. This is because to
> +# reduce the execution time of the test.
> 
> 
> SUGGESTION
> # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> #
> # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> # reduce the test execution time, only 'max_replication_slots' is tested here.

First part was fixed. Second part was removed per [1].

> 2.
> +# Preparations for the subsequent test:
> +# 1. Create two slots on the old cluster
> +$old_publisher->start;
> +$old_publisher->safe_psql('postgres',
> + "SELECT pg_create_logical_replication_slot('test_slot1',
> 'test_decoding', false, true);"
> +);
> +$old_publisher->safe_psql('postgres',
> + "SELECT pg_create_logical_replication_slot('test_slot2',
> 'test_decoding', false, true);"
> +);
> 
> 
> Can't you combine those SQL in the same $old_publisher->safe_psql.

Combined.

> 3.
> +# Clean up
> +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> +# Set max_replication_slots to the same value as the number of slots. Both of
> +# slots will be used for subsequent tests.
> +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> 
> The code doesn't seem to match the comment - is this correct? The
> old_publisher created 2 slots, so why are you setting new_publisher
> "max_replication_slots = 1" again?

Fixed to "max_replication_slots = 2" Note that previous test worked well because
GUC checking on new cluster is done after checking the status of slots.

> 4.
> +# Preparations for the subsequent test:
> +# 1. Generate extra WAL records. Because these WAL records do not get
> consumed
> +# it will cause the upcoming pg_upgrade test to fail.
> +$old_publisher->start;
> +$old_publisher->safe_psql('postgres',
> + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> +
> +# 2. Advance the slot test_slot2 up to the current WAL location
> +$old_publisher->safe_psql('postgres',
> + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> +
> +# 3. Emit a non-transactional message. test_slot2 detects the message so that
> +# this slot will be also reported by upcoming pg_upgrade.
> +$old_publisher->safe_psql('postgres',
> + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> 'This is a non-transactional message');"
> +);
> 
> 
> I felt this test would be clearer if you emphasised the state of the
> test_slot1 also. e.g.
> 
> 4a.
> BEFORE
> +# 1. Generate extra WAL records. Because these WAL records do not get
> consumed
> +# it will cause the upcoming pg_upgrade test to fail.
> 
> SUGGESTION
> # 1. Generate extra WAL records. At this point neither test_slot1 nor 
> test_slot2
> #has consumed them.

Fixed.

> 4b.
> BEFORE
> +# 2. Advance the slot test_slot2 up to the current WAL location
> 
> SUGGESTION
> # 2. Advance the slot test_slot2 up to the current WAL location, but 
> test_slot2
> #still has unconsumed WAL records.

IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I 
think 
"but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.

> 5.
> +# pg_upgrade will fail because the slot still has unconsumed WAL records
> +command_checks_all(
> 
> /because the slot still has/because there are slots still having/

Fixed.

> 6.
> + [qr//],
> + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> +);
> 
> /slot/slots/

Fixed.

> 7.
> +# And check the content. Both of slots must be reported that they have
> +# unconsumed WALs after confirmed_flush_lsn.
> 
> SUGGESTION
> # Check the file content. Both slots should be reporting that they have
> # unconsumed WAL records.

Fixed.

> 
> 8.
> +# Preparations for the subsequent test:
> +# 1. Setup logical replication
> +my $old_connstr = 

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread Peter Smith
Here are some comments for the patch v51-0002

==
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# Set max_wal_senders to a lower value if the old cluster is prior to PG12.
+# Such clusters regard max_wal_senders as part of max_connections, but the
+# current TAP tester sets these GUCs to the same value.
+if ($old_publisher->pg_version < 12)
+{
+ $old_publisher->append_conf('postgresql.conf', "max_wal_senders = 5");
+}

1a.
I was initially unsure what the above comment meant -- thanks for the
offline explanation.

SUGGESTION
The TAP Cluster.pm assigns default 'max_wal_senders' and
'max_connections' to the same value (10) but PG12 and prior considered
max_walsenders as a subset of max_connections, so setting the same
value will fail.

~

1b.
I also felt it is better to explicitly set both values in the < PG12
configuration because otherwise, you are still assuming knowledge that
the TAP default max_connections is 10.

SUGGESTION
$old_publisher->append_conf('postgresql.conf', qq{
max_wal_senders = 5
max_connections = 10
});

~~~

2.
+# Switch workloads depend on the major version of the old cluster.  Upgrading
+# logical replication slots has been supported since PG17.
+if ($old_publisher->pg_version <= 16)
+{
+ test_for_16_and_prior($old_publisher, $new_publisher, $mode);
+}
+else
+{
+ test_for_17_and_later($old_publisher, $new_publisher, $mode);
+}

IMO it is less confusing to have fewer version numbers floating around
in comments and names and code. So instead of referring to 16 and 17,
how about just referring to 17 everywhere?

For example

SUGGESTION
# Test according to the major version of the old cluster.
# Upgrading logical replication slots has been supported only since PG17.

if ($old_publisher->pg_version >= 17)
{
  test_upgrade_from_PG17_and_later($old_publisher, $new_publisher, $mode);
}
else
{
  test_upgrade_from_pre_PG17($old_publisher, $new_publisher, $mode);
}

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-17 Thread Amit Kapila
On Wed, Oct 18, 2023 at 7:31 AM Peter Smith  wrote:
>
> ==
> src/bin/pg_upgrade/check.c
>
> 0.
> +check_old_cluster_for_valid_slots(bool live_check)
> +{
> + char output_path[MAXPGPATH];
> + FILE*script = NULL;
> +
> + prep_status("Checking for valid logical replication slots");
> +
> + snprintf(output_path, sizeof(output_path), "%s/%s",
> + log_opts.basedir,
> + "invalid_logical_relication_slots.txt");
>
> 0a
> typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> ~
>
> 0b.
> Since the non-upgradable slots are not strictly "invalid", is this an
> appropriate filename for the bad ones?
>
> But I don't have very good alternatives. Maybe:
> - non_upgradable_logical_replication_slots.txt
> - problem_logical_replication_slots.txt
>

I prefer the current naming. I think 'invalid' here indicates both
types of slots that are invalidated by the checkpointer and those that
have pending WAL to be consumed.

> ==
> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
>
> 1.
> +# --
> +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> +#
> +# There are two requirements for GUCs - wal_level and max_replication_slots,
> +# but only max_replication_slots will be tested here. This is because to
> +# reduce the execution time of the test.
>
>
> SUGGESTION
> # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> #
> # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> # reduce the test execution time, only 'max_replication_slots' is tested here.
>

I think we don't need the second part of the comment: "Two GUCs ...".
Ideally, we should test each parameter's invalid value but that could
be costly, so I think it is okay to test a few of them.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-17 Thread Peter Smith
Here are some review comments for v51-0001

==
src/bin/pg_upgrade/check.c

0.
+check_old_cluster_for_valid_slots(bool live_check)
+{
+ char output_path[MAXPGPATH];
+ FILE*script = NULL;
+
+ prep_status("Checking for valid logical replication slots");
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "invalid_logical_relication_slots.txt");

0a
typo /invalid_logical_relication_slots/invalid_logical_replication_slots/

~

0b.
Since the non-upgradable slots are not strictly "invalid", is this an
appropriate filename for the bad ones?

But I don't have very good alternatives. Maybe:
- non_upgradable_logical_replication_slots.txt
- problem_logical_replication_slots.txt

==
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# --
+# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
+#
+# There are two requirements for GUCs - wal_level and max_replication_slots,
+# but only max_replication_slots will be tested here. This is because to
+# reduce the execution time of the test.


SUGGESTION
# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
#
# Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
# reduce the test execution time, only 'max_replication_slots' is tested here.

~~~

2.
+# Preparations for the subsequent test:
+# 1. Create two slots on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1',
'test_decoding', false, true);"
+);
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);"
+);


Can't you combine those SQL in the same $old_publisher->safe_psql.

~~~

3.
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Set max_replication_slots to the same value as the number of slots. Both of
+# slots will be used for subsequent tests.
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");

The code doesn't seem to match the comment - is this correct? The
old_publisher created 2 slots, so why are you setting new_publisher
"max_replication_slots = 1" again?

~~~

4.
+# Preparations for the subsequent test:
+# 1. Generate extra WAL records. Because these WAL records do not get consumed
+# it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+# 2. Advance the slot test_slot2 up to the current WAL location
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+# 3. Emit a non-transactional message. test_slot2 detects the message so that
+# this slot will be also reported by upcoming pg_upgrade.
+$old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+);


I felt this test would be clearer if you emphasised the state of the
test_slot1 also. e.g.

4a.
BEFORE
+# 1. Generate extra WAL records. Because these WAL records do not get consumed
+# it will cause the upcoming pg_upgrade test to fail.

SUGGESTION
# 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
#has consumed them.

~

4b.
BEFORE
+# 2. Advance the slot test_slot2 up to the current WAL location

SUGGESTION
# 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
#still has unconsumed WAL records.

~~~

5.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(

/because the slot still has/because there are slots still having/

~~~

6.
+ [qr//],
+ 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
+);

/slot/slots/

~~~

7.
+# And check the content. Both of slots must be reported that they have
+# unconsumed WALs after confirmed_flush_lsn.

SUGGESTION
# Check the file content. Both slots should be reporting that they have
# unconsumed WAL records.


~~~

8.
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+$old_publisher->start;
+
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot1');");
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+$old_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub FOR ALL TABLES;");


8a.
/Setup logical replication/Setup logical replication (first, cleanup
slots from the previous tests)/

~

8b.
Can't you combine all those SQL in the same $old_publisher->safe_psql.

~~~

9.
+
+# Actual run, successful upgrade is expected
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $bindir,
+ '-B', $bindir,
+ '-s', $new_publisher->host,
+ '-p', 

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-17 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thank you for reviewing! New version can be available in [1].

> 1) Should this:
> +# Copyright (c) 2023, PostgreSQL Global Development Group
> +
> +# Tests for upgrading replication slots
> +
> be:
> "Tests for upgrading logical replication slots"

Fixed.

> 2)  This statement is not entirely true:
> + 
> +  
> +   The old cluster has replicated all the changes to subscribers.
> +  
> 
> If we have some changes like shutdown_checkpoint the upgrade passes,
> if we have some changes like create view whose changes will not be
> replicated the upgrade fails.

Hmm, I felt current description seems sufficient, but how about the below?
"The old cluster has replicated all the transactions and logical decoding
 messages to subscribers."

> 3) All these includes are not required except for "logical.h"
> --- a/src/backend/utils/adt/pg_upgrade_support.c
> +++ b/src/backend/utils/adt/pg_upgrade_support.c
> @@ -11,14 +11,20 @@
> 
>  #include "postgres.h"
> 
> +#include "access/xlogutils.h"
> +#include "access/xlog_internal.h"
>  #include "catalog/binary_upgrade.h"
>  #include "catalog/heap.h"
>  #include "catalog/namespace.h"
>  #include "catalog/pg_type.h"
>  #include "commands/extension.h"
> +#include "funcapi.h"
>  #include "miscadmin.h"
> +#include "replication/logical.h"
> +#include "replication/slot.h"
>  #include "utils/array.h"
>  #include "utils/builtins.h"
> +#include "utils/pg_lsn.h"

I preferred to include all the needed items in each C files, but removed.

> 4) We could print two_phase as true/false instead of 0/1:
> +static void
> +print_slot_infos(LogicalSlotInfoArr *slot_arr)
> +{
> +   /* Quick return if there are no logical slots. */
> +   if (slot_arr->nslots == 0)
> +   return;
> +
> +   pg_log(PG_VERBOSE, "Logical replication slots within the database:");
> +
> +   for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
> +   {
> +   LogicalSlotInfo *slot_info = _arr->slots[slotnum];
> +
> +   pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\",
> two_phase: %d",
> +  slot_info->slotname,
> +  slot_info->plugin,
> +  slot_info->two_phase);
> +   }
> +}

Fixed.

> 5) test passes without the below, maybe this is not required:
> +# 2. Consume WAL records to avoid another type of upgrade failure. It will be
> +#   tested in subsequent cases.
> +$old_publisher->safe_psql('postgres',
> +   "SELECT count(*) FROM
> pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
> +);

This part is removed because of the refactoring.

> 6) This message "run of pg_upgrade of old cluster with idle
> replication slots" seems wrong:
> +# pg_upgrade will fail because the slot still has unconsumed WAL records
> +command_checks_all(
> +   [
> +   'pg_upgrade', '--no-sync',
> +   '-d', $old_publisher->data_dir,
> +   '-D', $new_publisher->data_dir,
> +   '-b', $bindir,
> +   '-B', $bindir,
> +   '-s', $new_publisher->host,
> +   '-p', $old_publisher->port,
> +   '-P', $new_publisher->port,
> +   $mode,
> +   ],
> +   1,
> +   [
> +   qr/Your installation contains invalid logical
> replication slots./
> +   ],
> +   [qr//],
> +   'run of pg_upgrade of old cluster with idle replication slots');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +   "pg_upgrade_output.d/ not removed after pg_upgrade failure");

Rephased.

> 7) You could run pgindent and pgperlytidy, it shows there are few
> issues present with the patch.

I ran both.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866AC8A7694113BCBE0A71EF5D6A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-17 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thanks for reviewing! PSA new version.

> 
> Yeah, I think introducing additional complexity unless it is really
> required sounds a bit scary to me as well. BTW, please find attached
> some cosmetic changes.

Basically LGTM, but below part was conflicted with Bharath's comment [1].

```
@@ -1607,7 +1605,7 @@ check_old_cluster_for_valid_slots(bool live_check)
fclose(script);
 
pg_log(PG_REPORT, "fatal");
-   pg_fatal("Your installation contains logical replication slots 
that cannot be upgraded.\n"
+   pg_fatal("Your installation contains invalid logical 
replication slots.\n"
```

How about " Your installation contains logical replication slots that can't be 
upgraded."?

> One minor additional comment:
> +# Initialize subscriber cluster
> +my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
> +$subscriber->init(allows_streaming => 'logical');
> 
> Why do we need to set wal_level as logical for subscribers?

It is not mandatory. The line was copied from tests in src/test/subscription.
Removed the setting from my patch. I felt that it could be removed from other
patches. I will fork new thread and post the patch.


Also, I did some improvements based on the v50, basically for tests.

1. Test file was refactored. pg_uprade was executed many times in the test so 
the
   test time was increasing. Below refactorings were done.

===
a. Checks for both transactional and non-transactional changes were done at the
   same time.
b. Removed the dry-run test. It did not improve the coverage.
c. Removed the wal_level test. Other tests like subscriptions and test_decoding
   do not contain test for GUCs, so I thought it could be acceptable. Removing
   all the GUC test (for max_replication_slots) might be risky, so it was 
remained.
===

2. Supported the cross-version checks. If an environment variable "oldinstall"
   is set, use the binary as old cluster. If the specified one is PG16-, the
   test verifies that logical replication slots would not be migrated.
   002_pg_upgrade.pl requires that $ENV(olddump) must be also defined, but it
   is not needed for our test. I tried to support from PG9.2, which is the 
oldest
   version for Xupgrade test [2]. You can see 0002 patch for it.
   IIUC pg_create_logical_replication_slot() can be available since PG9.4, so 
tests
   will be skipped if older executables are specified, like:

```
$ oldinstall=/home/hayato/older/pg92/ make check 
PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl'
...
# +++ tap check in src/bin/pg_upgrade +++
t/003_upgrade_logical_replication_slots.pl .. skipped: Logical replication 
slots can be available since PG9.4
Files=1, Tests=0,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.08 cusr  0.02 
csys =  0.13 CPU)
Result: NOTESTS
```

[1]: 
https://www.postgresql.org/message-id/CALj2ACXp%2BLXioY_%3D9mboEbLD--4c4nnpJCZ%2Bj4fckBdSOQhENA%40mail.gmail.com
[2]: 
https://github.com/PGBuildFarm/client-code/releases#:~:text=support%20for%20testing%20cross%20version%20upgrade%20extended%20back%20to%209.2

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v51-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v51-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


v51-0002-support-cross-version-upgrade.patch
Description: v51-0002-support-cross-version-upgrade.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-17 Thread vignesh C
On Mon, 16 Oct 2023 at 14:44, Amit Kapila  wrote:
>
> On Sat, Oct 14, 2023 at 10:45 AM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Here is a new patch.
> >
> > Previously I wrote:
> > > Based on above idea, I made new version patch which some functionalities 
> > > were
> > > exported from pg_resetwal. In this approach, pg_upgrade itself removed 
> > > WALs and
> > > then create logical slots, then pg_resetwal would be called with new 
> > > option
> > > --no-switch, which avoid to switch a WAL segment file. The option is only 
> > > used
> > > for the upgrading purpose so it is not written in doc and usage(). This 
> > > option
> > > is not required if pg_resetwal -o does not discard WAL records. Please 
> > > see the
> > > fork thread [1].
> >
> > But for now, these changes were reverted because changing pg_resetwal -o 
> > stuff
> > may be a bit risky. This has been located more than ten years so that we 
> > should
> > be more careful for modifying.
> > Also, I cannot come up with problems if slots are created after the 
> > pg_resetwal.
> > Background processes would not generate decodable changes (listed in [1]), 
> > and
> > BGworkers by extensions could be ignored [2].
> > Based on the discussion on forked thread [3] and if it is accepted, we will 
> > apply
> > again.
> >

1) Should this:
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
be:
"Tests for upgrading logical replication slots"

2)  This statement is not entirely true:
+ 
+  
+   The old cluster has replicated all the changes to subscribers.
+  

If we have some changes like shutdown_checkpoint the upgrade passes,
if we have some changes like create view whose changes will not be
replicated the upgrade fails.

3) All these includes are not required except for "logical.h"
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@

 #include "postgres.h"

+#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
+#include "funcapi.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/slot.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"

4) We could print two_phase as true/false instead of 0/1:
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+   /* Quick return if there are no logical slots. */
+   if (slot_arr->nslots == 0)
+   return;
+
+   pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+   for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+   {
+   LogicalSlotInfo *slot_info = _arr->slots[slotnum];
+
+   pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\",
two_phase: %d",
+  slot_info->slotname,
+  slot_info->plugin,
+  slot_info->two_phase);
+   }
+}

5) test passes without the below, maybe this is not required:
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#   tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+   "SELECT count(*) FROM
pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);

6) This message "run of pg_upgrade of old cluster with idle
replication slots" seems wrong:
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+   [
+   'pg_upgrade', '--no-sync',
+   '-d', $old_publisher->data_dir,
+   '-D', $new_publisher->data_dir,
+   '-b', $bindir,
+   '-B', $bindir,
+   '-s', $new_publisher->host,
+   '-p', $old_publisher->port,
+   '-P', $new_publisher->port,
+   $mode,
+   ],
+   1,
+   [
+   qr/Your installation contains invalid logical
replication slots./
+   ],
+   [qr//],
+   'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+   "pg_upgrade_output.d/ not removed after pg_upgrade failure");

7) You could run pgindent and pgperlytidy, it shows there are few
issues present with the patch.

Regards,
Vignesh




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-16 Thread Amit Kapila
On Sat, Oct 14, 2023 at 10:45 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Here is a new patch.
>
> Previously I wrote:
> > Based on above idea, I made new version patch which some functionalities 
> > were
> > exported from pg_resetwal. In this approach, pg_upgrade itself removed WALs 
> > and
> > then create logical slots, then pg_resetwal would be called with new option
> > --no-switch, which avoid to switch a WAL segment file. The option is only 
> > used
> > for the upgrading purpose so it is not written in doc and usage(). This 
> > option
> > is not required if pg_resetwal -o does not discard WAL records. Please see 
> > the
> > fork thread [1].
>
> But for now, these changes were reverted because changing pg_resetwal -o stuff
> may be a bit risky. This has been located more than ten years so that we 
> should
> be more careful for modifying.
> Also, I cannot come up with problems if slots are created after the 
> pg_resetwal.
> Background processes would not generate decodable changes (listed in [1]), and
> BGworkers by extensions could be ignored [2].
> Based on the discussion on forked thread [3] and if it is accepted, we will 
> apply
> again.
>

Yeah, I think introducing additional complexity unless it is really
required sounds a bit scary to me as well. BTW, please find attached
some cosmetic changes.

One minor additional comment:
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');

Why do we need to set wal_level as logical for subscribers?

-- 
With Regards,
Amit Kapila.
diff --git a/src/backend/replication/logical/decode.c 
b/src/backend/replication/logical/decode.c
index 4144a43afd..cfa955a679 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -618,9 +618,9 @@ logicalmsg_decode(LogicalDecodingContext *ctx, 
XLogRecordBuffer *buf)
return;
 
/*
-* We can also skip decoding when in 'fast_forward' mode. This check 
must
-* be last because we don't want to set that processing_required flag
-* unnecessarily.
+* We also skip decoding in 'fast_forward' mode. This check must be last
+* because we don't want to set the processing_required flag unless
+* we have a decodable message.
 */
if (ctx->fast_forward)
{
@@ -1307,8 +1307,8 @@ DecodeTXNNeedSkip(LogicalDecodingContext *ctx, 
XLogRecordBuffer *buf,
return true;
 
/*
-* We can also skip decoding when in 'fast_forward' mode. In passing set
-* the 'processing_required' flag to indicate, were it not for this 
mode,
+* We also skip decoding in 'fast_forward' mode. In passing set the
+* 'processing_required' flag to indicate, were it not for this mode,
 * processing *would* have been required.
 */
if (ctx->fast_forward)
diff --git a/src/backend/replication/logical/logical.c 
b/src/backend/replication/logical/logical.c
index 32869a75ab..e02cd0fa44 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1953,9 +1953,9 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 }
 
 /*
- * Read to end of WAL starting from the decoding slot's restart_lsn. Return
- * true if any meaningful/decodable WAL records are encountered, otherwise
- * false.
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
  *
  * Although this function is currently used only during pg_upgrade, there are
  * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
diff --git a/src/backend/utils/adt/pg_upgrade_support.c 
b/src/backend/utils/adt/pg_upgrade_support.c
index 2a831bc397..a3a8ade405 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -274,8 +274,8 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
  * Returns true if there are no decodable WAL records after the
  * confirmed_flush_lsn. Otherwise false.
  *
- * This is a special purpose function to ensure the given slot can be upgraded
- * without data loss.
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
  */
 Datum
 binary_upgrade_slot_has_pending_wal(PG_FUNCTION_ARGS)
@@ -294,16 +294,10 @@ binary_upgrade_slot_has_pending_wal(PG_FUNCTION_ARGS)
 
slot_name = PG_GETARG_NAME(0);
 
-   /*
-* Acquire the given slot. There should be no error because the caller 
has
-* already checked the slot exists.
-*/
+   /* Acquire the given slot. */
ReplicationSlotAcquire(NameStr(*slot_name), true);
 
-   /*
-* It's caller's responsibility to check the health of the slot.  
Upcoming
-* functions assume the restart_lsn points to a valid record.
-*/
+   /* 

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-13 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

Here is a new patch.

Previously I wrote:
> Based on above idea, I made new version patch which some functionalities were
> exported from pg_resetwal. In this approach, pg_upgrade itself removed WALs 
> and
> then create logical slots, then pg_resetwal would be called with new option
> --no-switch, which avoid to switch a WAL segment file. The option is only used
> for the upgrading purpose so it is not written in doc and usage(). This option
> is not required if pg_resetwal -o does not discard WAL records. Please see the
> fork thread [1].

But for now, these changes were reverted because changing pg_resetwal -o stuff
may be a bit risky. This has been located more than ten years so that we should
be more careful for modifying.
Also, I cannot come up with problems if slots are created after the pg_resetwal.
Background processes would not generate decodable changes (listed in [1]), and
BGworkers by extensions could be ignored [2].
Based on the discussion on forked thread [3] and if it is accepted, we will 
apply
again.

Also. some comments and function name was improved.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB58660273EACEFC5BF256B133F50DA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: 
https://www.postgresql.org/message-id/CAA4eK1L4JB%2BKH_4EQryDEhyaLBPW6V20LqjdzOxCWyL7rbxqsA%40mail.gmail.com
[3]: 
https://www.postgresql.org/message-id/flat/CAA4eK1KRyPMiY4fW98qFofsYrPd87Oc83zDNxSeHfTYh_asdBg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v50-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v50-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-12 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thanks for reviewing! New patch is available at [1].

> 
> Some more comments:
> 1. Let's restruture binary_upgrade_validate_wal_logical_end() a bit.
> First, let's change its name to binary_upgrade_slot_has_pending_wal()
> or something like that. Then move the context creation and free
> related code into DecodingContextHasDecodedItems(). We can rename
> DecodingContextHasDecodedItems() as
> pg_logical_replication_slot_has_pending_wal() and place it in
> slotfuncs.c. This will make the code structure similar to other slot
> functions like pg_replication_slot_advance().

Seems clearer than mine. Fixed.

> 2. + * Returns true if there are no changes after the confirmed_flush_lsn.
> 
> How about something like: "Returns true if there are no decodable WAL
> records after the confirmed_flush_lsn."?

Fixed.

> 3. Shouldn't we need to call CheckSlotPermissions() in
> binary_upgrade_validate_wal_logical_end?

Added, but actually it is not needed. This is because only superusers can 
connect
to the server while upgrading. Please see below codes in InitPostgres().

```
if (IsBinaryUpgrade && !am_superuser)
{
ereport(FATAL,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 errmsg("must be superuser to connect in binary 
upgrade mode")));
}
```

> 4.
> + /*
> + * Also, set processing_required flag if the message is not
> + * transactional. It is needed to notify the message's existence to
> + * the caller side. Usually, the flag is set when either the COMMIT or
> + * ABORT records are decoded, but this must be turned on here because
> + * the non-transactional logical message is decoded without waiting
> + * for these records.
> + */
> 
> The first sentence of the comments doesn't seem to be required as that
> just says what the code does. So, let's slightly change it to: "We
> need to set processing_required flag to notify the message's existence
> to the caller side. Usually, the flag is set when either the COMMIT or
> ABORT records are decoded, but this must be turned on here because the
> non-transactional logical message is decoded without waiting for these
> records."

Fixed.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866B0614F80CE9F5EF051BDF5D3A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-12 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thanks for your suggestion! PSA new version.

> The other problem is that pg_resetwal removes all pre-existing WAL
> files which in this case could lead to the removal of the WAL file
> corresponding to restart_lsn. This is because at least the shutdown
> checkpoint record will be written after the creation of slots which
> could be in the new file used for restart_lsn. Then when we invoke
> pg_resetwal, it can remove that file.
> 
> One idea to deal with this could be to do the reset WAL stuff
> (FindEndOfXLOG(), KillExistingXLOG(), KillExistingArchiveStatus(),
> WriteEmptyXLOG()) in a separate function (say in pg_upgrade) and then
> create slots. If we do this, then we additionally need an option in
> pg_resetwal which skips resetting the WAL as that would have been done
> before creating the slots.

Based on above idea, I made new version patch which some functionalities were
exported from pg_resetwal. In this approach, pg_upgrade itself removed WALs and
then create logical slots, then pg_resetwal would be called with new option
--no-switch, which avoid to switch a WAL segment file. The option is only used
for the upgrading purpose so it is not written in doc and usage(). This option
is not required if pg_resetwal -o does not discard WAL records. Please see the
fork thread [1].

We do not have to reserve future restart_lsn while creating a slot, so the 
binary
function binary_upgrade_create_logical_replication_slot() was removed.

Another advantage of this approach is to avoid calling pg_log_standby_snapshot()
after the pg_resetwal. This was needed because of two reasons, but they were
resolved automatically.
  1) pg_resetwal removes all WAL files.
  2) Logical slots requires a RUNNING_XACTS record for building a snapshot.
 
[1]: 
https://www.postgresql.org/message-id/CAA4eK1KRyPMiY4fW98qFofsYrPd87Oc83zDNxSeHfTYh_asdBg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


v49-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v49-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-12 Thread Amit Kapila
On Wed, Oct 11, 2023 at 4:27 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Thank you for reviewing! PSA new version.
>

Some more comments:
1. Let's restruture binary_upgrade_validate_wal_logical_end() a bit.
First, let's change its name to binary_upgrade_slot_has_pending_wal()
or something like that. Then move the context creation and free
related code into DecodingContextHasDecodedItems(). We can rename
DecodingContextHasDecodedItems() as
pg_logical_replication_slot_has_pending_wal() and place it in
slotfuncs.c. This will make the code structure similar to other slot
functions like pg_replication_slot_advance().

2. + * Returns true if there are no changes after the confirmed_flush_lsn.

How about something like: "Returns true if there are no decodable WAL
records after the confirmed_flush_lsn."?

3. Shouldn't we need to call CheckSlotPermissions() in
binary_upgrade_validate_wal_logical_end?

4.
+ /*
+ * Also, set processing_required flag if the message is not
+ * transactional. It is needed to notify the message's existence to
+ * the caller side. Usually, the flag is set when either the COMMIT or
+ * ABORT records are decoded, but this must be turned on here because
+ * the non-transactional logical message is decoded without waiting
+ * for these records.
+ */

The first sentence of the comments doesn't seem to be required as that
just says what the code does. So, let's slightly change it to: "We
need to set processing_required flag to notify the message's existence
to the caller side. Usually, the flag is set when either the COMMIT or
ABORT records are decoded, but this must be turned on here because the
non-transactional logical message is decoded without waiting for these
records."

-- 
With Regards,
Amit Kapila.




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-11 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thank you for reviewing! PSA new version.

> > I think you need to set the new flag only when we are not skipping the
> > transaction or in other words when we decide to process the
> > transaction. Otherwise, how will you distinguish the case where the
> > xact is already decoded and sent to client?

Actually, I wondered what should be, but I followed it. Indeed, we should avoid
the case which the xact has already been sent. But I was not sure other 
conditions
like transactions for another database - IIUC previous version regarded it as 
not
acceptable.

Now, I reconsider these cases can be ignored because they would not be sent to
subscriber. The consistency between pub/sub would not be broken even if these
WALs are remained.

> In the attached patch atop your v47*, I have changed it to show you
> what I have in mind.

Thanks, was included.

> A few more comments:
> =
> 1.
> +
> + /*
> + * Did the logical decoding context skip outputting any changes?
> + *
> + * This flag is used only when the context is in the silent mode.
> + */
> + bool output_skipped;
>  } LogicalDecodingContext;
> 
> This doesn't seem to convey the meaning to the caller. How about
> processing_required? BTW, I have made this change as well in the
> patch.

LGTM, changed like that.

> 2.
> @@ -295,7 +295,7 @@ xact_decode(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf)
> */
> if (TransactionIdIsValid(xid))
> {
> - if (!ctx->fast_forward)
> + if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
> ReorderBufferAddInvalidations(reorder, xid,
>   buf->origptr,
>   invals->nmsgs,
> @@ -303,7 +303,7 @@ xact_decode(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf)
> ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
>   buf->origptr);
> }
> - else if ((!ctx->fast_forward))
> + else if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
> ReorderBufferImmediateInvalidation(ctx->reorder,
>invals->nmsgs,
>invals->msgs);
> 
> We don't to execute the invalidations even in silent mode. Looking at
> this and other changes in the patch related to silent mode, I wonder
> whether we really need to introduce 'silent_mode'. Can't we simply set
> processing_required when 'fast_forward' mode is true and then let the
> caller decide whether it needs to further process the WAL?

After considering again, I agreed to remove silent mode. Initially, it was
introduced because did_process flag is set at XXX_cb_wrapper and reorderbuffer
layer. Now, the processing_required is set in 
DecodeCommit()->DecodeTXNNeedSkip(),
which means that each records does not need to be decoded. Based on that,
I removed the silent mode and use fast-forwarding mode instead.

Also, some parts (mostly code comments) were modified.

Acknowledgement: Thanks Peter and Hou for discussing with me.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v48-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v48-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-11 Thread Amit Kapila
On Tue, Oct 10, 2023 at 6:17 PM Amit Kapila  wrote:
>
>  DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
> Oid txn_dbid, RepOriginId origin_id)
>  {
> - return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
> - (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
> - ctx->fast_forward || FilterByOrigin(ctx, origin_id));
> + bool need_skip;
> +
> + need_skip = (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
> + (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
> + ctx->decoding_mode != DECODING_MODE_NORMAL ||
> + FilterByOrigin(ctx, origin_id));
> +
> + /* Set a flag if we are in the slient mode */
> + if (ctx->decoding_mode == DECODING_MODE_SILENT)
> + ctx->output_skipped = true;
> +
> + return need_skip;
>
> I think you need to set the new flag only when we are not skipping the
> transaction or in other words when we decide to process the
> transaction. Otherwise, how will you distinguish the case where the
> xact is already decoded and sent to client?
>

In the attached patch atop your v47*, I have changed it to show you
what I have in mind.

A few more comments:
=
1.
+
+ /*
+ * Did the logical decoding context skip outputting any changes?
+ *
+ * This flag is used only when the context is in the silent mode.
+ */
+ bool output_skipped;
 } LogicalDecodingContext;

This doesn't seem to convey the meaning to the caller. How about
processing_required? BTW, I have made this change as well in the
patch.

2.
@@ -295,7 +295,7 @@ xact_decode(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf)
*/
if (TransactionIdIsValid(xid))
{
- if (!ctx->fast_forward)
+ if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
ReorderBufferAddInvalidations(reorder, xid,
  buf->origptr,
  invals->nmsgs,
@@ -303,7 +303,7 @@ xact_decode(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf)
ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
  buf->origptr);
}
- else if ((!ctx->fast_forward))
+ else if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
ReorderBufferImmediateInvalidation(ctx->reorder,
   invals->nmsgs,
   invals->msgs);

We don't to execute the invalidations even in silent mode. Looking at
this and other changes in the patch related to silent mode, I wonder
whether we really need to introduce 'silent_mode'. Can't we simply set
processing_required when 'fast_forward' mode is true and then let the
caller decide whether it needs to further process the WAL?

-- 
With Regards,
Amit Kapila.
diff --git a/src/backend/replication/logical/decode.c 
b/src/backend/replication/logical/decode.c
index 6de54153f7..f3c561d8ed 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -631,7 +631,7 @@ logicalmsg_decode(LogicalDecodingContext *ctx, 
XLogRecordBuffer *buf)
if (ctx->decoding_mode == DECODING_MODE_SILENT &&
!message->transactional)
{
-   ctx->output_skipped = true;
+   ctx->processing_required = true;
return;
}
 
@@ -1294,8 +1294,6 @@ DecodeXLogTuple(char *data, Size len, 
ReorderBufferTupleBuf *tuple)
  * 2) The transaction happened in another database.
  * 3) The output plugin is not interested in the origin.
  * 4) We are not in the normal decoding mode.
- *
- * Also, set output_skipped flag if we are in the slient mode.
  */
 static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
@@ -1308,9 +1306,15 @@ DecodeTXNNeedSkip(LogicalDecodingContext *ctx, 
XLogRecordBuffer *buf,
 ctx->decoding_mode != DECODING_MODE_NORMAL ||
 FilterByOrigin(ctx, origin_id));
 
-   /* Set a flag if we are in the slient mode */
+   if (need_skip)
+   return true;
+
+   /*
+* We don't need to process the transaction in silent mode. Indicate the
+* same via LogicalDecodingContext, so that the caller can skip 
processing.
+*/
if (ctx->decoding_mode == DECODING_MODE_SILENT)
-   ctx->output_skipped = true;
+   ctx->processing_required = true;
 
-   return need_skip;
+   return true;
 }
diff --git a/src/backend/replication/logical/logical.c 
b/src/backend/replication/logical/logical.c
index 0f4b1c6323..e47f2ebd7c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -2030,7 +2030,7 @@ DecodingContextHasdecodedItems(LogicalDecodingContext 
*ctx,
InvalidateSystemCaches();
 
/* Loop until the end of WAL or some changes are processed */
-   while (!ctx->output_skipped && ctx->reader->EndRecPtr < end_of_wal)
+   while (!ctx->processing_required && ctx->reader->EndRecPtr < end_of_wal)
{
XLogRecord *record;
char   *errm = NULL;
@@ -2046,5 +2046,5 @@ DecodingContextHasdecodedItems(LogicalDecodingContext 
*ctx,

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-10 Thread Amit Kapila
On Tue, Oct 10, 2023 at 4:51 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> >
> > Isn't it sufficient to add a test for silent mode in
> > begin/stream_start/begin_prepare kind of APIs and set
> > ctx->did_process? In all other APIs, we can assert that did_process
> > shouldn't be set and we never reach there when decoding mode is
> > silent.
> >
> >
> > + /* Check whether the meaningful change was found */
> > + found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
> > + ctx->did_process);
> >
> > Are you talking about this check in the patch? If so, can you please
> > explain when does the first check help?
>
> I changed around here so I describe once again.
>
> A flag (output_skipped) is set when the transaction is decoded till the end in
> silent mode. It is done in DecodeTXNNeedSkip() because the function is the 
> common
> path for both committed/aborted transactions. Also, DecodeTXNNeedSkip() 
> returns
> true when the decoding context is in the silent mode. Therefore, any 
> cb_wrapper
> functions would not be called anymore. DecodingContextHasdecodedItems() just
> returns output_skipped.
>
> This approach needs to read WALs till end of transactions before returning the
> upgrading function, but codes look simpler than the previous version.
>

 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
Oid txn_dbid, RepOriginId origin_id)
 {
- return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
- (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
- ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+ bool need_skip;
+
+ need_skip = (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+ ctx->decoding_mode != DECODING_MODE_NORMAL ||
+ FilterByOrigin(ctx, origin_id));
+
+ /* Set a flag if we are in the slient mode */
+ if (ctx->decoding_mode == DECODING_MODE_SILENT)
+ ctx->output_skipped = true;
+
+ return need_skip;

I think you need to set the new flag only when we are not skipping the
transaction or in other words when we decide to process the
transaction. Otherwise, how will you distinguish the case where the
xact is already decoded and sent to client?

--
With Regards,
Amit Kapila




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-10 Thread Hayato Kuroda (Fujitsu)
Dear Bharath,

Thanks for giving comments and apologize for late reply.
New version is available in [1].

> +1 for this approach. It looks neat.
> 
> I think we also need to add TAP tests to generate decodable WAL
> records (RUNNING_XACT, CHECKPOINT_ONLINE, XLOG_FPI_FOR_HINT,
> XLOG_SWITCH, XLOG_PARAMETER_CHANGE, XLOG_HEAP2_PRUNE) during
> pg_upgrade as described here
> https://www.postgresql.org/message-id/TYAPR01MB58660273EACEFC5BF256
> B133F50DA%40TYAPR01MB5866.jpnprd01.prod.outlook.com.
> Basically, these were the exceptional WAL records that may be
> generated by pg_upgrade, so having tests for them is good.

Hmm, I'm not sure it is really good. If we add such a test, we may have to add
further tests in future if new WAL log types during upgrade is introduced.
Currently we do not have if-statement for each WAL types, so it does not improve
coverage, I thought. Another concern is that I'm not sure how do we simply and
surely generate XLOG_HEAP2_PRUNE.

Based on above, I did not add the test case for now.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866068CB6591C8AE1F9690BF5CDA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-10 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thanks for reviewing! You can available new version in [1].

> 
> Few comments:
> 1)  Should we add binary upgrade check "CHECK_IS_BINARY_UPGRADE" for
> this funcion too:
> +binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
> +{
> +   Namename = PG_GETARG_NAME(0);
> +   Nameplugin = PG_GETARG_NAME(1);
> +
> +   /* Temporary slots is never handled in this function */
> +   booltwo_phase = PG_GETARG_BOOL(2);

Yeah, needed. For testing purpose I did not add, but it should have.
Added.

> 2) Generally we are specifying the slot name in this case, is slot
> name null check required:
> +Datum
> +binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
> +{
> +   Nameslot_name;
> +   XLogRecPtr  end_of_wal;
> +   LogicalDecodingContext *ctx = NULL;
> +   boolhas_record;
> +
> +   CHECK_IS_BINARY_UPGRADE;
> +
> +   /* Quick exit if the input is NULL */
> +   if (PG_ARGISNULL(0))
> +   PG_RETURN_BOOL(false);


NULL check was added. I felt that we should raise an ERROR. 

> 3) Since this is similar to pg_create_logical_replication_slot, can we
> add a comment saying any change in pg_create_logical_replication_slot
> would also need the same check to be added in
> binary_upgrade_create_logical_replication_slot:
> +/*
> + * SQL function for creating a new logical replication slot.
> + *
> + * This function is almost same as pg_create_logical_replication_slot(), but
> + * this can specify the restart_lsn.
> + */
> +Datum
> +binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
> +{
> +   Namename = PG_GETARG_NAME(0);
> +   Nameplugin = PG_GETARG_NAME(1);
> +
> +   /* Temporary slots is never handled in this function */

Added.

> 4) Any conclusion on this try catch comment, do you want to add which
> setting you want to revert in catch, if try/catch is not required we
> can remove this comment:
> +   ReplicationSlotAcquire(NameStr(*slot_name), true);
> +
> +   /* XXX: Is PG_TRY/CATCH needed around here? */
> +
> +   /*
> +* We use silent mode here to decode all changes without
> outputting them,
> +* allowing us to detect all the records that could be sent 
> downstream.
> +*/

After considering more, it's OK to raise an ERROR because caller can detect it.
Also, there are any setting to be reverted. The comment is removed.

> 5) I felt these 2 comments can be combined as both are trying to say
> the same thing:
> + * This is a special purpose function to ensure that there are no WAL records
> + * pending to be decoded after the given LSN.
> + *
> + * It is used to ensure that there is no pending WAL to be consumed for
> + * the logical slots.

Later part was removed.

> 6) I feel this memset is not required as we are initializing at the
> beginning of function, if you want to keep the memset, the
> initialization can be removed:
> +   values[2] = CStringGetTextDatum(xlogfilename);
> +
> +   memset(nulls, 0, sizeof(nulls));
> +
> +   tuple = heap_form_tuple(tupdesc, values, nulls);

The initialization was removed to follow pg_create_logical_replication_slot.

> 7) looks like a typo, "mu" should be "must":
> +   /*
> +* Also, we mu execute pg_log_standby_snapshot() when logical
> replication
> +* slots are migrated. Because RUNNING_XACTS record is
> required to create
> +* a consistent snapshot.
> +*/
> +   if (count_old_cluster_logical_slots())
> +   create_consistent_snapshot();

Fixed.

> 8) consitent should be consistent:
> +/*
> + * Log the details of the current snapshot to the WAL, allowing the snapshot
> + * state to be reconstructed for logical decoding on the upgraded slots.
> + */
> +static void
> +create_consistent_snapshot(void)
> +{
> +   DbInfo *old_db = _cluster.dbarr.dbs[0];
> +   PGconn *conn;
> +
> +   prep_status("Creating a consitent snapshot on new cluster");

Fixed.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866068CB6591C8AE1F9690BF5CDA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-10 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thank you for reviewing! PSA new version.

> > Internally, I added a new decoding mode - DECODING_MODE_SILENT - and
> used it.
> > If the decoding context is in the mode, the output plugin is not loaded, but
> > any WALs are decoded without skipping.
> >
> 
> I think it may be okay not to load the output plugin as we are not
> going to process any record in this case but is that the only reason
> or you have something else in mind as well?

My main concern was for skipping to set output plugin options. Even if the
pgoutput plugin, some options like protocol_version, publications, etc are
required while loading a plugin. We cannot predict requirements for external
plugins. Based on that I thought output plugins should not be loaded during the
decode.

> > Also, a new flag "did_process" is also
> > added. This flag is set if wrappers for output plugin callbacks are called 
> > during
> > the silent mode.
>
> Isn't it sufficient to add a test for silent mode in
> begin/stream_start/begin_prepare kind of APIs and set
> ctx->did_process? In all other APIs, we can assert that did_process
> shouldn't be set and we never reach there when decoding mode is
> silent.
>
> 
> + /* Check whether the meaningful change was found */
> + found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
> + ctx->did_process);
> 
> Are you talking about this check in the patch? If so, can you please
> explain when does the first check help?

I changed around here so I describe once again.

A flag (output_skipped) is set when the transaction is decoded till the end in
silent mode. It is done in DecodeTXNNeedSkip() because the function is the 
common
path for both committed/aborted transactions. Also, DecodeTXNNeedSkip() returns
true when the decoding context is in the silent mode. Therefore, any cb_wrapper
functions would not be called anymore. DecodingContextHasdecodedItems() just
returns output_skipped.

This approach needs to read WALs till end of transactions before returning the
upgrading function, but codes look simpler than the previous version.

> >
> > Based on that, I added another binary function
> binary_upgrade_create_logical_replication_slot().
> > This function is similar to pg_create_logical_replication_slot(), but the
> > restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> > filename is returned and it is passed to pg_resetwal command.
> >
> 
> I am not sure if it is a good idea that a
> binary_upgrade_create_logical_replication_slot() API does the logfile
> name calculation.
> 
> > One consideration is that pg_log_standby_snapshot() must be executed before
> > slots consuming changes. New cluster does not have RUNNING_XACTS records
> so that
> > decoding context on new cluster cannot be create a consistent snapshot 
> > as-is.
> > This may lead to discard changes during the upcoming consuming event. To
> > prevent it the function is called after the final pg_resetwal.
> >
> > How do you think?
> >
> 
> + /*
> + * Also, we mu execute pg_log_standby_snapshot() when logical replication
> + * slots are migrated. Because RUNNING_XACTS record is required to create
> + * a consistent snapshot.
> + */
> + if (count_old_cluster_logical_slots())
> + create_consistent_snapshot();
> 
> We shouldn't do this separately. Instead
> binary_upgrade_create_logical_replication_slot() should ensure that
> corresponding WAL is reserved similar to what we do in
> ReplicationSlotReserveWal() and then similarly invoke
> LogStandbySnapshot() to ensure that we have enough information to
> start.

I did not handle these parts because they needed more analysis. Let's discuss
in later versions.

> 
> Few minor comments:
> ==
> 1. The commit message and other comments like atop
> get_old_cluster_logical_slot_infos() needs to be adjusted as per
> recent changes.

I revisited comments and updated.

> 2.
> @@ -1268,7 +1346,11 @@ stream_start_cb_wrapper(ReorderBuffer *cache,
> ReorderBufferTXN *txn,
>   LogicalErrorCallbackState state;
>   ErrorContextCallback errcallback;
> 
> - Assert(!ctx->fast_forward);
> + /*
> + * In silent mode all the two-phase callbacks are not set so that the
> + * wrapper should not be called.
> + */
> + Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
> 
> This and other similar comments doesn't seems to be consistent as the
> function name and comments are not matching.

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


v47-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v47-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-10 Thread Amit Kapila
On Sat, Oct 7, 2023 at 3:46 AM Amit Kapila  wrote:
>
> On Fri, Oct 6, 2023 at 6:30 PM Hayato Kuroda (Fujitsu)
> >
> > Based on that, I added another binary function 
> > binary_upgrade_create_logical_replication_slot().
> > This function is similar to pg_create_logical_replication_slot(), but the
> > restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> > filename is returned and it is passed to pg_resetwal command.
> >
>
> I am not sure if it is a good idea that a
> binary_upgrade_create_logical_replication_slot() API does the logfile
> name calculation.
>

The other problem is that pg_resetwal removes all pre-existing WAL
files which in this case could lead to the removal of the WAL file
corresponding to restart_lsn. This is because at least the shutdown
checkpoint record will be written after the creation of slots which
could be in the new file used for restart_lsn. Then when we invoke
pg_resetwal, it can remove that file.

One idea to deal with this could be to do the reset WAL stuff
(FindEndOfXLOG(), KillExistingXLOG(), KillExistingArchiveStatus(),
WriteEmptyXLOG()) in a separate function (say in pg_upgrade) and then
create slots. If we do this, then we additionally need an option in
pg_resetwal which skips resetting the WAL as that would have been done
before creating the slots.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-06 Thread Amit Kapila
On Thu, Oct 5, 2023 at 6:43 PM Bharath Rupireddy
 wrote:
>
> On Thu, Oct 5, 2023 at 1:48 AM Amit Kapila  wrote:
> >
>
> > The other potential problem Andres pointed out is that during shutdown
> > if due to some reason, the walreceiver goes down, we won't be able to
> > send the required WAL and users won't be able to ensure that because
> > even after restart the same situation can happen. The ideal way is to
> > have something that puts the system in READ ONLY state during shutdown
> > and then we can probably allow walreceivers to reconnect and receive
> > the required WALs. As we don't have such functionality available and
> > it won't be easy to achieve the same, we can leave this for now.
> >
> > Thoughts?
>
> You mean walreceiver for streaming replication? Or the apply workers
> going down for logical replication?
>

Apply workers.

>
> If there's yet-to-be-sent-out WAL,
> pg_upgrade will fail no? How does the above scenario a problem for
> pg_upgrade of a cluster with just logical replication slots?
>

Even, if there is a WAL yet to be sent, the walsender will simply exit
as it will receive PqMsg_Terminate ('X') from standby. See
ProcessRepliesIfAny(). After that shutdown checkpoint will finish. So,
in this case upgrade can fail due to slots. But, I think the server
should be able to succeed in consecutive runs. Does this make sense?

-- 
With Regards,
Amit Kapila.




  1   2   3   4   5   >