Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-30, Tom Lane wrote: > BTW, quite aside from stability, is it really necessary for this test to > be so freakin' slow? florican for instance reports > > [12:54:07] t/033_replay_tsp_drops.pl ok 117840 ms ( 0.01 usr > 0.00 sys + 8.72 cusr 5.41 csys = 14.14 CPU) > > 027 is so bloated because it runs the core regression tests YA time, > which I'm not very happy about either; but that's no excuse for > every new test to contribute an additional couple of minutes. Definitely not intended. It looks like the reason is just that the DROP DATABASE/TABLESPACE commands are super slow, and this test does a lot of that. I added some instrumentation and the largest fraction of time goes to execute this CREATE DATABASE dropme_db1 WITH TABLESPACE dropme_ts1; CREATE TABLE t (a int) TABLESPACE dropme_ts2; CREATE DATABASE dropme_db2 WITH TABLESPACE dropme_ts2; CREATE DATABASE moveme_db TABLESPACE source_ts; ALTER DATABASE moveme_db SET TABLESPACE target_ts; CREATE DATABASE newdb TEMPLATE template_db; ALTER DATABASE template_db IS_TEMPLATE = false; DROP DATABASE dropme_db1; DROP TABLE t; DROP DATABASE dropme_db2; DROP TABLESPACE dropme_ts2; DROP TABLESPACE source_ts; DROP DATABASE template_db; Maybe this is overkill and we can reduce the test without damaging the coverage. I'll have a look during the weekend. I'll repair the reliability problem too, separately. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "This is a foot just waiting to be shot"(Andrew Dunstan)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Sun, Jul 31, 2022 at 11:17 AM Tom Lane wrote: > Thomas Munro writes: > > I noticed this is a 32 bit FBSD system. Is it running on UFS, perhaps > > on slow storage? Are soft updates enabled (visible as options in > > output of "mount")? > > It's an ancient (2006) mac mini with 5400RPM spinning rust. > "mount" says > > /dev/ada0s2a on / (ufs, local, soft-updates, journaled soft-updates) > devfs on /dev (devfs) I don't have all the details and I may be way off here but I have the impression that when you create and then unlink trees of files quickly, sometimes soft-updates are flushed synchronously, which turns into many 5400 RPM seeks; dtrace could be used to check, but some clues in your numbers would be some kind of correlation between time and number of clusters that are set up and torn down by each test. Without soft-updates, it'd be much worse, because then many more things become synchronous I/O. Even with write caching enabled, soft-updates flush the drive cache when there's a barrier needed for crash safety. It may also be that there is something strange about Apple hardware that makes it extra slow at full-cache-flush operations (cf unexplainable excess slowness of F_FULLFSYNC under macOS including old spinning rust systems and current flash systems, and complaints about this general area on current Apple hardware from the Asahi Linux/M1 port people, though how relevant that is to 2006 spinning rust I dunno). It would be nice to look into how to tune, fix or work around all of that, as it also affects CI which has a IO limits (though admittedly a couple of orders of mag higher IOPS than 5400 RPM).
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Sun, Jul 31, 2022 at 3:46 PM Thomas Munro wrote: > On Sun, Jul 31, 2022 at 2:37 AM Tom Lane wrote: > > Alvaro Herrera writes: > > > WFM, pushed that way. > > > > Looks like conchuela is still intermittently unhappy. > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-30%2004%3A57%3A51 > > And here's one from CI that failed on Linux (this was a cfbot run with > an unrelated patch, parent commit b998196 so a few commits after "Fix > test instability"): > > https://cirrus-ci.com/task/5282155000496128 > > https://api.cirrus-ci.com/v1/artifact/task/5282155000496128/log/src/test/recovery/tmp_check/log/033_replay_tsp_drops_primary1_WAL_LOG.log > > It looks like this sequence is racy and we need to wait for more than > just "connection is made" before dropping the slot? > > $node_standby->start; > > # Make sure connection is made > $node_primary->poll_query_until('postgres', > 'SELECT count(*) = 1 FROM pg_stat_replication'); > $node_primary->safe_psql('postgres', "SELECT > pg_drop_replication_slot('slot')"); > > Why not set the replication slot name so that the standby uses it > "properly", like in other tests? Or to keep doing it this way, does that pg_stat_replication query need a WHERE clause looking at the state?
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Hi, On 2022-07-30 10:37:55 -0400, Tom Lane wrote: > Alvaro Herrera writes: > > WFM, pushed that way. > > Looks like conchuela is still intermittently unhappy. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-30%2004%3A57%3A51 CI as well: https://cirrus-ci.com/task/5295464063959040?logs=test_world#L2671 https://cirrus-ci.com/task/5042590885085184?logs=test_world#L2664 Greetings, Andres Freund
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Sun, Jul 31, 2022 at 2:37 AM Tom Lane wrote: > Alvaro Herrera writes: > > WFM, pushed that way. > > Looks like conchuela is still intermittently unhappy. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-30%2004%3A57%3A51 And here's one from CI that failed on Linux (this was a cfbot run with an unrelated patch, parent commit b998196 so a few commits after "Fix test instability"): https://cirrus-ci.com/task/5282155000496128 https://api.cirrus-ci.com/v1/artifact/task/5282155000496128/log/src/test/recovery/tmp_check/log/033_replay_tsp_drops_primary1_WAL_LOG.log It looks like this sequence is racy and we need to wait for more than just "connection is made" before dropping the slot? $node_standby->start; # Make sure connection is made $node_primary->poll_query_until('postgres', 'SELECT count(*) = 1 FROM pg_stat_replication'); $node_primary->safe_psql('postgres', "SELECT pg_drop_replication_slot('slot')"); Why not set the replication slot name so that the standby uses it "properly", like in other tests?
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Thomas Munro writes: > I noticed this is a 32 bit FBSD system. Is it running on UFS, perhaps > on slow storage? Are soft updates enabled (visible as options in > output of "mount")? It's an ancient (2006) mac mini with 5400RPM spinning rust. "mount" says /dev/ada0s2a on / (ufs, local, soft-updates, journaled soft-updates) devfs on /dev (devfs) regards, tom lane
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Sun, Jul 31, 2022 at 4:51 AM Tom Lane wrote: > BTW, quite aside from stability, is it really necessary for this test to > be so freakin' slow? florican for instance reports > > [12:43:38] t/025_stuck_on_old_timeline.pl ... ok49010 ms ( 0.00 usr > 0.00 sys + 3.64 cusr 2.49 csys = 6.13 CPU) > [12:44:12] t/026_overwrite_contrecord.pl ok34751 ms ( 0.01 usr > 0.00 sys + 3.14 cusr 1.76 csys = 4.91 CPU) > [12:49:00] t/027_stream_regress.pl .. ok 287278 ms ( 0.00 usr > 0.00 sys + 9.66 cusr 6.95 csys = 16.60 CPU) > [12:50:04] t/028_pitr_timelines.pl .. ok64543 ms ( 0.00 usr > 0.00 sys + 3.59 cusr 3.20 csys = 6.78 CPU) > [12:50:17] t/029_stats_restart.pl ... ok12505 ms ( 0.02 usr > 0.00 sys + 3.16 cusr 1.40 csys = 4.57 CPU) > [12:50:51] t/030_stats_cleanup_replica.pl ... ok33933 ms ( 0.01 usr > 0.01 sys + 3.55 cusr 2.46 csys = 6.03 CPU) > [12:51:25] t/031_recovery_conflict.pl ... ok34249 ms ( 0.00 usr > 0.00 sys + 3.37 cusr 2.20 csys = 5.57 CPU) > [12:52:09] t/032_relfilenode_reuse.pl ... ok44274 ms ( 0.01 usr > 0.00 sys + 3.21 cusr 2.05 csys = 5.27 CPU) > [12:54:07] t/033_replay_tsp_drops.pl ok 117840 ms ( 0.01 usr > 0.00 sys + 8.72 cusr 5.41 csys = 14.14 CPU) > > 027 is so bloated because it runs the core regression tests YA time, > which I'm not very happy about either; but that's no excuse for > every new test to contribute an additional couple of minutes. Complaints about 027 noted, I'm thinking about what we could do about that. As for 033, I worried that it might be the new ProcSignalBarrier stuff around tablespaces, but thankfully the DEBUG logging I added there recently shows those all completing in single digit milliseconds. I also confirmed there are no unexpected fsync'd being produced here. That is quite a lot of CPU, but it's a huge amount of total runtime. It runs in 5-8 seconds on various modern systems, 19 seconds on my Linux RPi4, and 50 seconds on my Celeron-powered NAS box with spinning disks. I noticed this is a 32 bit FBSD system. Is it running on UFS, perhaps on slow storage? Are soft updates enabled (visible as options in output of "mount")? Without soft updates, a lot more file system ops perform synchronous I/O, which really slows down our tests. In general, UFS isn't as good as modern file systems at avoiding I/O for short-lived files, and we set up and tear down a lot of them in our testing. Another thing that makes a difference is to use a filesystem with 8KB block size. This has been a subject of investigation for speeding up CI (see src/tools/ci/gcp_freebsd_repartition.sh), but several mysteries remain unsolved...
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
I wrote: > Looks like conchuela is still intermittently unhappy. BTW, quite aside from stability, is it really necessary for this test to be so freakin' slow? florican for instance reports [12:43:38] t/025_stuck_on_old_timeline.pl ... ok49010 ms ( 0.00 usr 0.00 sys + 3.64 cusr 2.49 csys = 6.13 CPU) [12:44:12] t/026_overwrite_contrecord.pl ok34751 ms ( 0.01 usr 0.00 sys + 3.14 cusr 1.76 csys = 4.91 CPU) [12:49:00] t/027_stream_regress.pl .. ok 287278 ms ( 0.00 usr 0.00 sys + 9.66 cusr 6.95 csys = 16.60 CPU) [12:50:04] t/028_pitr_timelines.pl .. ok64543 ms ( 0.00 usr 0.00 sys + 3.59 cusr 3.20 csys = 6.78 CPU) [12:50:17] t/029_stats_restart.pl ... ok12505 ms ( 0.02 usr 0.00 sys + 3.16 cusr 1.40 csys = 4.57 CPU) [12:50:51] t/030_stats_cleanup_replica.pl ... ok33933 ms ( 0.01 usr 0.01 sys + 3.55 cusr 2.46 csys = 6.03 CPU) [12:51:25] t/031_recovery_conflict.pl ... ok34249 ms ( 0.00 usr 0.00 sys + 3.37 cusr 2.20 csys = 5.57 CPU) [12:52:09] t/032_relfilenode_reuse.pl ... ok44274 ms ( 0.01 usr 0.00 sys + 3.21 cusr 2.05 csys = 5.27 CPU) [12:54:07] t/033_replay_tsp_drops.pl ok 117840 ms ( 0.01 usr 0.00 sys + 8.72 cusr 5.41 csys = 14.14 CPU) 027 is so bloated because it runs the core regression tests YA time, which I'm not very happy about either; but that's no excuse for every new test to contribute an additional couple of minutes. regards, tom lane
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Alvaro Herrera writes: > WFM, pushed that way. Looks like conchuela is still intermittently unhappy. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-30%2004%3A57%3A51 regards, tom lane
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-29, Kyotaro Horiguchi wrote: > At Fri, 29 Jul 2022 11:27:01 +1200, Thomas Munro > wrote in > > Maybe it just needs a replication slot? I see: > > > > ERROR: requested WAL segment 00010003 has already been > > removed > > Agreed, I see the same. The same failure can be surely reproducible > by inserting wal-switch+checkpoint after taking backup [1]. And it is > fixed by the attached. WFM, pushed that way. I added a slot drop after the pg_stat_replication count check to be a little less intrusive. Thanks Matthias for reporting. (Note that the Cirrus page has a download link for the complete logs as artifacts). -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "I'm always right, but sometimes I'm more right than other times." (Linus Torvalds)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Fri, 29 Jul 2022 11:27:01 +1200, Thomas Munro wrote in > Maybe it just needs a replication slot? I see: > > ERROR: requested WAL segment 00010003 has already been > removed Agreed, I see the same. The same failure can be surely reproducible by inserting wal-switch+checkpoint after taking backup [1]. And it is fixed by the attached. regards. -- Kyotaro Horiguchi NTT Open Source Software Center [1]: --- a/src/test/recovery/t/033_replay_tsp_drops.pl +++ b/src/test/recovery/t/033_replay_tsp_drops.pl @@ -30,6 +30,13 @@ sub test_tablespace my $backup_name = 'my_backup'; $node_primary->backup($backup_name); + $node_primary->psql( + 'postgres', + qq[ + CREATE TABLE t(); DROP TABLE t; SELECT pg_switch_wal(); + CHECKPOINT; + ]); + my $node_standby = PostgreSQL::Test::Cluster->new("standby2_$strategy"); $node_standby->init_from_backup($node_primary, $backup_name, has_streaming => 1); diff --git a/src/test/recovery/t/033_replay_tsp_drops.pl b/src/test/recovery/t/033_replay_tsp_drops.pl index 9b74cb09ac..0756ca6c87 100644 --- a/src/test/recovery/t/033_replay_tsp_drops.pl +++ b/src/test/recovery/t/033_replay_tsp_drops.pl @@ -20,6 +20,7 @@ sub test_tablespace $node_primary->psql( 'postgres', qq[ + SELECT pg_create_physical_replication_slot('slot1', true); SET allow_in_place_tablespaces=on; CREATE TABLESPACE dropme_ts1 LOCATION ''; CREATE TABLESPACE dropme_ts2 LOCATION '';
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Fri, Jul 29, 2022 at 9:57 AM Tom Lane wrote: > Matthias van de Meent writes: > > I'd like to bring to your attention that the test that was introduced > > with 9e4f914b seem to be flaky in FreeBSD 13 in the CFBot builds: it > > sometimes times out while waiting for the secondary to catch up. Or, > > at least I think it does, and I'm not too familiar with TAP failure > > outputs: it returns with error code 29 and logs that I'd expect when > > the timeout is reached. > > It's also failing in the buildfarm, eg > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-28%2020%3A57%3A50 > > Looks like only conchuela so far, reinforcing the idea that we're > only seeing it on FreeBSD. I'd tentatively bet on a timing problem > that requires some FreeBSD scheduling quirk to manifest; we've seen > such quirks before. Maybe it just needs a replication slot? I see: ERROR: requested WAL segment 00010003 has already been removed
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Matthias van de Meent writes: > I'd like to bring to your attention that the test that was introduced > with 9e4f914b seem to be flaky in FreeBSD 13 in the CFBot builds: it > sometimes times out while waiting for the secondary to catch up. Or, > at least I think it does, and I'm not too familiar with TAP failure > outputs: it returns with error code 29 and logs that I'd expect when > the timeout is reached. It's also failing in the buildfarm, eg https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-28%2020%3A57%3A50 Looks like only conchuela so far, reinforcing the idea that we're only seeing it on FreeBSD. I'd tentatively bet on a timing problem that requires some FreeBSD scheduling quirk to manifest; we've seen such quirks before. regards, tom lane
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Wed, 27 Jul 2022 at 20:55, Alvaro Herrera wrote: > > Okay, I think I'm done with this. Here's v27 for the master branch, > where I fixed some comments as well as thinkos in the test script. > The ones on older branches aren't materially different, they just have > tonnes of conflicts resolved. I'll get this pushed tomorrow morning. > > I have run it through CI and it seems ... not completely broken, at > least, but I have no working recipes for Windows on branches 14 and > older, so it doesn't really work fully. If anybody does, please share. > You can see mine here > https://github.com/alvherre/postgres/commits/REL_11_STABLE [etc] > https://cirrus-ci.com/build/5320904228995072 > https://cirrus-ci.com/github/alvherre/postgres I'd like to bring to your attention that the test that was introduced with 9e4f914b seem to be flaky in FreeBSD 13 in the CFBot builds: it sometimes times out while waiting for the secondary to catch up. Or, at least I think it does, and I'm not too familiar with TAP failure outputs: it returns with error code 29 and logs that I'd expect when the timeout is reached. See bottom for examples (all 3 builds for different patches). Kind regards, Matthias van de Meent. [1] https://cirrus-ci.com/task/4960990331666432?logs=test_world#L2631-L2662 [2] https://cirrus-ci.com/task/5012678384025600?logs=test_world#L2631-L2662 [3] https://cirrus-ci.com/task/5147001137397760?logs=test_world#L2631-L2662
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Okay, I think I'm done with this. Here's v27 for the master branch, where I fixed some comments as well as thinkos in the test script. The ones on older branches aren't materially different, they just have tonnes of conflicts resolved. I'll get this pushed tomorrow morning. I have run it through CI and it seems ... not completely broken, at least, but I have no working recipes for Windows on branches 14 and older, so it doesn't really work fully. If anybody does, please share. You can see mine here https://github.com/alvherre/postgres/commits/REL_11_STABLE [etc] https://cirrus-ci.com/build/5320904228995072 https://cirrus-ci.com/github/alvherre/postgres -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Every machine is a smoke machine if you operate it wrong enough." https://twitter.com/libseybieda/status/1541673325781196801 >From b84f66975d664c45babd878a43c67601b7f7d2b6 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 27 Jul 2022 20:22:21 +0200 Subject: [PATCH v27] Fix replay of create database records on standby MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Crash recovery on standby may encounter missing directories when replaying database-creation WAL records. Prior to this patch, the standby would fail to recover in such a case; however, the directories could be legitimately missing. Consider the following sequence of commands: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, crash recovery must be able to continue. A fix for this problem was already attempted in 49d9cfc68bf4, but it was reverted because of design issues. This new version is based on Robert Haas' proposal: any missing tablespaces are created during recovery before reaching consistency. Tablespaces are created as real directories, and should be deleted by later replay. CheckRecoveryConsistency ensures they have disappeared. The problems detected by this new code are reported as PANIC, except when allow_in_place_tablespaces is set to ON, in which case they are WARNING. Apart from making tests possible, this gives users an escape hatch in case things don't go as planned. Author: Kyotaro Horiguchi Author: Asim R Praveen Author: Paul Guo Reviewed-by: Anastasia Lubennikova (older versions) Reviewed-by: Fujii Masao (older versions) Reviewed-by: Michaël Paquier Diagnosed-by: Paul Guo Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27dn5xwj2p9-roh16e4...@mail.gmail.com --- src/backend/access/transam/xlogrecovery.c | 50 ++ src/backend/commands/dbcommands.c | 77 + src/backend/commands/tablespace.c | 40 ++--- src/test/recovery/t/033_replay_tsp_drops.pl | 169 4 files changed, 305 insertions(+), 31 deletions(-) create mode 100644 src/test/recovery/t/033_replay_tsp_drops.pl diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index e383c2123a..27e02fbfcd 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -42,6 +42,7 @@ #include "access/xlogutils.h" #include "catalog/pg_control.h" #include "commands/tablespace.h" +#include "common/file_utils.h" #include "miscadmin.h" #include "pgstat.h" #include "postmaster/bgwriter.h" @@ -2008,6 +2009,47 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI) } } +/* + * Verify that, in non-test mode, ./pg_tblspc doesn't contain any real + * directories. + * + * Replay of database creation XLOG records for databases that were later + * dropped can create fake directories in pg_tblspc. By the time consistency + * is reached these directories should have been removed; here we verify + * that this did indeed happen. This is to be called at the point where + * consistent state is reached. + * + * allow_in_place_tablespaces turns the PANIC into a WARNING, which is + * useful for testing purposes, and also allows for an escape hatch in case + * things go south. + */ +static void +CheckTablespaceDirectory(void) +{ + DIR *dir; + struct dirent *de; + + dir = AllocateDir("pg_tblspc"); + while ((de = ReadDir(dir, "pg_tblspc")) != NULL) + { + char path[MAXPGPATH + 10]; + + /* Skip entries of non-oid names */ + if (strspn(de->d_name, "0123456789") != strlen(de->d_name)) + continue; + + snprintf(path, sizeof(path), "pg_tblspc/%s", de->d_name); + + if (get_dirent_type(path, de, false, ERROR) != PGFILETYPE_LNK) + ereport(allow_in_place_tablespaces ? WARNING : PANIC, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("unexpected directory entry \"%s\" found in %s", + de->d_name, "pg_tblspc/"), + errdetail("All directory entries in pg_tblspc/ should be symbolic links."), + errhint("Remove those directories, or set allow_in_place_tablespace
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-22, Kyotaro Horiguchi wrote: > At Fri, 22 Jul 2022 10:18:58 +0200, Alvaro Herrera > wrote in > > Which commits would we consider? > > > > 7170f2159fb2Allow "in place" tablespaces. > > f6f0db4d6240Fix pg_tablespace_location() with in-place tablespaces > > The second one is just to make the function work with in-place > tablespaces. Without it the function yeilds the following error. > > > ERROR: could not read symbolic link "pg_tblspc/16407": Invalid argument > > This looks actually odd but I think no need of back-patching because > there's no actual user of the feature is not seen in our test suite. > If we have a test that needs the feature in future, it would be enough > to back-patch it then. Actually, I found that the new test added by the fix in this thread does depend on this being fixed, so I included an even larger set, which I think makes this more complete: 7170f2159fb2 Allow "in place" tablespaces. c6f2f01611d4 Fix pg_basebackup with in-place tablespaces. f6f0db4d6240 Fix pg_tablespace_location() with in-place tablespaces 7a7cd84893e0 doc: Remove mention to in-place tablespaces for pg_tablespace_location() 5344723755bd Remove unnecessary Windows-specific basebackup code. I didn't include any of the test changes for now. I don't intend to do so, unless we see another reason for that; I think the new tests that are going to be added by the recovery bugfix should be sufficient coverage. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La fuerza no está en los medios físicos sino que reside en una voluntad indomable" (Gandhi)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Fri, Jul 22, 2022 at 8:19 PM Alvaro Herrera wrote: > On 2022-Jul-22, Kyotaro Horiguchi wrote: > > At Thu, 21 Jul 2022 23:14:57 +1200, Thomas Munro > > wrote in > > > Would it help if we back-patched the allow_in_place_tablespaces stuff? > > > I'm not sure how hard/destabilising that would be, but I could take a > > > look tomorrow. > > > > +1. Addiotional reason for me is it is a developer option. > > OK, I'll wait for allow_in_place_tablespaces to be backpatched then. > > I would like to get this fix pushed before the next set of minors, so if > you won't have time for the backpatches early enough, maybe I can work > on getting it done. > > Which commits would we consider? I wonder how crazy it would be to back-patch src/test/recovery/t/027_stream_regress.pl too.
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Fri, 22 Jul 2022 10:18:58 +0200, Alvaro Herrera wrote in > OK, I'll wait for allow_in_place_tablespaces to be backpatched then. > > I would like to get this fix pushed before the next set of minors, so if > you won't have time for the backpatches early enough, maybe I can work > on getting it done. > > Which commits would we consider? > > 7170f2159fb2 Allow "in place" tablespaces. > f6f0db4d6240Fix pg_tablespace_location() with in-place tablespaces The second one is just to make the function work with in-place tablespaces. Without it the function yeilds the following error. > ERROR: could not read symbolic link "pg_tblspc/16407": Invalid argument This looks actually odd but I think no need of back-patching because there's no actual user of the feature is not seen in our test suite. If we have a test that needs the feature in future, it would be enough to back-patch it then. So I think only the first one is needed for now. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-22, Kyotaro Horiguchi wrote: > At Thu, 21 Jul 2022 23:14:57 +1200, Thomas Munro > wrote in > > Would it help if we back-patched the allow_in_place_tablespaces stuff? > > I'm not sure how hard/destabilising that would be, but I could take a > > look tomorrow. > > +1. Addiotional reason for me is it is a developer option. OK, I'll wait for allow_in_place_tablespaces to be backpatched then. I would like to get this fix pushed before the next set of minors, so if you won't have time for the backpatches early enough, maybe I can work on getting it done. Which commits would we consider? 7170f2159fb2Allow "in place" tablespaces. f6f0db4d6240Fix pg_tablespace_location() with in-place tablespaces -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Most hackers will be perfectly comfortable conceptualizing users as entropy sources, so let's move on." (Nathaniel Smith)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Thu, 21 Jul 2022 13:25:05 +0200, Alvaro Herrera wrote in > On 2022-Jul-21, Alvaro Herrera wrote: > > > Yeah, I think that would reduce cruft. I'm not sure this is more > > against backpatching policy or less, compared to adding a separate > > GUC just for this bugfix. > > cruft: > > { > {"allow_recovery_tablespaces", PG_POSTMASTER, WAL_RECOVERY, > gettext_noop("Continues recovery after finding invalid database > directories."), > gettext_noop("It is possible for tablespace drop to interfere > with database creation " > "so that WAL replay is forced to create fake > database directories. " > "These should have been dropped by the time recovery > ends; " > "but in case they aren't, this option lets recovery > continue if they " > "are present. Note that these directories must be > removed manually afterwards."), > GUC_NOT_IN_SAMPLE > }, > &allow_recovery_tablespaces, > false, > NULL, NULL, NULL > }, > > This is not a very good explanation, but I don't know how to make it > better. It looks a bit too detailed. I crafted the following.. Recovery can create tentative in-place tablespace directories under pg_tblspc/. They are assumed to be removed until reaching recovery consistency, but otherwise PostgreSQL raises a PANIC-level error, aborting the recovery. Setting allow_recovery_tablespaces to true causes the system to allow such directories during normal operation. In case those directories are left after reaching consistency, that implies data loss and metadata inconsistency and may cause failure of future tablespace creation. Though, after writing this, I became to think that piggy-backing on allow_in_place_tablespaces might be a bit different.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Thu, 21 Jul 2022 23:14:57 +1200, Thomas Munro wrote in > On Thu, Jul 21, 2022 at 11:01 PM Alvaro Herrera > wrote: > > On 2022-Jul-20, Alvaro Herrera wrote: > > > I see the following alternatives: > > > > > > 1. not backpatch this fix to 14 and older > > > 2. use a different GUC; either allow_invalid_pages as previously > > >suggested, or create a new one just for this purpose > > > 3. not provide any overriding mechanism in versions 14 and older > > > > I've got no opinions on this. I don't like either 1 or 3, so I'm going > > to add and backpatch a new GUC allow_recovery_tablespaces as the > > override mechanism. > > > > If others disagree with this choice, please speak up. > > Would it help if we back-patched the allow_in_place_tablespaces stuff? > I'm not sure how hard/destabilising that would be, but I could take a > look tomorrow. +1. Addiotional reason for me is it is a developer option. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-21, Alvaro Herrera wrote: > Yeah, I think that would reduce cruft. I'm not sure this is more > against backpatching policy or less, compared to adding a separate > GUC just for this bugfix. cruft: { {"allow_recovery_tablespaces", PG_POSTMASTER, WAL_RECOVERY, gettext_noop("Continues recovery after finding invalid database directories."), gettext_noop("It is possible for tablespace drop to interfere with database creation " "so that WAL replay is forced to create fake database directories. " "These should have been dropped by the time recovery ends; " "but in case they aren't, this option lets recovery continue if they " "are present. Note that these directories must be removed manually afterwards."), GUC_NOT_IN_SAMPLE }, &allow_recovery_tablespaces, false, NULL, NULL, NULL }, This is not a very good explanation, but I don't know how to make it better. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I think my standards have lowered enough that now I think 'good design' is when the page doesn't irritate the living f*ck out of me." (JWZ)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-21, Thomas Munro wrote: > On Thu, Jul 21, 2022 at 11:01 PM Alvaro Herrera > wrote: > > I've got no opinions on this. I don't like either 1 or 3, so I'm going > > to add and backpatch a new GUC allow_recovery_tablespaces as the > > override mechanism. > > > > If others disagree with this choice, please speak up. > > Would it help if we back-patched the allow_in_place_tablespaces stuff? > I'm not sure how hard/destabilising that would be, but I could take a > look tomorrow. Yeah, I think that would reduce cruft. I'm not sure this is more against backpatching policy or less, compared to adding a separate GUC just for this bugfix. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "The problem with the facetime model is not just that it's demoralizing, but that the people pretending to work interrupt the ones actually working." (Paul Graham)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-21, Thomas Munro wrote: > On Wed, Jul 20, 2022 at 10:51 PM Alvaro Herrera > wrote: > > v26 here. I spent some time fighting the readdir() stuff for > > Windows (so that get_dirent_type returns LNK for junction points) > > but couldn't make it to work and was unable to figure out why. > > Was it because of this? > > https://www.postgresql.org/message-id/CA%2BhUKGKv%2B736Pc8kSj3%3DDijDGd1eC79-uT3Vi16n7jYkcc_raw%40mail.gmail.com Oh, that sounds very likely, yeah. I didn't think of testing the FILE_ATTRIBUTE_DIRECTORY bit for junction points. I +1 pushing both of these patches to 14. Then this patch becomes a couple of lines shorter. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Before you were born your parents weren't as boring as they are now. They got that way paying your bills, cleaning up your room and listening to you tell them how idealistic you are." -- Charles J. Sykes' advice to teenagers
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Thu, Jul 21, 2022 at 11:01 PM Alvaro Herrera wrote: > On 2022-Jul-20, Alvaro Herrera wrote: > > I see the following alternatives: > > > > 1. not backpatch this fix to 14 and older > > 2. use a different GUC; either allow_invalid_pages as previously > >suggested, or create a new one just for this purpose > > 3. not provide any overriding mechanism in versions 14 and older > > I've got no opinions on this. I don't like either 1 or 3, so I'm going > to add and backpatch a new GUC allow_recovery_tablespaces as the > override mechanism. > > If others disagree with this choice, please speak up. Would it help if we back-patched the allow_in_place_tablespaces stuff? I'm not sure how hard/destabilising that would be, but I could take a look tomorrow.
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Wed, Jul 20, 2022 at 10:51 PM Alvaro Herrera wrote: > v26 here. I spent some time fighting the readdir() stuff for > Windows (so that get_dirent_type returns LNK for junction points) > but couldn't make it to work and was unable to figure out why. Was it because of this? https://www.postgresql.org/message-id/CA%2BhUKGKv%2B736Pc8kSj3%3DDijDGd1eC79-uT3Vi16n7jYkcc_raw%40mail.gmail.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-20, Alvaro Herrera wrote: > I see the following alternatives: > > 1. not backpatch this fix to 14 and older > 2. use a different GUC; either allow_invalid_pages as previously >suggested, or create a new one just for this purpose > 3. not provide any overriding mechanism in versions 14 and older I've got no opinions on this. I don't like either 1 or 3, so I'm going to add and backpatch a new GUC allow_recovery_tablespaces as the override mechanism. If others disagree with this choice, please speak up. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-20, Alvaro Herrera wrote: > On 2022-Jul-20, Alvaro Herrera wrote: > > > I also change the use of allow_invalid_pages to > > allow_in_place_tablespaces. We could add a > > separate GUC for this, but it seems overengineering. > > Oh, but allow_in_place_tablespaces doesn't exist in versions 14 and > older, so this strategy doesn't really work. ... and get_dirent_type is new in 14, so that'll be one more hurdle. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Cuando no hay humildad las personas se degradan" (A. Christie)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-20, Alvaro Herrera wrote: > I also change the use of allow_invalid_pages to > allow_in_place_tablespaces. We could add a > separate GUC for this, but it seems overengineering. Oh, but allow_in_place_tablespaces doesn't exist in versions 14 and older, so this strategy doesn't really work. I see the following alternatives: 1. not backpatch this fix to 14 and older 2. use a different GUC; either allow_invalid_pages as previously suggested, or create a new one just for this purpose 3. not provide any overriding mechanism in versions 14 and older -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Always assume the user will do much worse than the stupidest thing you can imagine."(Julien PUYDT)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
v26 here. I spent some time fighting the readdir() stuff for Windows (so that get_dirent_type returns LNK for junction points) but couldn't make it to work and was unable to figure out why. So I ended up doing what do_pg_backup_start is already doing: an #ifdef to call pgwin32_is_junction instead. I remove the newly added path_is_symlink function, because I realized that it would mean an extra syscall everywhere other than Windows. So if somebody wants to fix get_dirent_type() so that it works properly on Windows, we can change all these places together. I also change the use of allow_invalid_pages to allow_in_place_tablespaces. We could add a separate GUC for this, but it seems overengineering. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Most hackers will be perfectly comfortable conceptualizing users as entropy sources, so let's move on." (Nathaniel Smith) >From 26a0be53592a20aa09501e9015f77a4b3c3c3302 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 13 Jul 2022 18:14:18 +0200 Subject: [PATCH v26] Fix replay of create database records on standby Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on. This patch allows missing tablespaces to be created during recovery before reaching consistency. The tablespaces are created as real directories that should not exists but will be removed until reaching consistency. CheckRecoveryConsistency is responsible to make sure they have disappeared. The problems detected by this new code are reported as PANIC, except when allow_in_place_tablespaces is set to ON, in which case they are WARNING. Apart from making tests possible, this gives users an escape hatch in case things don't go as planned. Diagnosed-by: Paul Guo Author: Paul Guo Author: Kyotaro Horiguchi Author: Asim R Praveen Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27dn5xwj2p9-roh16e4...@mail.gmail.com --- src/backend/access/transam/xlogrecovery.c | 54 +++ src/backend/commands/dbcommands.c | 77 ++ src/backend/commands/tablespace.c | 28 +--- src/test/recovery/t/033_replay_tsp_drops.pl | 155 4 files changed, 287 insertions(+), 27 deletions(-) create mode 100644 src/test/recovery/t/033_replay_tsp_drops.pl diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 5d6f1b5e46..850ab6d7e6 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -42,6 +42,7 @@ #include "access/xlogutils.h" #include "catalog/pg_control.h" #include "commands/tablespace.h" +#include "common/file_utils.h" #include "miscadmin.h" #include "pgstat.h" #include "postmaster/bgwriter.h" @@ -2008,6 +2009,51 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI) } } +/* + * Verify that, in non-test mode, ./pg_tblspc doesn't contain any real + * directories. + * + * Replay of database creation XLOG records for databases that were later + * dropped can create fake directories in pg_tblspc. By the time consistency + * is reached these directories should have been removed; here we verify + * that this did indeed happen. This is to be called at the point where + * consistent state is reached. + * + * allow_in_place_tablespaces turns the PANIC into a WARNING, which is + * useful for testing purposes, and also allows for an escape hatch in case + * things go south. + */ +static void +CheckTablespaceDirectory(void) +{ + DIR *dir; + struct dirent *de; + + dir = AllocateDir("pg_tblspc"); + while ((de = ReadDir(dir, "pg_tblspc")) != NULL) + { + char path[MAXPGPATH + 10]; + + /* Skip entries of non-oid names */ + if (strspn(de->d_name, "0123456789") != strlen(de->d_name)) + continue; + + snprintf(path, sizeof(path), "pg_tblspc/%s", de->d_name); + +#ifdef WIN32 + if (!pgwin32_is_junction(path)) +#else + if (get_dirent_type(path, de, false, ERROR) != PGFILETYPE_LNK) +#endif + ereport(allow_in_place_tablespaces ? WARNING : PANIC, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("unexpected directory entry \"%s\" found in %s", + de->d_name, "pg_tblspc/"), + errdetail("All directory entries in pg_tblspc/ should be symbolic links."), + errhint("Remove those directories, or set allow_in_place_tablespaces to ON transiently to let recovery complete."))); + } +} + /* * Checks if recovery has reached a consistent state. When consistency is * reached and we have a
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-15, Alvaro Herrera wrote: > However, looking closer I noticed that on Windows we use our own > readdir() implementation, which AFAICT includes everything to handle > reparse points as symlinks correctly in get_dirent_type. Which means > that do_pg_start_backup is wasting its time with the "#ifdef WIN32" bits > to handle junction points separately. We could just do this > > diff --git a/src/backend/access/transam/xlog.c > b/src/backend/access/transam/xlog.c > index b809a2152c..4966213fde 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -8302,13 +8302,8 @@ do_pg_backup_start(const char *backupidstr, bool fast, > TimeLineID *starttli_p, >* we sometimes use allow_in_place_tablespaces to create >* directories directly under pg_tblspc, which would > fail below. >*/ > -#ifdef WIN32 > - if (!pgwin32_is_junction(fullpath)) > - continue; > -#else > if (get_dirent_type(fullpath, de, false, ERROR) != > PGFILETYPE_LNK) > continue; > -#endif > > #if defined(HAVE_READLINK) || defined(WIN32) > rllen = readlink(fullpath, linkpath, sizeof(linkpath)); > > And everything should continue to work. Hmm, but it does not: https://cirrus-ci.com/build/4824963784900608 -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-15, Kyotaro Horiguchi wrote: > 0002: > +is_path_tslink(const char *path) > > What the "ts" of tslink stands for? If it stands for tablespace, the > function is not specific for table spaces. We already have > > + errmsg("could not stat file \"%s\": > %m", path)); > > I'm not sure we need such correctness, but what is failing there is > lstat. I found similar codes in two places in backend and one place > in frontend. So couldn't it be moved to /common and have a more > generic name? I wondered whether it'd be better to check whether get_dirent_type returns PGFILETYPE_LNK. However, that doesn't deal with junction points at all, which seems pretty odd ... I mean, isn't it rather useful as an abstraction if it doesn't abstract away the one platform-dependent point we have in the area? However, looking closer I noticed that on Windows we use our own readdir() implementation, which AFAICT includes everything to handle reparse points as symlinks correctly in get_dirent_type. Which means that do_pg_start_backup is wasting its time with the "#ifdef WIN32" bits to handle junction points separately. We could just do this diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b809a2152c..4966213fde 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8302,13 +8302,8 @@ do_pg_backup_start(const char *backupidstr, bool fast, TimeLineID *starttli_p, * we sometimes use allow_in_place_tablespaces to create * directories directly under pg_tblspc, which would fail below. */ -#ifdef WIN32 - if (!pgwin32_is_junction(fullpath)) - continue; -#else if (get_dirent_type(fullpath, de, false, ERROR) != PGFILETYPE_LNK) continue; -#endif #if defined(HAVE_READLINK) || defined(WIN32) rllen = readlink(fullpath, linkpath, sizeof(linkpath)); And everything should continue to work. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Jul-15, Kyotaro Horiguchi wrote: > At Thu, 14 Jul 2022 23:47:40 +0200, Alvaro Herrera > wrote in > > Here's a couple of fixups. 0001 is the same as before. In 0002 I think > > Thanks! > > + if (!S_ISLNK(st.st_mode)) > + #else > + if (!pgwin32_is_junction(path)) > + #endif > + elog(ignore_invalid_pages ? WARNING : PANIC, > + "real directory found in pg_tblspc directory: > %s", de->d_name); > > A regular file with an oid-name also causes this error. Doesn't > something like "unexpected non-(sym)link entry..." work? Hmm, good point. I also wonder if we need to cater for using the term "junction point" rather than "symlink" when under Windows. > > CheckTablespaceDirectory ends up easier to read if we split out the test > > for validity of the link. Looking at that again, I think we don't need > > to piggyback on ignore_invalid_pages, which is already a stretch, so > > let's not -- instead we can use allow_in_place_tablespaces if users need > > a workaround. So that's 0003 (this bit needs more than zero docs, > > however.) > > The result of 0003 looks good. Great, will merge. > 0002: > +is_path_tslink(const char *path) > > What the "ts" of tslink stands for? If it stands for tablespace, the > function is not specific for table spaces. Oh, of course. > We already have > > + errmsg("could not stat file \"%s\": > %m", path)); > > I'm not sure we need such correctness, but what is failing there is > lstat. I'll have a look at what we use for lstat failures in other places. > I found similar codes in two places in backend and one place > in frontend. So couldn't it be moved to /common and have a more > generic name? I'll have a look at those. I had the same instinct initially ... > - dir = AllocateDir(tblspc_path); > - while ((de = ReadDir(dir, tblspc_path)) != NULL) > + dir = AllocateDir("pg_tblspc"); > + while ((de = ReadDir(dir, "pg_tblspc")) != NULL) > > xlog.c uses the macro XLOGDIR. Why don't we define TBLSPCDIR? Oh yes, let's do that. I'd even backpatch that, to avoid a future backpatching gotcha. > - for (p = de->d_name; *p && isdigit(*p); p++); > - if (*p) > + if (strspn(de->d_name, "0123456789") != strlen(de->d_name)) > continue; > > The pattern "strspn != strlen" looks kind of remote, or somewhat > pedantic.. > > + charpath[MAXPGPATH + 10]; > .. > - snprintf(path, MAXPGPATH, "%s/%s", tblspc_path, de->d_name); > + snprintf(path, sizeof(path), "pg_tblspc/%s", de->d_name); > > I don't think we need the extra 10 bytes. I forgot to mention this, but I just copied these bits from some other place that processes pg_tblspc entries. It seemed to me that the bodiless for loop was a bit too suspicious-looking. > A bit paranoic, but we can check the return value to confirm the > d_name is fully stored in the buffer. Hmm ... I don't think we need to care about that in this patch. This coding pattern is already being used in other places. If we want to change that, let's do it everywhere, and not in an unrelated backpatchable bug fix. > > After all this, I'm not sure what to think of dbase_redo. At line 3102, > > is the directory supposed to exist or not? I'm confused as to what is > > the expected state at that point. I rewrote this, but now I think my > > rewrite continues to be confusing, so I'll have to think more about it. > > I'm not sure l3102 exactly points, but haven't we chosen to create > everything required to keep recovery going, whether it is supposed to > exist or not? I mean just after the two stat() calls for the target directory. > > Another aspect are the tests. Robert described a scenario where the > > previously committed version of this patch created trouble. Do we have > > a test case to cover that problematic case? I think we should strive to > > cover it, if possible. > > I counldn't recall that clearly and failed to dig out from the thread, > but doesn't the "creating everything needed" strategy naturally save > that case? We could add that test, but it seems to me a little > cumbersome to confirm the test correctly detect that case.. Well, I *hope* it does ... but hope is no strategy, and I've frequently been on the wrong side when trusting that untested code does what I think it does. Thanks for reviewing, -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "It takes less than 2 seconds to get to 78% complete; that's a good sign. A few seconds later it's at 90%, but it seems to have stuck there. Did somebody make percentages logarithmic while I wasn't looking?" http://smylers.hates-software.com/2005/09/08/1995c749.html
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Thu, 14 Jul 2022 23:47:40 +0200, Alvaro Herrera wrote in > Here's a couple of fixups. 0001 is the same as before. In 0002 I think Thanks! + if (!S_ISLNK(st.st_mode)) + #else + if (!pgwin32_is_junction(path)) + #endif + elog(ignore_invalid_pages ? WARNING : PANIC, +"real directory found in pg_tblspc directory: %s", de->d_name); A regular file with an oid-name also causes this error. Doesn't something like "unexpected non-(sym)link entry..." work? > CheckTablespaceDirectory ends up easier to read if we split out the test > for validity of the link. Looking at that again, I think we don't need > to piggyback on ignore_invalid_pages, which is already a stretch, so > let's not -- instead we can use allow_in_place_tablespaces if users need > a workaround. So that's 0003 (this bit needs more than zero docs, > however.) The result of 0003 looks good. 0002: +is_path_tslink(const char *path) What the "ts" of tslink stands for? If it stands for tablespace, the function is not specific for table spaces. We already have + errmsg("could not stat file \"%s\": %m", path)); I'm not sure we need such correctness, but what is failing there is lstat. I found similar codes in two places in backend and one place in frontend. So couldn't it be moved to /common and have a more generic name? - dir = AllocateDir(tblspc_path); - while ((de = ReadDir(dir, tblspc_path)) != NULL) + dir = AllocateDir("pg_tblspc"); + while ((de = ReadDir(dir, "pg_tblspc")) != NULL) xlog.c uses the macro XLOGDIR. Why don't we define TBLSPCDIR? - for (p = de->d_name; *p && isdigit(*p); p++); - if (*p) + if (strspn(de->d_name, "0123456789") != strlen(de->d_name)) continue; The pattern "strspn != strlen" looks kind of remote, or somewhat pedantic.. + charpath[MAXPGPATH + 10]; .. - snprintf(path, MAXPGPATH, "%s/%s", tblspc_path, de->d_name); + snprintf(path, sizeof(path), "pg_tblspc/%s", de->d_name); I don't think we need the extra 10 bytes. A bit paranoic, but we can check the return value to confirm the d_name is fully stored in the buffer. > 0004 is straightforward: let's check for bad directories before logging > about consistent state. I was about to write a comment to do this when looking 0001. > After all this, I'm not sure what to think of dbase_redo. At line 3102, > is the directory supposed to exist or not? I'm confused as to what is > the expected state at that point. I rewrote this, but now I think my > rewrite continues to be confusing, so I'll have to think more about it. I'm not sure l3102 exactly points, but haven't we chosen to create everything required to keep recovery going, whether it is supposed to exist or not? > Another aspect are the tests. Robert described a scenario where the > previously committed version of this patch created trouble. Do we have > a test case to cover that problematic case? I think we should strive to > cover it, if possible. I counldn't recall that clearly and failed to dig out from the thread, but doesn't the "creating everything needed" strategy naturally save that case? We could add that test, but it seems to me a little cumbersome to confirm the test correctly detect that case.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Here's a couple of fixups. 0001 is the same as before. In 0002 I think CheckTablespaceDirectory ends up easier to read if we split out the test for validity of the link. Looking at that again, I think we don't need to piggyback on ignore_invalid_pages, which is already a stretch, so let's not -- instead we can use allow_in_place_tablespaces if users need a workaround. So that's 0003 (this bit needs more than zero docs, however.) 0004 is straightforward: let's check for bad directories before logging about consistent state. After all this, I'm not sure what to think of dbase_redo. At line 3102, is the directory supposed to exist or not? I'm confused as to what is the expected state at that point. I rewrote this, but now I think my rewrite continues to be confusing, so I'll have to think more about it. Another aspect are the tests. Robert described a scenario where the previously committed version of this patch created trouble. Do we have a test case to cover that problematic case? I think we should strive to cover it, if possible. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "The eagle never lost so much time, as when he submitted to learn of the crow." (William Blake) >From bdb691c2a86301e5369b3ae7f735fa5f7c29304d Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 13 Jul 2022 18:14:18 +0200 Subject: [PATCH v25 1/4] Fix replay of create database records on standby Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on. This patch allows missing tablespaces to be created during recovery before reaching consistency. The tablespaces are created as real directories that should not exists but will be removed until reaching consistency. CheckRecoveryConsistency is responsible to make sure they have disappeared. Similar to log_invalid_page mechanism, the GUC ignore_invalid_pages turns into PANIC errors detected by this patch into WARNING, which allows continueing recovery. Diagnosed-by: Paul Guo Author: Paul Guo Author: Kyotaro Horiguchi Author: Asim R Praveen Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27dn5xwj2p9-roh16e4...@mail.gmail.com --- doc/src/sgml/config.sgml| 5 +- src/backend/access/transam/xlogrecovery.c | 59 src/backend/commands/dbcommands.c | 71 + src/backend/commands/tablespace.c | 28 +--- src/backend/utils/misc/guc.c| 8 +- src/include/access/xlogutils.h | 2 + src/test/recovery/t/029_replay_tsp_drops.pl | 155 7 files changed, 296 insertions(+), 32 deletions(-) create mode 100644 src/test/recovery/t/029_replay_tsp_drops.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 37fd80388c..1e1c8c1cb7 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -11363,11 +11363,12 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1) If set to off (the default), detection of -WAL records having references to invalid pages during +WAL records having references to invalid pages or +WAL records resulting in invalid directory operations during recovery causes PostgreSQL to raise a PANIC-level error, aborting the recovery. Setting ignore_invalid_pages to on -causes the system to ignore invalid page references in WAL records +causes the system to ignore invalid actions caused by such WAL records (but still report a warning), and continue the recovery. This behavior may cause crashes, data loss, propagate or hide corruption, or other serious problems. diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 5d6f1b5e46..ae81244e06 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -2008,6 +2008,57 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI) } } +/* + * Makes sure that ./pg_tblspc directory doesn't contain a real directory. + * + * This is intended to be called after reaching consistency. + * ignore_invalid_pages=on turns into the PANIC error into WARNING so that + * recovery can continue. + * + * This can't be checked in allow_in_place_tablespaces mode, so skip it in + * that case. + */ +static void +CheckTablespaceDirectory(void) +{ + char *tblspc_path = "./pg_tblspc"; + DIR *dir; + struct dirent *de; + + /* Do not check for this when
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Not a review, just a preparatory rebase across some trivially conflicting changes. I also noticed that src/test/recovery/t/031_recovery_conflict.pl, which was added two days after v23 was sent, and which uses allow_in_place_tablespaces, bails out because of the checks introduced by this patch, so I made the check routine do nothing in that case. Anyway, here's v24. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "La conclusión que podemos sacar de esos estudios es que no podemos sacar ninguna conclusión de ellos" (Tanenbaum) >From de2c77f5e2c31c911674377a51359ba8fe662c96 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 13 Jul 2022 18:14:18 +0200 Subject: [PATCH v24] Fix replay of create database records on standby Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on. This patch allows missing tablespaces to be created during recovery before reaching consistency. The tablespaces are created as real directories that should not exists but will be removed until reaching consistency. CheckRecoveryConsistency is responsible to make sure they have disappeared. Similar to log_invalid_page mechanism, the GUC ignore_invalid_pages turns into PANIC errors detected by this patch into WARNING, which allows continueing recovery. Diagnosed-by: Paul Guo Author: Paul Guo Author: Kyotaro Horiguchi Author: Asim R Praveen Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27dn5xwj2p9-roh16e4...@mail.gmail.com --- doc/src/sgml/config.sgml| 5 +- src/backend/access/transam/xlogrecovery.c | 59 src/backend/commands/dbcommands.c | 71 + src/backend/commands/tablespace.c | 28 +--- src/backend/utils/misc/guc.c| 8 +- src/include/access/xlogutils.h | 2 + src/test/recovery/t/029_replay_tsp_drops.pl | 155 7 files changed, 296 insertions(+), 32 deletions(-) create mode 100644 src/test/recovery/t/029_replay_tsp_drops.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 37fd80388c..1e1c8c1cb7 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -11363,11 +11363,12 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1) If set to off (the default), detection of -WAL records having references to invalid pages during +WAL records having references to invalid pages or +WAL records resulting in invalid directory operations during recovery causes PostgreSQL to raise a PANIC-level error, aborting the recovery. Setting ignore_invalid_pages to on -causes the system to ignore invalid page references in WAL records +causes the system to ignore invalid actions caused by such WAL records (but still report a warning), and continue the recovery. This behavior may cause crashes, data loss, propagate or hide corruption, or other serious problems. diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 5d6f1b5e46..ae81244e06 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -2008,6 +2008,57 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI) } } +/* + * Makes sure that ./pg_tblspc directory doesn't contain a real directory. + * + * This is intended to be called after reaching consistency. + * ignore_invalid_pages=on turns into the PANIC error into WARNING so that + * recovery can continue. + * + * This can't be checked in allow_in_place_tablespaces mode, so skip it in + * that case. + */ +static void +CheckTablespaceDirectory(void) +{ + char *tblspc_path = "./pg_tblspc"; + DIR *dir; + struct dirent *de; + + /* Do not check for this when test tablespaces are in use */ + if (allow_in_place_tablespaces) + return; + + dir = AllocateDir(tblspc_path); + while ((de = ReadDir(dir, tblspc_path)) != NULL) + { + char path[MAXPGPATH]; + char *p; +#ifndef WIN32 + struct stat st; +#endif + + /* Skip entries of non-oid names */ + for (p = de->d_name; *p && isdigit(*p); p++); + if (*p) + continue; + + snprintf(path, MAXPGPATH, "%s/%s", tblspc_path, de->d_name); + +#ifndef WIN32 + if (lstat(path, &st) < 0) + ereport(ERROR, errcode_for_file_access(), + errmsg("could not stat file \"%s\": %m", path)); + + if (!S_ISLNK(st.st_mode)) +#else + if (!pgwin32_is_junction(path)) +#endif + elog(ignore_
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Tue, 05 Apr 2022 16:38:06 +0900 (JST), Kyotaro Horiguchi wrote in > At Tue, 05 Apr 2022 11:16:44 +0900 (JST), Kyotaro Horiguchi > wrote in > > So, I have the following points in my mind for now. > > > > - We create the directory "since we know it is just tentative state". > > > > - Then, check that no directory in pg_tblspc when reaching consistency > > when allow_in_place_tablespaces is false. > > > > - Leave the log_invalid_page() mechanism alone as it is always result > > in a corrpt page if a differential WAL record is applied on a newly > > created page that should have been exist. > > > > However, while working on it, I found that I found that recovery faces > > missing tablespace directories *after* reaching consistency. I'm > > examining that further. > > Okay, it was my thinko. > > This is the first cut of the above. It had an unused variable for Windows. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 1e7f5e5e10ea504e168d01b3db8be12c2f63b6d6 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 5 Apr 2022 15:31:45 +0900 Subject: [PATCH v23] Fix replay of create database records on standby Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on. This patch allows missing tablespaces to be created during recovery before reaching consistency. The tablespaces are created as real directories that should not exists but will be removed until reaching consistency. CheckRecoveryConsistency is responsible to make sure they have disappeared. Similar to log_invalid_page mechanism, the GUC ignore_invalid_pages turns into PANIC errors detected by this patch into WARNING, which allows continueing recovery. Diagnosed-by: Paul Guo Author: Paul Guo Author: Kyotaro Horiguchi Author: Asim R Praveen Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27dn5xwj2p9-roh16e4...@mail.gmail.com --- doc/src/sgml/config.sgml| 5 +- src/backend/access/transam/xlogrecovery.c | 55 +++ src/backend/commands/dbcommands.c | 71 + src/backend/commands/tablespace.c | 28 +--- src/backend/utils/misc/guc.c| 8 +- src/include/access/xlogutils.h | 2 + src/test/recovery/t/029_replay_tsp_drops.pl | 155 7 files changed, 292 insertions(+), 32 deletions(-) create mode 100644 src/test/recovery/t/029_replay_tsp_drops.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 43e4ade83e..c9468b 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -11228,11 +11228,12 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1) If set to off (the default), detection of -WAL records having references to invalid pages during +WAL records having references to invalid pages or +WAL records resulting in invalid directory operations during recovery causes PostgreSQL to raise a PANIC-level error, aborting the recovery. Setting ignore_invalid_pages to on -causes the system to ignore invalid page references in WAL records +causes the system to ignore invalid actions caused by such WAL records (but still report a warning), and continue the recovery. This behavior may cause crashes, data loss, propagate or hide corruption, or other serious problems. diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 8d2395dae2..18dcc452ca 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -1986,6 +1986,53 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI) } } +/* + * Makes sure that ./pg_tblspc directory doesn't contain a real directory. + * + * This is intended to be called after reaching consistency. + * ignore_invalid_pages=on turns into the PANIC error into WARNING so that + * recovery can continue. + * + * Note that it is the normal behavior when allow_in_place_tablespaces=on, but + * we don't bother caring that case since it is a developer-only setting. + */ +static void +CheckTablespaceDirectory(void) +{ + char *tblspc_path = "./pg_tblspc"; + DIR *dir; + struct dirent *de; + + dir = AllocateDir(tblspc_path); + while ((de = ReadDir(dir, tblspc_path)) != NULL) + { + char path[MAXPGPATH]; + char *p; +#ifndef WIN32 + struct stat st; +#endif + + /* Skip entries of non-oid names */ + for (p = de->d_name; *p && i
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Tue, 05 Apr 2022 16:38:06 +0900 (JST), Kyotaro Horiguchi wrote in > > However, while working on it, I found that I found that recovery faces > > missing tablespace directories *after* reaching consistency. I'm > > examining that further. > > Okay, it was my thinko. But I faced another obstacle. I forgot to delete the second sentence. Please ingore it. regareds. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Tue, 05 Apr 2022 11:16:44 +0900 (JST), Kyotaro Horiguchi wrote in > So, I have the following points in my mind for now. > > - We create the directory "since we know it is just tentative state". > > - Then, check that no directory in pg_tblspc when reaching consistency > when allow_in_place_tablespaces is false. > > - Leave the log_invalid_page() mechanism alone as it is always result > in a corrpt page if a differential WAL record is applied on a newly > created page that should have been exist. > > However, while working on it, I found that I found that recovery faces > missing tablespace directories *after* reaching consistency. I'm > examining that further. Okay, it was my thinko. But I faced another obstacle. This is the first cut of the above. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From c4107b4953d251f7fab06400d2c2ae2e0a505759 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 5 Apr 2022 15:31:45 +0900 Subject: [PATCH v22] Fix replay of create database records on standby Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on. This patch allows missing tablespaces to be created during recovery before reaching consistency. The tablespaces are created as real directories that should not exists but will be removed until reaching consistency. CheckRecoveryConsistency is responsible to make sure they have disappeared. Similar to log_invalid_page mechanism, the GUC ignore_invalid_pages turns into PANIC errors detected by this patch into WARNING, which allows continueing recovery. Diagnosed-by: Paul Guo Author: Paul Guo Author: Kyotaro Horiguchi Author: Asim R Praveen Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27dn5xwj2p9-roh16e4...@mail.gmail.com --- doc/src/sgml/config.sgml| 5 +- src/backend/access/transam/xlogrecovery.c | 53 +++ src/backend/commands/dbcommands.c | 71 + src/backend/commands/tablespace.c | 28 +--- src/backend/utils/misc/guc.c| 8 +- src/include/access/xlogutils.h | 2 + src/test/recovery/t/029_replay_tsp_drops.pl | 155 7 files changed, 290 insertions(+), 32 deletions(-) create mode 100644 src/test/recovery/t/029_replay_tsp_drops.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 43e4ade83e..c9468b 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -11228,11 +11228,12 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1) If set to off (the default), detection of -WAL records having references to invalid pages during +WAL records having references to invalid pages or +WAL records resulting in invalid directory operations during recovery causes PostgreSQL to raise a PANIC-level error, aborting the recovery. Setting ignore_invalid_pages to on -causes the system to ignore invalid page references in WAL records +causes the system to ignore invalid actions caused by such WAL records (but still report a warning), and continue the recovery. This behavior may cause crashes, data loss, propagate or hide corruption, or other serious problems. diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 8d2395dae2..0889ba4b47 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -1986,6 +1986,51 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI) } } +/* + * Makes sure that ./pg_tblspc directory doesn't contain a real directory. + * + * This is intended to be called after reaching consistency. + * ignore_invalid_pages=on turns into the PANIC error into WARNING so that + * recovery can continue. + * + * Note that it is the normal behavior when allow_in_place_tablespaces=on, but + * we don't bother caring that case since it is a developer-only setting. + */ +static void +CheckTablespaceDirectory(void) +{ + char *tblspc_path = "./pg_tblspc"; + DIR *dir; + struct dirent *de; + + dir = AllocateDir(tblspc_path); + while ((de = ReadDir(dir, tblspc_path)) != NULL) + { + char path[MAXPGPATH]; + char *p; + struct stat st; + + /* Skip entries of non-oid names */ + for (p = de->d_name; *p && isdigit(*p); p++); + if (*p) + continue; + + snprintf(path, MAXPGPATH, "%s/%s", tblspc_path, de->d_name); + +#ifndef WIN32 + if (lstat(path,
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Mon, 4 Apr 2022 21:14:27 +0530, Dilip Kumar wrote in > On Mon, Apr 4, 2022 at 2:25 PM Kyotaro Horiguchi > wrote: > > > > At Mon, 04 Apr 2022 17:29:48 +0900 (JST), Kyotaro Horiguchi > > wrote in > > > I haven't found how the patch caused creation of a relation file that > > > is to be removed soon. However, I find that v19 patch fails by maybe > > > due to some change in Cluster.pm. It takes a bit more time to check > > > that.. > > > > I was a bit away, of course the wal-logged create database interfares > > with the patch here. But I haven't found that why it stops creating > > database directory under pg_tblspc. > > I did not understand what is the exact problem here, but the database > directory and the version file are created under the default > tablespace of the target database. However, other than the default > tablespace of the database, the database directory will be created > along with the smgrcreate() so that we do not create an unnecessary > directory under the tablespace where we do not have any data to be > copied. Thanks. Yeah, I suspected something like that but I didn't find a difference in the code I suspected to be related with, but it's was wrong. I took wrong steps trying to reveal that state and faced the wrong error message. With the correct steps, I could see that Storage/CREATE creates pg_tblspc/. So, if we create missing tablespace directory, we have no way otherthan creating it directly in pg_tblspc, which is violating the rule that there shouldn't be real directory in pg_tblspc (when allow_in_place_tablespaces is false). So, I have the following points in my mind for now. - We create the directory "since we know it is just tentative state". - Then, check that no directory in pg_tblspc when reaching consistency when allow_in_place_tablespaces is false. - Leave the log_invalid_page() mechanism alone as it is always result in a corrpt page if a differential WAL record is applied on a newly created page that should have been exist. However, while working on it, I found that I found that recovery faces missing tablespace directories *after* reaching consistency. I'm examining that further. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Mon, Apr 4, 2022 at 2:25 PM Kyotaro Horiguchi wrote: > > At Mon, 04 Apr 2022 17:29:48 +0900 (JST), Kyotaro Horiguchi > wrote in > > I haven't found how the patch caused creation of a relation file that > > is to be removed soon. However, I find that v19 patch fails by maybe > > due to some change in Cluster.pm. It takes a bit more time to check > > that.. > > I was a bit away, of course the wal-logged create database interfares > with the patch here. But I haven't found that why it stops creating > database directory under pg_tblspc. I did not understand what is the exact problem here, but the database directory and the version file are created under the default tablespace of the target database. However, other than the default tablespace of the database, the database directory will be created along with the smgrcreate() so that we do not create an unnecessary directory under the tablespace where we do not have any data to be copied. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Mon, 04 Apr 2022 17:29:48 +0900 (JST), Kyotaro Horiguchi wrote in > I haven't found how the patch caused creation of a relation file that > is to be removed soon. However, I find that v19 patch fails by maybe > due to some change in Cluster.pm. It takes a bit more time to check > that.. I was a bit away, of course the wal-logged create database interfares with the patch here. But I haven't found that why it stops creating database directory under pg_tblspc. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Fri, 1 Apr 2022 14:51:58 -0400, Robert Haas wrote in > On Fri, Apr 1, 2022 at 12:22 AM Kyotaro Horiguchi > wrote: > > By the way, may I ask how do we fix this? The existing recovery code > > already generates just-to-be-delete files in a real directory in > > pg_tblspc sometimes, and elsewise skip applying WAL records on > > nonexistent heap pages. It is the "mixed" way. > > Can you be more specific about where we have each behavior now? They're done in XLogReadBufferExtended. The second behavior happens here, xlogutils.c: > /* hm, page doesn't exist in file */ > if (mode == RBM_NORMAL) > { > log_invalid_page(rnode, forknum, blkno, false); + Assert(0); > return InvalidBuffer; With the assertion, 015_promotion_pages.pl crashes. This prevents page creation and the following redo action on the page. The first behavior is described as the following comment: >* Create the target file if it doesn't already exist. This lets us > cope >* if the replay sequence contains writes to a relation that is later >* deleted. (The original coding of this routine would instead suppress >* the writes, but that seems like it risks losing valuable data if the >* filesystem loses an inode during a crash. Better to write the data >* until we are actually told to delete the file.) >*/ > smgrcreate(smgr, forknum, true); Without the smgrcreate call, make check-world fails due to missing files for FSM and visibility map, and init forks, which it's a bit doubtful that the cases fall into the category so-called "creates inexistent objects by redo access". In a few places, XLOG_FPI records are used to create the first page of a file including main and init forks. But I don't see a case of main fork during make check-world. # Most of the failure cases happen as standby freeze. I was a bit # annoyed that make check-world doesn't tell what is the module # currently being tested. In that case I had to deduce it from the # sequence of preceding script names, but if the first TAP script of a # module freezes, I had to use ps to find the module.. > > 1. stop XLogReadBufferForRedo creating a file in nonexistent > > directories then remember the failure (I'm not sure how big the > > impact is.) > > > > 2. unconditionally create all objects required for recovery to proceed.. > > 2.1 and igore the failures. > > 2.2 and remember the failures. > > > > 3. Any other? > > > > 2 needs to create a real directory in pg_tblspc. So 1? > > I think we could either do 1 or 2. My intuition is that getting 2 > working would be less scary and more likely to be something we would > feel comfortable back-patching, but 1 is probably a better design in > the long term. However, I might be wrong -- that's just a guess. Thanks. I forgot to mention in the previous mail (but mentioned somewhere upthread) but if we take 2, there's no way other than creating a real directory in pg_tblspc while recovery. I don't think it is neat. I haven't found how the patch caused creation of a relation file that is to be removed soon. However, I find that v19 patch fails by maybe due to some change in Cluster.pm. It takes a bit more time to check that.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Fri, Apr 1, 2022 at 12:22 AM Kyotaro Horiguchi wrote: > By the way, may I ask how do we fix this? The existing recovery code > already generates just-to-be-delete files in a real directory in > pg_tblspc sometimes, and elsewise skip applying WAL records on > nonexistent heap pages. It is the "mixed" way. Can you be more specific about where we have each behavior now? > 1. stop XLogReadBufferForRedo creating a file in nonexistent > directories then remember the failure (I'm not sure how big the > impact is.) > > 2. unconditionally create all objects required for recovery to proceed.. > 2.1 and igore the failures. > 2.2 and remember the failures. > > 3. Any other? > > 2 needs to create a real directory in pg_tblspc. So 1? I think we could either do 1 or 2. My intuition is that getting 2 working would be less scary and more likely to be something we would feel comfortable back-patching, but 1 is probably a better design in the long term. However, I might be wrong -- that's just a guess. -- Robert Haas EDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Tue, 29 Mar 2022 09:31:42 -0400, Robert Haas wrote in > On Tue, Mar 29, 2022 at 9:28 AM Alvaro Herrera > wrote: > > OK, this is a bug that's been open for years. A fix can be committed > > after the feature freeze anyway. > > +1 By the way, may I ask how do we fix this? The existing recovery code already generates just-to-be-delete files in a real directory in pg_tblspc sometimes, and elsewise skip applying WAL records on nonexistent heap pages. It is the "mixed" way. 1. stop XLogReadBufferForRedo creating a file in nonexistent directories then remember the failure (I'm not sure how big the impact is.) 2. unconditionally create all objects required for recovery to proceed.. 2.1 and igore the failures. 2.2 and remember the failures. 3. Any other? 2 needs to create a real directory in pg_tblspc. So 1? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Tue, Mar 29, 2022 at 9:28 AM Alvaro Herrera wrote: > OK, this is a bug that's been open for years. A fix can be committed > after the feature freeze anyway. +1 -- Robert Haas EDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Mar-29, Robert Haas wrote: > I'm fine with that approach, but I'd like to ask that we proceed > expeditiously, because I have another patch that I want to commit that > touches this area. I can commit to helping with whatever we decide to > do here, but I don't want to keep that patch on ice while we figure it > out and then have it miss the release. OK, this is a bug that's been open for years. A fix can be committed after the feature freeze anyway. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Tue, Mar 29, 2022 at 7:37 AM Alvaro Herrera wrote: > I think we should revert this patch and do it again using the other > approach: create a stub directory during recovery that can be deleted > later. I'm fine with that approach, but I'd like to ask that we proceed expeditiously, because I have another patch that I want to commit that touches this area. I can commit to helping with whatever we decide to do here, but I don't want to keep that patch on ice while we figure it out and then have it miss the release. -- Robert Haas EDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Mar-29, Kyotaro Horiguchi wrote: > > That's going to result in a call to > > XLogReadBufferForRedoExtended() which will call > > XLogReadBufferExtended() which will do smgrcreate(smgr, forknum, true) > > which will in turn call TablespaceCreateDbspace() to fill in all the > > missing directories. > > Right. I thought that recovery avoids that but that's wrong. This > behavior creates a bare (non-linked) directly within pg_tblspc. The > directory would dissapear soon if recovery proceeds to the consistency > point, though. Hmm, this is not good. > No. I agree that mixing them is not good. On the other hand we > already doing that by heapam. AFAICS sometimes it avoid creating a > new page but sometimes creates it. But I don't mean to use the fact > for justifying this patch to do that, or denying to do that. I think we should revert this patch and do it again using the other approach: create a stub directory during recovery that can be deleted later. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Porque francamente, si para saber manejarse a uno mismo hubiera que rendir examen... ¿Quién es el machito que tendría carnet?" (Mafalda)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Mon, 28 Mar 2022 12:17:50 -0400, Robert Haas wrote in > On Mon, Mar 21, 2022 at 3:02 PM Alvaro Herrera > wrote: > > > 2. Why not instead change the code so that the operation can succeed, > > > by creating the prerequisite parent directories? Do we not have enough > > > information for that? I'm not saying that we definitely should do it > > > that way rather than this way, but I think we do take that approach in > > > some cases. > > > > It seems we can choose freely between these two implementations -- I > > mean I don't see any upsides or downsides to either one. > > What got committed here feels inconsistent to me. Suppose we have a > checkpoint, and then a series of operations that touch a tablespace, > and then a drop database and drop tablespace. If the first operation > happens to be CREATE DATABASE, then this patch is going to fix it by > skipping the operation. However, if the first operation happens to be > almost anything else, the way it's going to reference the dropped > tablespace is via a block reference in a WAL record of a wide variety > of types. That's going to result in a call to > XLogReadBufferForRedoExtended() which will call > XLogReadBufferExtended() which will do smgrcreate(smgr, forknum, true) > which will in turn call TablespaceCreateDbspace() to fill in all the > missing directories. Right. I thought that recovery avoids that but that's wrong. This behavior creates a bare (non-linked) directly within pg_tblspc. The directory would dissapear soon if recovery proceeds to the consistency point, though. > I don't think that's very good. It would be reasonable to decide that > we're never going to create the missing directories and instead just > remember that they were not found so we can do a cross check. It's > also reasonable to just create the directories on the fly. But doing a > mix of those systems doesn't really seem like the right idea - > particularly because it means that the cross-check system is probably > not very effective at finding actual problems in the code. > > Am I missing something here? No. I agree that mixing them is not good. On the other hand we already doing that by heapam. AFAICS sometimes it avoid creating a new page but sometimes creates it. But I don't mean to use the fact for justifying this patch to do that, or denying to do that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Mon, Mar 28, 2022 at 02:34:44PM +1300, Thomas Munro wrote: > Just a thought: we could consider back-patching > allow_in_place_tablespaces, after a little while, if we're happy with > how that is working out, if it'd be useful for verifying bug fixes in > back branches. It's non-end-user-facing testing infrastructure. +1 for a backpatch on that. That would be useful. -- Michael signature.asc Description: PGP signature
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Mon, 28 Mar 2022 10:37:04 -0400, Robert Haas wrote in > On Fri, Mar 25, 2022 at 8:26 AM Alvaro Herrera > wrote: > > On 2022-Mar-21, Alvaro Herrera wrote: > > > I had a look at this latest version of the patch, and found some things > > > to tweak. Attached is v21 with three main changes from Kyotaro's v20: > > > > Pushed this, backpatching to 14 and 13. It would have been good to > > backpatch further, but there's an (textually trivial) merge conflict > > related to commit e6d8069522c8. Because that commit conceptually > > touches the same area that this bugfix is about, I'm not sure that > > backpatching further without a lot more thought is wise -- particularly > > so when there's no way to automate the test in branches older than > > master. > > > > This is quite annoying, considering that the bug was reported shortly > > before 12 went into beta. > > I think that the warnings this patch issues may cause some unnecessary > end-user alarm. It seems to me that they are basically warning about a > situation that is unusual but not scary. Isn't the appropriate level > for that DEBUG1, maybe without the errhint? log_invalid_page reports missing pages with DEBUG1 before reaching consistency. And since missing directory is not an issue if all of those reports are forgotten until reaching consistency, DEBUG1 sounds reasonable. Maybe we lower the DEBUG1 messages to DEBUG2 in XLogRememberMissingDir? -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Mon, Mar 21, 2022 at 3:02 PM Alvaro Herrera wrote: > > 2. Why not instead change the code so that the operation can succeed, > > by creating the prerequisite parent directories? Do we not have enough > > information for that? I'm not saying that we definitely should do it > > that way rather than this way, but I think we do take that approach in > > some cases. > > It seems we can choose freely between these two implementations -- I > mean I don't see any upsides or downsides to either one. What got committed here feels inconsistent to me. Suppose we have a checkpoint, and then a series of operations that touch a tablespace, and then a drop database and drop tablespace. If the first operation happens to be CREATE DATABASE, then this patch is going to fix it by skipping the operation. However, if the first operation happens to be almost anything else, the way it's going to reference the dropped tablespace is via a block reference in a WAL record of a wide variety of types. That's going to result in a call to XLogReadBufferForRedoExtended() which will call XLogReadBufferExtended() which will do smgrcreate(smgr, forknum, true) which will in turn call TablespaceCreateDbspace() to fill in all the missing directories. I don't think that's very good. It would be reasonable to decide that we're never going to create the missing directories and instead just remember that they were not found so we can do a cross check. It's also reasonable to just create the directories on the fly. But doing a mix of those systems doesn't really seem like the right idea - particularly because it means that the cross-check system is probably not very effective at finding actual problems in the code. Am I missing something here? -- Robert Haas EDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Fri, Mar 25, 2022 at 8:26 AM Alvaro Herrera wrote: > On 2022-Mar-21, Alvaro Herrera wrote: > > I had a look at this latest version of the patch, and found some things > > to tweak. Attached is v21 with three main changes from Kyotaro's v20: > > Pushed this, backpatching to 14 and 13. It would have been good to > backpatch further, but there's an (textually trivial) merge conflict > related to commit e6d8069522c8. Because that commit conceptually > touches the same area that this bugfix is about, I'm not sure that > backpatching further without a lot more thought is wise -- particularly > so when there's no way to automate the test in branches older than > master. > > This is quite annoying, considering that the bug was reported shortly > before 12 went into beta. I think that the warnings this patch issues may cause some unnecessary end-user alarm. It seems to me that they are basically warning about a situation that is unusual but not scary. Isn't the appropriate level for that DEBUG1, maybe without the errhint? -- Robert Haas EDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Mon, 28 Mar 2022 10:01:05 +0900 (JST), Kyotaro Horiguchi wrote in > At Fri, 25 Mar 2022 13:26:05 +0100, Alvaro Herrera > wrote in > > Pushed this, backpatching to 14 and 13. It would have been good to > > backpatch further, but there's an (textually trivial) merge conflict > > related to commit e6d8069522c8. Because that commit conceptually > > touches the same area that this bugfix is about, I'm not sure that > > backpatching further without a lot more thought is wise -- particularly > > so when there's no way to automate the test in branches older than > > master. > > Thaks for committing. > > > This is quite annoying, considering that the bug was reported shortly > > before 12 went into beta. > > Sure. I'm going to look into that. This is a preparatory patch and tentative (yes, it's just tentative) test. This is made for 12 but applies with some warnings to 10-11. (Hope the attachments are attached as "attachment", not "inline".) regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 3d5b24691517c1aac4b49728abb122c66a4e33be Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Mon, 28 Mar 2022 16:29:04 +0900 Subject: [PATCH 1/2] Tentative test for tsp replay fix --- src/test/perl/PostgresNode.pm | 342 +- src/test/recovery/t/011_crash_recovery.pl | 108 ++- 2 files changed, 447 insertions(+), 3 deletions(-) diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index 7b2ec29bb7..88fa08b61d 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -104,6 +104,8 @@ use TestLib (); use Time::HiRes qw(usleep); use Scalar::Util qw(blessed); +my $windows_os = 0; + our @EXPORT = qw( get_new_node get_free_port @@ -323,6 +325,64 @@ sub archive_dir =pod +=item $node->tablespace_storage([, nocreate]) + +Diretory to store tablespace directories. +If nocreate is true, returns undef if not yet created. + +=cut + +sub tablespace_storage +{ + my ($self, $nocreate) = @_; + + if (!defined $self->{_tsproot}) + { + # tablespace is not used, return undef if nocreate is specified. + return undef if ($nocreate); + + # create and remember the tablespae root directotry. + $self->{_tsproot} = TestLib::tempdir_short(); + } + + return $self->{_tsproot}; +} + +=pod + +=item $node->tablespaces() + +Returns a hash from tablespace OID to tablespace directory name. For +example, an oid 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub tablespaces +{ + my ($self) = @_; + my $pg_tblspc = $self->data_dir . '/pg_tblspc'; + my %ret; + + # return undef if no tablespace is used + return undef if (!defined $self->tablespace_storage(1)); + + # collect tablespace entries in pg_tblspc directory + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) + { + next if ($oid !~ /^([0-9]+)$/); + my $linkpath = "$pg_tblspc/$oid"; + my $tsppath = dir_readlink($linkpath); + $ret{$oid} = File::Basename::basename($tsppath); + } + closedir($dir); + + return %ret; +} + +=pod + =item $node->backup_dir() The output path for backups taken with $node->backup() @@ -338,6 +398,77 @@ sub backup_dir =pod +=item $node->backup_tablespace_storage_path(backup_name) + +Returns tablespace location path for backup_name. +Retuns the parent directory if backup_name is not given. + +=cut + +sub backup_tablespace_storage_path +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_dir . '/__tsps'; + + $dir .= "/$backup_name" if (defined $backup_name); + + return $dir; +} + +=pod + +=item $node->backup_create_tablespace_storage(backup_name) + +Create tablespace location directory for backup_name if not yet. +Create the parent tablespace storage that holds all location +directories if backup_name is not supplied. + +=cut + +sub backup_create_tablespace_storage +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_tablespace_storage_path($backup_name); + + File::Path::make_path $dir if (! -d $dir); +} + +=pod + +=item $node->backup_tablespaces(backup_name) + +Returns a reference to hash from tablespace OID to tablespace +directory name of tablespace directory that the specified backup has. +For example, an oid 16384 pointing to ../tsps/backup1/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub backup_tablespaces +{ + my ($self, $backup_name) = @_; + my $pg_tblspc = $self->backup_dir . '/' . $backup_name . '/pg_tblspc'; + my %ret; + + #return undef if this backup holds no tablespaces + return undef if (! -d $self->backup_tablespace_storage_path($backup_name)); + + # scan pg_tblspc directory of the backup + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) +
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Mon, 28 Mar 2022 14:34:44 +1300, Thomas Munro wrote in > On Mon, Mar 28, 2022 at 2:01 PM Kyotaro Horiguchi > wrote: > > At Fri, 25 Mar 2022 13:26:05 +0100, Alvaro Herrera > > wrote in > > > Pushed this, backpatching to 14 and 13. It would have been good to > > > backpatch further, but there's an (textually trivial) merge conflict > > > related to commit e6d8069522c8. Because that commit conceptually > > > touches the same area that this bugfix is about, I'm not sure that > > > backpatching further without a lot more thought is wise -- particularly > > > so when there's no way to automate the test in branches older than > > > master. > > Just a thought: we could consider back-patching > allow_in_place_tablespaces, after a little while, if we're happy with > how that is working out, if it'd be useful for verifying bug fixes in > back branches. It's non-end-user-facing testing infrastructure. I appreciate if we accept that. The patch is simple. And it now has the clear use-case for back-patching. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Mon, Mar 28, 2022 at 2:01 PM Kyotaro Horiguchi wrote: > At Fri, 25 Mar 2022 13:26:05 +0100, Alvaro Herrera > wrote in > > Pushed this, backpatching to 14 and 13. It would have been good to > > backpatch further, but there's an (textually trivial) merge conflict > > related to commit e6d8069522c8. Because that commit conceptually > > touches the same area that this bugfix is about, I'm not sure that > > backpatching further without a lot more thought is wise -- particularly > > so when there's no way to automate the test in branches older than > > master. Just a thought: we could consider back-patching allow_in_place_tablespaces, after a little while, if we're happy with how that is working out, if it'd be useful for verifying bug fixes in back branches. It's non-end-user-facing testing infrastructure.
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Fri, 25 Mar 2022 13:26:05 +0100, Alvaro Herrera wrote in > On 2022-Mar-21, Alvaro Herrera wrote: > > > I had a look at this latest version of the patch, and found some things > > to tweak. Attached is v21 with three main changes from Kyotaro's v20: > > Pushed this, backpatching to 14 and 13. It would have been good to > backpatch further, but there's an (textually trivial) merge conflict > related to commit e6d8069522c8. Because that commit conceptually > touches the same area that this bugfix is about, I'm not sure that > backpatching further without a lot more thought is wise -- particularly > so when there's no way to automate the test in branches older than > master. Thaks for committing. > This is quite annoying, considering that the bug was reported shortly > before 12 went into beta. Sure. I'm going to look into that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Mar-21, Alvaro Herrera wrote: > I had a look at this latest version of the patch, and found some things > to tweak. Attached is v21 with three main changes from Kyotaro's v20: Pushed this, backpatching to 14 and 13. It would have been good to backpatch further, but there's an (textually trivial) merge conflict related to commit e6d8069522c8. Because that commit conceptually touches the same area that this bugfix is about, I'm not sure that backpatching further without a lot more thought is wise -- particularly so when there's no way to automate the test in branches older than master. This is quite annoying, considering that the bug was reported shortly before 12 went into beta. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "If you have nothing to say, maybe you need just the right tool to help you not say it." (New York Times, about Microsoft PowerPoint)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Mar-14, Robert Haas wrote: > 2. Why not instead change the code so that the operation can succeed, > by creating the prerequisite parent directories? Do we not have enough > information for that? I'm not saying that we definitely should do it > that way rather than this way, but I think we do take that approach in > some cases. It seems we can choose freely between these two implementations -- I mean I don't see any upsides or downsides to either one. The current one has the advantage that it never makes the datadir "dirty", to use Kyotaro's term. It verifies that the creation/drop form a pair. A possible downside is that if there's a bug, we could end up with a spurious PANIC at the end of recovery, and no way to recover. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
I had a look at this latest version of the patch, and found some things to tweak. Attached is v21 with three main changes from Kyotaro's v20: 1. the XLogFlush is only done if consistent state has not been reached. As I understand, it's not needed in normal mode. (In any case, if we do call XLogFlush in normal mode, what it does is not advance the recovery point, so the comment would be incorrect.) 2. use %u to print OIDs rather than %d 3. I changed the warning message wording to this: + ereport(WARNING, + (errmsg("skipping replay of database creation WAL record"), +errdetail("The source database directory \"%s\" was not found.", + src_path), +errhint("A future WAL record that removes the directory before reaching consistent mode is expected."))); I also renamed the function XLogReportMissingDir to XLogRememberMissingDir (which matches the "forget" part) and changed the DEBUG2 messages in that function to DEBUG1 (all the calls in other functions remain DEBUG2, because ISTM they are not as interesting). Finally, I made the TAP test search the WARNING line in the log. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "No tengo por qué estar de acuerdo con lo que pienso" (Carlos Caszeli) >From 6a6fc73a93768a44ec026720c115f77c67d5cda2 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 21 Mar 2022 12:34:34 +0100 Subject: [PATCH v21] Fix replay of create database records on standby Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on. This patch adds mechanism similar to invalid page hash table, to track missing directories during crash recovery. If all the missing directory references are matched with corresponding drop records at the end of crash recovery, the standby can safely enter archive recovery. Diagnosed-by: Paul Guo Author: Paul Guo Author: Kyotaro Horiguchi Author: Asim R Praveen Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27dn5xwj2p9-roh16e4...@mail.gmail.com --- src/backend/access/transam/xlogrecovery.c | 6 + src/backend/access/transam/xlogutils.c | 159 +++- src/backend/commands/dbcommands.c | 57 +++ src/backend/commands/tablespace.c | 17 +++ src/include/access/xlogutils.h | 4 + src/test/recovery/t/029_replay_tsp_drops.pl | 67 + src/tools/pgindent/typedefs.list| 2 + 7 files changed, 311 insertions(+), 1 deletion(-) create mode 100644 src/test/recovery/t/029_replay_tsp_drops.pl diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 9feea3e6ec..f48d8d51fb 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -2043,6 +2043,12 @@ CheckRecoveryConsistency(void) */ XLogCheckInvalidPages(); + /* + * Check if the XLOG sequence contained any unresolved references to + * missing directories. + */ + XLogCheckMissingDirs(); + reachedConsistency = true; ereport(LOG, (errmsg("consistent recovery state reached at %X/%X", diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 511f2f186f..8c1b8216be 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -54,6 +54,164 @@ bool InRecovery = false; /* Are we in Hot Standby mode? Only valid in startup process, see xlogutils.h */ HotStandbyState standbyState = STANDBY_DISABLED; + +/* + * If a create database WAL record is being replayed more than once during + * crash recovery on a standby, it is possible that either the tablespace + * directory or the template database directory is missing. This happens when + * the directories are removed by replay of subsequent drop records. Note + * that this problem happens only on standby and not on master. On master, a + * checkpoint is created at the end of create database operation. On standby, + * however, such a strategy (creating restart points during replay) is not + * viable because it will slow down WAL replay. + * + * The alternative is to track references to each missing directory + * encountered when performing crash recovery in the following hash table. + * Similar to invalid page table above, the expectation is that each missing + * directory entry should be matched with a drop database or drop tablespace + * WAL record by the end
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2022-Mar-04, Michael Paquier wrote: > d6d317d as solved the issue of tablespace paths across multiple nodes > with the new GUC called allow_in_place_tablespaces, and is getting > successfully used in the recovery tests as of 027_stream_regress.pl. OK, but that means that the test suite is now not backpatchable. The implication here is that either we're going to commit the fix without any tests at all on older branches, or that we're going to fix it only in branch master. Are you thinking that it's okay to leave this bug unfixed in older branches? That seems embarrasing. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No me acuerdo, pero no es cierto. No es cierto, y si fuera cierto, no me acuerdo." (Augusto Pinochet a una corte de justicia)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Mon, 14 Mar 2022 17:37:40 -0400, Robert Haas wrote in > On Mon, Mar 7, 2022 at 3:39 AM Kyotaro Horiguchi > wrote: > > So the new framework has been dropped in this version. > > The second test is removed as it is irrelevant to this bug. > > > > In this version the patch is a single file that contains the test. > > The status of this patch in the CommitFest was set to "Waiting for > Author." Since a new patch has been submitted since that status was > set, I have changed it to "Needs Review." Since this is now in its > 15th CommitFest, we really should get it fixed; that's kind of > ridiculous. (I am as much to blame as anyone.) It does seem to be a > legitimate bug. > > A few questions about the patch: Thanks for looking this! > 1. Why is it OK to just skip the operation without making it up later? Does "it" mean removal of directories? It is not okay, but in the first place it is out-of-scope of this patch to fix that. The patch leaves the existing code alone. This patch just has recovery ignore invalid accesses into eventually removed objects. Maybe, I don't understand you question.. > 2. Why not instead change the code so that the operation can succeed, > by creating the prerequisite parent directories? Do we not have enough > information for that? I'm not saying that we definitely should do it > that way rather than this way, but I think we do take that approach in > some cases. It is proposed first by Paul Guo [1] then changed so that it ignores failed directory creations in the very early stage in this thread. After that, it gets conscious of recovery consistency by managing invalid-access list. [1] https://www.postgresql.org/message-id/flat/20210327142316.GA32517%40alvherre.pgsql#a557bd47207a446ce206879676e0140a I think there was no strong reason for the current shape but I personally rather like the remembering-invalid-access way because it doesn't dirty the data directory and it is consistent with how we treat missing heap pages. I tried a slightly tweaked version (attached) of the first version and confirmed that it works for the current test script. It doesn't check recovery consistency but otherwise that way also seems fine. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index c37e3c9a9a..28aed8d296 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -47,6 +47,7 @@ #include "commands/defrem.h" #include "commands/seclabel.h" #include "commands/tablespace.h" +#include "common/file_perm.h" #include "mb/pg_wchar.h" #include "miscadmin.h" #include "pgstat.h" @@ -2382,6 +2383,7 @@ dbase_redo(XLogReaderState *record) xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *) XLogRecGetData(record); char *src_path; char *dst_path; + char *parent_path; struct stat st; src_path = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id); @@ -2401,6 +2403,41 @@ dbase_redo(XLogReaderState *record) dst_path))); } + /* +* It is possible that the tablespace was later dropped, but we are +* re-redoing database create before that. In that case, those +* directories are gone, and we do not create symlink. +*/ + if (stat(dst_path, &st) < 0 && errno == ENOENT) + { + parent_path = pstrdup(dst_path); + get_parent_directory(parent_path); + elog(WARNING, "creating missing directory: %s", parent_path); + if (stat(parent_path, &st) != 0 && pg_mkdir_p(parent_path, pg_dir_create_mode) != 0) + { + ereport(WARNING, + (errmsg("can not recursively create directory \"%s\"", + parent_path))); + } + } + + /* +* There's a case where the copy source directory is missing for the +* same reason above. Create the emtpy source directory so that +* copydir below doesn't fail. The directory will be dropped soon by +* recovery. +*/ + if (stat(src_path, &st) < 0 && errno == ENOENT) + { + elog(WARNING, "creating missing copy source directory: %s", src_path); + if (stat(src_path, &st) != 0 && pg_mkdir_p(src_path, pg_dir_create_mode) != 0) + { + ereport(WARNING, + (errmsg("can not recursively create directory \"%s\"", +
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Mon, Mar 7, 2022 at 3:39 AM Kyotaro Horiguchi wrote: > So the new framework has been dropped in this version. > The second test is removed as it is irrelevant to this bug. > > In this version the patch is a single file that contains the test. The status of this patch in the CommitFest was set to "Waiting for Author." Since a new patch has been submitted since that status was set, I have changed it to "Needs Review." Since this is now in its 15th CommitFest, we really should get it fixed; that's kind of ridiculous. (I am as much to blame as anyone.) It does seem to be a legitimate bug. A few questions about the patch: 1. Why is it OK to just skip the operation without making it up later? 2. Why not instead change the code so that the operation can succeed, by creating the prerequisite parent directories? Do we not have enough information for that? I'm not saying that we definitely should do it that way rather than this way, but I think we do take that approach in some cases. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
So the new framework has been dropped in this version. The second test is removed as it is irrelevant to this bug. In this version the patch is a single file that contains the test. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 43bb3ba8900edd53a1feb0acb1a72bdc22bb1627 Mon Sep 17 00:00:00 2001 From: P Date: Mon, 7 Mar 2022 17:10:07 +0900 Subject: [PATCH v20] Fix replay of create database records on standby Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on. This patch adds mechanism similar to invalid page hash table, to track missing directories during crash recovery. If all the missing directory references are matched with corresponding drop records at the end of crash recovery, the standby can safely enter archive recovery. Bug identified by Paul Guo. Authored by Paul Guo, Kyotaro Horiguchi and Asim R P. --- src/backend/access/transam/xlogrecovery.c | 6 + src/backend/access/transam/xlogutils.c | 145 src/backend/commands/dbcommands.c | 56 src/backend/commands/tablespace.c | 16 +++ src/include/access/xlogutils.h | 4 + src/test/recovery/t/029_replay_tsp_drops.pl | 62 + 6 files changed, 289 insertions(+) create mode 100644 src/test/recovery/t/029_replay_tsp_drops.pl diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index f9f212680b..97fed1e04d 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -2043,6 +2043,12 @@ CheckRecoveryConsistency(void) */ XLogCheckInvalidPages(); + /* + * Check if the XLOG sequence contained any unresolved references to + * missing directories. + */ + XLogCheckMissingDirs(); + reachedConsistency = true; ereport(LOG, (errmsg("consistent recovery state reached at %X/%X", diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 54d5f20734..3f8f7dadac 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -79,6 +79,151 @@ typedef struct xl_invalid_page static HTAB *invalid_page_tab = NULL; +/* + * If a create database WAL record is being replayed more than once during + * crash recovery on a standby, it is possible that either the tablespace + * directory or the template database directory is missing. This happens when + * the directories are removed by replay of subsequent drop records. Note + * that this problem happens only on standby and not on master. On master, a + * checkpoint is created at the end of create database operation. On standby, + * however, such a strategy (creating restart points during replay) is not + * viable because it will slow down WAL replay. + * + * The alternative is to track references to each missing directory + * encountered when performing crash recovery in the following hash table. + * Similar to invalid page table above, the expectation is that each missing + * directory entry should be matched with a drop database or drop tablespace + * WAL record by the end of crash recovery. + */ +typedef struct xl_missing_dir_key +{ + Oid spcNode; + Oid dbNode; +} xl_missing_dir_key; + +typedef struct xl_missing_dir +{ + xl_missing_dir_key key; + char path[MAXPGPATH]; +} xl_missing_dir; + +static HTAB *missing_dir_tab = NULL; + +void +XLogReportMissingDir(Oid spcNode, Oid dbNode, char *path) +{ + xl_missing_dir_key key; + bool found; + xl_missing_dir *entry; + + /* + * Database OID may be invalid but tablespace OID must be valid. If + * dbNode is InvalidOid, we are logging a missing tablespace directory, + * otherwise we are logging a missing database directory. + */ + Assert(OidIsValid(spcNode)); + + if (missing_dir_tab == NULL) + { + /* create hash table when first needed */ + HASHCTL ctl; + + memset(&ctl, 0, sizeof(ctl)); + ctl.keysize = sizeof(xl_missing_dir_key); + ctl.entrysize = sizeof(xl_missing_dir); + + missing_dir_tab = hash_create("XLOG missing directory table", + 100, + &ctl, + HASH_ELEM | HASH_BLOBS); + } + + key.spcNode = spcNode; + key.dbNode = dbNode; + + entry = hash_search(missing_dir_tab, &key, HASH_ENTER, &found); + + if (found) + { + if (dbNode == InvalidOid) + elog(DEBUG2, "missing directory %s (tablespace %d) already exists: %s", + path, spcNode, entry->path); + else + elog(DEBUG2, "missing directory %s (tablespace %d database %d) already exists: %s", + path, spcN
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Thanks to look this! At Fri, 4 Mar 2022 13:51:12 +0900, Michael Paquier wrote i n > On Fri, Mar 04, 2022 at 09:10:48AM +0900, Kyotaro Horiguchi wrote: > > And same function contained a maybe-should-have-been-removed line > > which makes Windows build unhappy. > > > > This should make all platforms in the CI happy. > > d6d317d as solved the issue of tablespace paths across multiple nodes > with the new GUC called allow_in_place_tablespaces, and is getting > successfully used in the recovery tests as of 027_stream_regress.pl. The feature allows only one tablespace directory. but that uses (I'm not sure it needs, though) multiple tablespace directories so I think the feature doesn't work for the test. Maybe I'm missing something, but it doesn't use tablespaces. I see that in 002_tablespace.pl but but the test uses only one tablespace location. > Shouldn't we rely on that rather than extending more our test perl > modules? One tricky part is the emulation of readlink for junction > points on Windows (dir_readlink in your patch), and the root of the Yeah, I don't like that as I said before... > problem is that 0003 cares about the path structure of the > tablespaces so we have no need, as far as I can see, for any > dependency with link follow-up in the scope of this patch. I'm not sure how this related to 0001 but maybe I don't follow this. > This means that you should be able to simplify the patch set, as we > could entirely drop 0001 in favor of enforcing the new dev GUC in the > nodes created in the TAP test of 0002. Maybe it's possible by breaking the test into ones that need only one tablespace. I'll give it a try. > Speaking of 0002, perhaps this had better be in its own file rather > than extending more 011_crash_recovery.pl. 0003 looks like a good Ok, no problem. > idea to check after the consistency of the path structures created > during replay, and it touches paths I'd expect it to touch, as of > database and tbspace redos. > > + if (!reachedConsistency) > + XLogForgetMissingDir(xlrec->ts_id, InvalidOid); > + > + XLogFlush(record->EndRecPtr); > Not sure to understand why this is required. A comment may be in > order to explain the hows and the whys. Is it about XLogFlush? As my understanding it is to update minRecoveryPoint to that LSN. I'll add a comment like that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Fri, Mar 04, 2022 at 09:10:48AM +0900, Kyotaro Horiguchi wrote: > And same function contained a maybe-should-have-been-removed line > which makes Windows build unhappy. > > This should make all platforms in the CI happy. d6d317d as solved the issue of tablespace paths across multiple nodes with the new GUC called allow_in_place_tablespaces, and is getting successfully used in the recovery tests as of 027_stream_regress.pl. Shouldn't we rely on that rather than extending more our test perl modules? One tricky part is the emulation of readlink for junction points on Windows (dir_readlink in your patch), and the root of the problem is that 0003 cares about the path structure of the tablespaces so we have no need, as far as I can see, for any dependency with link follow-up in the scope of this patch. This means that you should be able to simplify the patch set, as we could entirely drop 0001 in favor of enforcing the new dev GUC in the nodes created in the TAP test of 0002. Speaking of 0002, perhaps this had better be in its own file rather than extending more 011_crash_recovery.pl. 0003 looks like a good idea to check after the consistency of the path structures created during replay, and it touches paths I'd expect it to touch, as of database and tbspace redos. + if (!reachedConsistency) + XLogForgetMissingDir(xlrec->ts_id, InvalidOid); + + XLogFlush(record->EndRecPtr); Not sure to understand why this is required. A comment may be in order to explain the hows and the whys. -- Michael signature.asc Description: PGP signature
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Wed, 02 Mar 2022 19:31:24 +0900 (JST), Kyotaro Horiguchi wrote in > A function added to Util.pm used perl2host, which has been removed > recently. And same function contained a maybe-should-have-been-removed line which makes Windows build unhappy. This should make all platforms in the CI happy. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From ee17f0f4400ce484cdba80c84744ae47d68c6fa4 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 11 Nov 2021 20:42:00 +0900 Subject: [PATCH v18 1/3] Add tablespace support to TAP framework TAP framework doesn't support nodes that have tablespaces. Especially backup and initialization from backups failed if the source node has tablespaces. This commit provides simple way to create tablespace directories and allows backup routines to handle tablespaces. --- src/test/perl/PostgreSQL/Test/Cluster.pm | 264 ++- src/test/perl/PostgreSQL/Test/Utils.pm | 42 2 files changed, 304 insertions(+), 2 deletions(-) diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index be05845248..15d57b9a71 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -298,6 +298,64 @@ sub archive_dir =pod +=item $node->tablespace_storage([, nocreate]) + +Diretory to store tablespace directories. +If nocreate is true, returns undef if not yet created. + +=cut + +sub tablespace_storage +{ + my ($self, $nocreate) = @_; + + if (!defined $self->{_tsproot}) + { + # tablespace is not used, return undef if nocreate is specified. + return undef if ($nocreate); + + # create and remember the tablespae root directotry. + $self->{_tsproot} = PostgreSQL::Test::Utils::tempdir_short(); + } + + return $self->{_tsproot}; +} + +=pod + +=item $node->tablespaces() + +Returns a hash from tablespace OID to tablespace directory name. For +example, an oid 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub tablespaces +{ + my ($self) = @_; + my $pg_tblspc = $self->data_dir . '/pg_tblspc'; + my %ret; + + # return undef if no tablespace is used + return undef if (!defined $self->tablespace_storage(1)); + + # collect tablespace entries in pg_tblspc directory + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) + { + next if ($oid !~ /^([0-9]+)$/); + my $linkpath = "$pg_tblspc/$oid"; + my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath); + $ret{$oid} = File::Basename::basename($tsppath); + } + closedir($dir); + + return %ret; +} + +=pod + =item $node->backup_dir() The output path for backups taken with $node->backup() @@ -313,6 +371,77 @@ sub backup_dir =pod +=item $node->backup_tablespace_storage_path(backup_name) + +Returns tablespace location path for backup_name. +Retuns the parent directory if backup_name is not given. + +=cut + +sub backup_tablespace_storage_path +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_dir . '/__tsps'; + + $dir .= "/$backup_name" if (defined $backup_name); + + return $dir; +} + +=pod + +=item $node->backup_create_tablespace_storage(backup_name) + +Create tablespace location directory for backup_name if not yet. +Create the parent tablespace storage that holds all location +directories if backup_name is not supplied. + +=cut + +sub backup_create_tablespace_storage +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_tablespace_storage_path($backup_name); + + File::Path::make_path $dir if (! -d $dir); +} + +=pod + +=item $node->backup_tablespaces(backup_name) + +Returns a reference to hash from tablespace OID to tablespace +directory name of tablespace directory that the specified backup has. +For example, an oid 16384 pointing to ../tsps/backup1/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub backup_tablespaces +{ + my ($self, $backup_name) = @_; + my $pg_tblspc = $self->backup_dir . '/' . $backup_name . '/pg_tblspc'; + my %ret; + + #return undef if this backup holds no tablespaces + return undef if (! -d $self->backup_tablespace_storage_path($backup_name)); + + # scan pg_tblspc directory of the backup + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) + { + next if ($oid !~ /^([0-9]+)$/); + my $linkpath = "$pg_tblspc/$oid"; + my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath); + $ret{$oid} = File::Basename::basename($tsppath); + } + closedir($dir); + + return \%ret; +} + +=pod + =item $node->install_path() The configured install path (if any) for the node. @@ -370,6 +499,7 @@ sub info print $fh "Data directory: " . $self->data_dir . "\n"; print $fh "Backup directory: " . $self->backup_dir . "\n"; print $fh "Archive directory: " . $self->archive_dir . "\n"; + print $fh "Tablespace directory: " . $self->tablespace_storage . "\n"; print $fh "Connection string: " . $self->connstr . "\n"; print $fh "Log file: " . $self->logfile . "\n"; print $fh "Install Path: ", $self->{_install_path} .
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Wed, 02 Mar 2022 16:59:09 +0900 (JST), Kyotaro Horiguchi wrote in > At Thu, 20 Jan 2022 17:19:04 +0900 (JST), Kyotaro Horiguchi > wrote in > > At Thu, 20 Jan 2022 15:07:22 +0900 (JST), Kyotaro Horiguchi > > wrote in > > CI now likes this version for all platforms. > > An xlog.c refactoring happend recently hit this. > Just rebased on the change. A function added to Util.pm used perl2host, which has been removed recently. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From bb714659adcde5265974c46b061966e5dfc556be Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 11 Nov 2021 20:42:00 +0900 Subject: [PATCH v17 1/3] Add tablespace support to TAP framework TAP framework doesn't support nodes that have tablespaces. Especially backup and initialization from backups failed if the source node has tablespaces. This commit provides simple way to create tablespace directories and allows backup routines to handle tablespaces. --- src/test/perl/PostgreSQL/Test/Cluster.pm | 264 ++- src/test/perl/PostgreSQL/Test/Utils.pm | 42 2 files changed, 304 insertions(+), 2 deletions(-) diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index be05845248..15d57b9a71 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -298,6 +298,64 @@ sub archive_dir =pod +=item $node->tablespace_storage([, nocreate]) + +Diretory to store tablespace directories. +If nocreate is true, returns undef if not yet created. + +=cut + +sub tablespace_storage +{ + my ($self, $nocreate) = @_; + + if (!defined $self->{_tsproot}) + { + # tablespace is not used, return undef if nocreate is specified. + return undef if ($nocreate); + + # create and remember the tablespae root directotry. + $self->{_tsproot} = PostgreSQL::Test::Utils::tempdir_short(); + } + + return $self->{_tsproot}; +} + +=pod + +=item $node->tablespaces() + +Returns a hash from tablespace OID to tablespace directory name. For +example, an oid 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub tablespaces +{ + my ($self) = @_; + my $pg_tblspc = $self->data_dir . '/pg_tblspc'; + my %ret; + + # return undef if no tablespace is used + return undef if (!defined $self->tablespace_storage(1)); + + # collect tablespace entries in pg_tblspc directory + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) + { + next if ($oid !~ /^([0-9]+)$/); + my $linkpath = "$pg_tblspc/$oid"; + my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath); + $ret{$oid} = File::Basename::basename($tsppath); + } + closedir($dir); + + return %ret; +} + +=pod + =item $node->backup_dir() The output path for backups taken with $node->backup() @@ -313,6 +371,77 @@ sub backup_dir =pod +=item $node->backup_tablespace_storage_path(backup_name) + +Returns tablespace location path for backup_name. +Retuns the parent directory if backup_name is not given. + +=cut + +sub backup_tablespace_storage_path +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_dir . '/__tsps'; + + $dir .= "/$backup_name" if (defined $backup_name); + + return $dir; +} + +=pod + +=item $node->backup_create_tablespace_storage(backup_name) + +Create tablespace location directory for backup_name if not yet. +Create the parent tablespace storage that holds all location +directories if backup_name is not supplied. + +=cut + +sub backup_create_tablespace_storage +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_tablespace_storage_path($backup_name); + + File::Path::make_path $dir if (! -d $dir); +} + +=pod + +=item $node->backup_tablespaces(backup_name) + +Returns a reference to hash from tablespace OID to tablespace +directory name of tablespace directory that the specified backup has. +For example, an oid 16384 pointing to ../tsps/backup1/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub backup_tablespaces +{ + my ($self, $backup_name) = @_; + my $pg_tblspc = $self->backup_dir . '/' . $backup_name . '/pg_tblspc'; + my %ret; + + #return undef if this backup holds no tablespaces + return undef if (! -d $self->backup_tablespace_storage_path($backup_name)); + + # scan pg_tblspc directory of the backup + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) + { + next if ($oid !~ /^([0-9]+)$/); + my $linkpath = "$pg_tblspc/$oid"; + my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath); + $ret{$oid} = File::Basename::basename($tsppath); + } + closedir($dir); + + return \%ret; +} + +=pod + =item $node->install_path() The configured install path (if any) for the node. @@ -370,6 +499,7 @@ sub info print $fh "Data directory: " . $self->data_dir . "\n"; print $fh "Backup directory: " . $self->backup_dir . "\n"; print $fh "Archive directory: " . $self->archive_dir . "\n"; + print $fh "Tablespace directory: " . $self->tablespace_storage . "\n"; print $fh "Connection stri
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Thu, 20 Jan 2022 17:19:04 +0900 (JST), Kyotaro Horiguchi wrote in > At Thu, 20 Jan 2022 15:07:22 +0900 (JST), Kyotaro Horiguchi > wrote in > CI now likes this version for all platforms. An xlog.c refactoring happend recently hit this. Just rebased on the change. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 35958b17c62cd14f81efa26a097c32c273028f77 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 11 Nov 2021 20:42:00 +0900 Subject: [PATCH v16 1/3] Add tablespace support to TAP framework TAP framework doesn't support nodes that have tablespaces. Especially backup and initialization from backups failed if the source node has tablespaces. This commit provides simple way to create tablespace directories and allows backup routines to handle tablespaces. --- src/test/perl/PostgreSQL/Test/Cluster.pm | 264 ++- src/test/perl/PostgreSQL/Test/Utils.pm | 43 2 files changed, 305 insertions(+), 2 deletions(-) diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index be05845248..15d57b9a71 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -298,6 +298,64 @@ sub archive_dir =pod +=item $node->tablespace_storage([, nocreate]) + +Diretory to store tablespace directories. +If nocreate is true, returns undef if not yet created. + +=cut + +sub tablespace_storage +{ + my ($self, $nocreate) = @_; + + if (!defined $self->{_tsproot}) + { + # tablespace is not used, return undef if nocreate is specified. + return undef if ($nocreate); + + # create and remember the tablespae root directotry. + $self->{_tsproot} = PostgreSQL::Test::Utils::tempdir_short(); + } + + return $self->{_tsproot}; +} + +=pod + +=item $node->tablespaces() + +Returns a hash from tablespace OID to tablespace directory name. For +example, an oid 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub tablespaces +{ + my ($self) = @_; + my $pg_tblspc = $self->data_dir . '/pg_tblspc'; + my %ret; + + # return undef if no tablespace is used + return undef if (!defined $self->tablespace_storage(1)); + + # collect tablespace entries in pg_tblspc directory + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) + { + next if ($oid !~ /^([0-9]+)$/); + my $linkpath = "$pg_tblspc/$oid"; + my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath); + $ret{$oid} = File::Basename::basename($tsppath); + } + closedir($dir); + + return %ret; +} + +=pod + =item $node->backup_dir() The output path for backups taken with $node->backup() @@ -313,6 +371,77 @@ sub backup_dir =pod +=item $node->backup_tablespace_storage_path(backup_name) + +Returns tablespace location path for backup_name. +Retuns the parent directory if backup_name is not given. + +=cut + +sub backup_tablespace_storage_path +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_dir . '/__tsps'; + + $dir .= "/$backup_name" if (defined $backup_name); + + return $dir; +} + +=pod + +=item $node->backup_create_tablespace_storage(backup_name) + +Create tablespace location directory for backup_name if not yet. +Create the parent tablespace storage that holds all location +directories if backup_name is not supplied. + +=cut + +sub backup_create_tablespace_storage +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_tablespace_storage_path($backup_name); + + File::Path::make_path $dir if (! -d $dir); +} + +=pod + +=item $node->backup_tablespaces(backup_name) + +Returns a reference to hash from tablespace OID to tablespace +directory name of tablespace directory that the specified backup has. +For example, an oid 16384 pointing to ../tsps/backup1/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub backup_tablespaces +{ + my ($self, $backup_name) = @_; + my $pg_tblspc = $self->backup_dir . '/' . $backup_name . '/pg_tblspc'; + my %ret; + + #return undef if this backup holds no tablespaces + return undef if (! -d $self->backup_tablespace_storage_path($backup_name)); + + # scan pg_tblspc directory of the backup + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) + { + next if ($oid !~ /^([0-9]+)$/); + my $linkpath = "$pg_tblspc/$oid"; + my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath); + $ret{$oid} = File::Basename::basename($tsppath); + } + closedir($dir); + + return \%ret; +} + +=pod + =item $node->install_path() The configured install path (if any) for the node. @@ -370,6 +499,7 @@ sub info print $fh "Data directory: " . $self->data_dir . "\n"; print $fh "Backup directory: " . $self->backup_dir . "\n"; print $fh "Archive directory: " . $self->archive_dir . "\n"; + print $fh "Tablespace directory: " . $self->tablespace_storage . "\n"; print $fh "Connection string: " . $self->connstr . "\n"; print $fh "Log file: " . $self->logfile . "\n"; print $fh "Install Path: ", $self->{_install_path} . "\n" @@ -600,6 +730,43 @@ sub
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Thu, 20 Jan 2022 15:07:22 +0900 (JST), Kyotaro Horiguchi wrote in > This version works for Unixen but still doesn't for Windows. I'm > searching for a fix for Windows. And this version works for Windows. Maybe I've took a wrong version to post. dir_readlink manipulated target file (junction) name in the wrong way. CI now likes this version for all platforms. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 0423d2b9aae0620c07b522632a8074ecd8ffef64 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 11 Nov 2021 20:42:00 +0900 Subject: [PATCH v15 1/3] Add tablespace support to TAP framework TAP framework doesn't support nodes that have tablespaces. Especially backup and initialization from backups failed if the source node has tablespaces. This commit provides simple way to create tablespace directories and allows backup routines to handle tablespaces. --- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 2 +- src/test/perl/PostgreSQL/Test/Cluster.pm | 264 ++- src/test/perl/PostgreSQL/Test/Utils.pm | 42 +++ 3 files changed, 305 insertions(+), 3 deletions(-) diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index f0243f28d4..c139b5e000 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -257,7 +257,7 @@ $node->safe_psql('postgres', "CREATE TABLESPACE tblspc1 LOCATION '$realTsDir';"); $node->safe_psql('postgres', "CREATE TABLE test1 (a int) TABLESPACE tblspc1;" - . "INSERT INTO test1 VALUES (1234);"); + . "INSERT INTO test1 VALUES (1234);"); $node->backup('tarbackup2', backup_options => ['-Ft']); # empty test1, just so that it's different from the to-be-restored data $node->safe_psql('postgres', "TRUNCATE TABLE test1;"); diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index b7d4c24553..d433ccf610 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -298,6 +298,64 @@ sub archive_dir =pod +=item $node->tablespace_storage([, nocreate]) + +Diretory to store tablespace directories. +If nocreate is true, returns undef if not yet created. + +=cut + +sub tablespace_storage +{ + my ($self, $nocreate) = @_; + + if (!defined $self->{_tsproot}) + { + # tablespace is not used, return undef if nocreate is specified. + return undef if ($nocreate); + + # create and remember the tablespae root directotry. + $self->{_tsproot} = PostgreSQL::Test::Utils::tempdir_short(); + } + + return $self->{_tsproot}; +} + +=pod + +=item $node->tablespaces() + +Returns a hash from tablespace OID to tablespace directory name. For +example, an oid 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub tablespaces +{ + my ($self) = @_; + my $pg_tblspc = $self->data_dir . '/pg_tblspc'; + my %ret; + + # return undef if no tablespace is used + return undef if (!defined $self->tablespace_storage(1)); + + # collect tablespace entries in pg_tblspc directory + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) + { + next if ($oid !~ /^([0-9]+)$/); + my $linkpath = "$pg_tblspc/$oid"; + my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath); + $ret{$oid} = File::Basename::basename($tsppath); + } + closedir($dir); + + return %ret; +} + +=pod + =item $node->backup_dir() The output path for backups taken with $node->backup() @@ -313,6 +371,77 @@ sub backup_dir =pod +=item $node->backup_tablespace_storage_path(backup_name) + +Returns tablespace location path for backup_name. +Retuns the parent directory if backup_name is not given. + +=cut + +sub backup_tablespace_storage_path +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_dir . '/__tsps'; + + $dir .= "/$backup_name" if (defined $backup_name); + + return $dir; +} + +=pod + +=item $node->backup_create_tablespace_storage(backup_name) + +Create tablespace location directory for backup_name if not yet. +Create the parent tablespace storage that holds all location +directories if backup_name is not supplied. + +=cut + +sub backup_create_tablespace_storage +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_tablespace_storage_path($backup_name); + + File::Path::make_path $dir if (! -d $dir); +} + +=pod + +=item $node->backup_tablespaces(backup_name) + +Returns a reference to hash from tablespace OID to tablespace +directory name of tablespace directory that the specified backup has. +For example, an oid 16384 pointing to ../tsps/backup1/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub backup_tablespaces +{ + my ($self, $backup_name) = @_; + my $pg_tblspc = $self->backup_dir . '/' . $backup_name . '/pg_tblspc'; + my %ret; + + #return undef if this backup holds no tablespaces + return undef if (! -d $self->backup_tablespace_storage_path($backup_name)); + + # scan pg_tblspc directory of the backup + opendir(my
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Sun, 16 Jan 2022 12:43:03 +0800, Julien Rouhaud wrote in > Other than that, I see that the TAP tests are failing on all the environment, > due to Perl errors. For instance: Perl seems to have changed its behavior for undef hash. It is said that "if (%undef_hash)" is false but actually it is true and "keys %undef_hash" is 1.. Finally I had to make backup_tablespaces() to return a hash reference. The test of pg_basebackup takes a backup with tar mode, which broke the test infrastructure. Cluster::backup now skips symlink adjustment when the backup contains "/base.tar". I gave up testing on Windows on my own environment and used Cirrus CI. # However, it works for confirmation of a established code. TAT of CI # is still long to do trial and error of unestablished code.. This version works for Unixen but still doesn't for Windows. I'm searching for a fix for Windows. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 5f88a80b9a585ca611ab6424f035330a47b2449f Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 11 Nov 2021 20:42:00 +0900 Subject: [PATCH v15 1/3] Add tablespace support to TAP framework TAP framework doesn't support nodes that have tablespaces. Especially backup and initialization from backups failed if the source node has tablespaces. This commit provides simple way to create tablespace directories and allows backup routines to handle tablespaces. --- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 2 +- src/test/perl/PostgreSQL/Test/Cluster.pm | 264 ++- src/test/perl/PostgreSQL/Test/Utils.pm | 43 +++ 3 files changed, 306 insertions(+), 3 deletions(-) diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index f0243f28d4..c139b5e000 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -257,7 +257,7 @@ $node->safe_psql('postgres', "CREATE TABLESPACE tblspc1 LOCATION '$realTsDir';"); $node->safe_psql('postgres', "CREATE TABLE test1 (a int) TABLESPACE tblspc1;" - . "INSERT INTO test1 VALUES (1234);"); + . "INSERT INTO test1 VALUES (1234);"); $node->backup('tarbackup2', backup_options => ['-Ft']); # empty test1, just so that it's different from the to-be-restored data $node->safe_psql('postgres', "TRUNCATE TABLE test1;"); diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index b7d4c24553..d433ccf610 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -298,6 +298,64 @@ sub archive_dir =pod +=item $node->tablespace_storage([, nocreate]) + +Diretory to store tablespace directories. +If nocreate is true, returns undef if not yet created. + +=cut + +sub tablespace_storage +{ + my ($self, $nocreate) = @_; + + if (!defined $self->{_tsproot}) + { + # tablespace is not used, return undef if nocreate is specified. + return undef if ($nocreate); + + # create and remember the tablespae root directotry. + $self->{_tsproot} = PostgreSQL::Test::Utils::tempdir_short(); + } + + return $self->{_tsproot}; +} + +=pod + +=item $node->tablespaces() + +Returns a hash from tablespace OID to tablespace directory name. For +example, an oid 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub tablespaces +{ + my ($self) = @_; + my $pg_tblspc = $self->data_dir . '/pg_tblspc'; + my %ret; + + # return undef if no tablespace is used + return undef if (!defined $self->tablespace_storage(1)); + + # collect tablespace entries in pg_tblspc directory + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) + { + next if ($oid !~ /^([0-9]+)$/); + my $linkpath = "$pg_tblspc/$oid"; + my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath); + $ret{$oid} = File::Basename::basename($tsppath); + } + closedir($dir); + + return %ret; +} + +=pod + =item $node->backup_dir() The output path for backups taken with $node->backup() @@ -313,6 +371,77 @@ sub backup_dir =pod +=item $node->backup_tablespace_storage_path(backup_name) + +Returns tablespace location path for backup_name. +Retuns the parent directory if backup_name is not given. + +=cut + +sub backup_tablespace_storage_path +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_dir . '/__tsps'; + + $dir .= "/$backup_name" if (defined $backup_name); + + return $dir; +} + +=pod + +=item $node->backup_create_tablespace_storage(backup_name) + +Create tablespace location directory for backup_name if not yet. +Create the parent tablespace storage that holds all location +directories if backup_name is not supplied. + +=cut + +sub backup_create_tablespace_storage +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_tablespace_storage_path($backup_name); + + File::Path::make_path $dir if (! -d $dir); +} + +=pod + +=item $node->backup_tablespaces(backup_name) + +Returns a reference to hash from tablespace OID to t
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Mon, 17 Jan 2022 17:24:43 +0900 (JST), Kyotaro Horiguchi wrote in > At Sun, 16 Jan 2022 12:43:03 +0800, Julien Rouhaud wrote > in > > I'm not very familiar with windows, but maybe using strawberry perl instead > > ([1]) would fix your problem? I think it's also quite popular and is > > commonly > > used to run pgBadger on Windows. > > Thanks! I'll try it later. Build is stopped by some unresolvable symbols. Strawberry perl is 5.28, which doesn't expose new_ctype, new_collate and new_numeric according the past discussion. (Active perl is 5.32). https://www.postgresql.org/message-id/20200501134711.08750c5f%40antares.wagner.home However, the patch provided revealed other around 70 unresolved symbol errors... # Hmm. perl on CentOS 8 is 5.26.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Sun, 16 Jan 2022 12:43:03 +0800, Julien Rouhaud wrote in > Hi, > > On Fri, Dec 24, 2021 at 07:21:59PM +0900, Kyotaro Horiguchi wrote: > > Just a complaint.. > > > > At Fri, 12 Nov 2021 16:43:27 +0900 (JST), Kyotaro Horiguchi > > wrote in > > > "" on Japanese (CP-932) environment. I didn't actually > > > tested it on Windows and msys environment ...yet. > > > > Active perl cannot be installed because of (perhaps) a powershell > > version issue... Annoying.. > > > > https://community.activestate.com/t/please-update-your-powershell-install-scripts/7897 > > I'm not very familiar with windows, but maybe using strawberry perl instead > ([1]) would fix your problem? I think it's also quite popular and is commonly > used to run pgBadger on Windows. Thanks! I'll try it later. > Other than that, I see that the TAP tests are failing on all the environment, > due to Perl errors. For instance: > > [04:06:00.848] [04:05:54] t/003_promote.pl . > [04:06:00.848] Dubious, test returned 2 (wstat 512, 0x200) > https://api.cirrus-ci.com/v1/artifact/task/4751213722861568/tap/src/bin/pg_basebackup/tmp_check/log/regress_log_020_pg_receivewal > # Initializing node "standby" from backup "my_backup" of node "primary" > Odd number of elements in hash assignment at > /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm > line 996. > Use of uninitialized value in list assignment at > /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm > line 996. > Use of uninitialized value $tsp in concatenation (.) or string at > /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm > line 1008. > Use of uninitialized value $tsp in concatenation (.) or string at > /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm > line 1009. > > That's apparently the same problem on every failure reported. > > Can you send a fixed patchset? In the meantime I will switch the cf entry to > Waiting on Author. I guess that failure came from a recent change that allows in-place tablespace directory. I'll check it out. Thanks! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Hi, On Fri, Dec 24, 2021 at 07:21:59PM +0900, Kyotaro Horiguchi wrote: > Just a complaint.. > > At Fri, 12 Nov 2021 16:43:27 +0900 (JST), Kyotaro Horiguchi > wrote in > > "" on Japanese (CP-932) environment. I didn't actually > > tested it on Windows and msys environment ...yet. > > Active perl cannot be installed because of (perhaps) a powershell > version issue... Annoying.. > > https://community.activestate.com/t/please-update-your-powershell-install-scripts/7897 I'm not very familiar with windows, but maybe using strawberry perl instead ([1]) would fix your problem? I think it's also quite popular and is commonly used to run pgBadger on Windows. Other than that, I see that the TAP tests are failing on all the environment, due to Perl errors. For instance: [04:06:00.848] [04:05:54] t/003_promote.pl . [04:06:00.848] Dubious, test returned 2 (wstat 512, 0x200) https://api.cirrus-ci.com/v1/artifact/task/4751213722861568/tap/src/bin/pg_basebackup/tmp_check/log/regress_log_020_pg_receivewal # Initializing node "standby" from backup "my_backup" of node "primary" Odd number of elements in hash assignment at /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 996. Use of uninitialized value in list assignment at /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 996. Use of uninitialized value $tsp in concatenation (.) or string at /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 1008. Use of uninitialized value $tsp in concatenation (.) or string at /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 1009. That's apparently the same problem on every failure reported. Can you send a fixed patchset? In the meantime I will switch the cf entry to Waiting on Author. [1] https://strawberryperl.com/
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Just a complaint.. At Fri, 12 Nov 2021 16:43:27 +0900 (JST), Kyotaro Horiguchi wrote in > "" on Japanese (CP-932) environment. I didn't actually > tested it on Windows and msys environment ...yet. Active perl cannot be installed because of (perhaps) a powershell version issue... Annoying.. https://community.activestate.com/t/please-update-your-powershell-install-scripts/7897 regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Thu, 11 Nov 2021 11:13:52 +0900 (JST), Kyotaro Horiguchi wrote in > At Wed, 10 Nov 2021 09:14:30 -0300, Alvaro Herrera > wrote in > > Can you use PostgreSQL::Test::Utils::tempdir_short() for those > > tablespaces? > > Thanks for the suggestion! > > It works for a live cluster. But doesn't work for backups, since I > find no way to relate a tablespace directory with a backup directory > not using a symlink. One way would be taking a backup with tentative > tablespace directory in the short-named temporary directory then move > it into the backup direcotry. I'm going that way for now. This is that. 0001 adds several routines to handle tablespace directories, and adds tablespace support to backup/_backup_fs. We don't know an oid corresponding to a tablespace directory before actually assigning the oid to the tablespace. So we cannot name a tablespace directory after the oid. On the other hand, after defining the tablespace, cold data files don't tell the real directory name of the tablespace directory for an oid or a tablespace name, unless we have readlink. The function dir_readlink added to Utils.pm is that. Honestly I don't like the way function works. It uses "cmd /c "dir /A:L $dir"" to collect information of junctions. I'm not sure that the type label "" is immutable among locales but at least it is shown as "" on Japanese (CP-932) environment. I didn't actually tested it on Windows and msys environment ...yet. Premising the availability of the function, we can name tablespace directories from meaningful words. The directory to store tablespace directories can be a temporary directory, but with that way it is needed to create a symlink to find those directories from a backup. I chose to place tablespace directories directly under backup directory. The attached first file is a revised (or remade) version of tablespace support for TAP test. The second is the version adapted to the revised framework. (I confirmed that the test actually detects the error.) The third is not changed at all. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 5381df72dff0f326ffd20ae212bc43aa54ee8a86 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 11 Nov 2021 20:42:00 +0900 Subject: [PATCH v14 1/3] Add tablespace support to TAP framework TAP framework doesn't support nodes that have tablespaces. Especially backup and initialization from backups failed if the source node has tablespaces. This commit provides simple way to create tablespace directories and allows backup routines to handle tablespaces. --- src/test/perl/PostgreSQL/Test/Cluster.pm | 262 ++- src/test/perl/PostgreSQL/Test/Utils.pm | 43 2 files changed, 303 insertions(+), 2 deletions(-) diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index 9467a199c8..e195f11a23 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -287,6 +287,64 @@ sub archive_dir =pod +=item $node->tablespace_storage([, nocreate]) + +Diretory to store tablespace directories. +If nocreate is true, returns undef if not yet created. + +=cut + +sub tablespace_storage +{ + my ($self, $nocreate) = @_; + + if (!defined $self->{_tsproot}) + { + # tablespace is not used, return undef if nocreate is specified. + return undef if ($nocreate); + + # create and remember the tablespae root directotry. + $self->{_tsproot} = PostgreSQL::Test::Utils::tempdir_short(); + } + + return $self->{_tsproot}; +} + +=pod + +=item $node->tablespaces() + +Returns a hash from tablespace OID to tablespace directory name. For +example, an oid 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as +$hash{16384} = "ts1". + +=cut + +sub tablespaces +{ + my ($self) = @_; + my $pg_tblspc = $self->data_dir . '/pg_tblspc'; + my %ret; + + # return undef if no tablespace is used + return undef if (!defined $self->tablespace_storage(1)); + + # collect tablespace entries in pg_tblspc directory + opendir(my $dir, $pg_tblspc); + while (my $oid = readdir($dir)) + { + next if ($oid !~ /^([0-9]+)$/); + my $linkpath = "$pg_tblspc/$oid"; + my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath); + $ret{$oid} = File::Basename::basename($tsppath); + } + closedir($dir); + + return %ret; +} + +=pod + =item $node->backup_dir() The output path for backups taken with $node->backup() @@ -302,6 +360,77 @@ sub backup_dir =pod +=item $node->backup_tablespace_storage_path(backup_name) + +Returns tablespace location path for backup_name. +Retuns the parent directory if backup_name is not given. + +=cut + +sub backup_tablespace_storage_path +{ + my ($self, $backup_name) = @_; + my $dir = $self->backup_dir . '/__tsps'; + + $dir .= "/$backup_name" if (defined $backup_name); + + return $dir; +} + +=pod + +=item $node->backup_create_tablespace_storage(backup_name) + +Create tablespace location directory for backup_name if not yet. +Create the parent tables
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Wed, 10 Nov 2021 09:14:30 -0300, Alvaro Herrera wrote in > Can you use PostgreSQL::Test::Utils::tempdir_short() for those > tablespaces? Thanks for the suggestion! It works for a live cluster. But doesn't work for backups, since I find no way to relate a tablespace directory with a backup directory not using a symlink. One way would be taking a backup with tentative tablespace directory in the short-named temporary directory then move it into the backup direcotry. I'm going that way for now. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2021-Nov-10, Kyotaro Horiguchi wrote: > I bumped into the good-old 100-byte limit of the (v7?) tar format on > which pg_basebackup is depending. It is unlikely in the real world but > I think it is quite common in developping environment. The tablespace > directory path in my dev environment was 110 chacters-long. As small > as 10 bytes but it's quite annoying to chip off that number of bytes > from the path.. Can you use PostgreSQL::Test::Utils::tempdir_short() for those tablespaces? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Tue, 09 Nov 2021 17:05:49 +0900 (JST), Kyotaro Horiguchi wrote in > At Tue, 9 Nov 2021 12:51:15 +0900, Michael Paquier > wrote in > > Look at Utils.pm where we have dir_symlink, then. symlink() does not > > work on WIN32, so we have a wrapper that uses junction points. FWIW, > > I don't like much the behavior you are enforcing in init_from_backup > > when coldly copying a source path, but I have not looked enough at the > > patch set to have a strong opinion about this part, either. > > Thanks for the info. If we can handle symlink on Windows, we don't > need to have a cold copy. I bumped into the good-old 100-byte limit of the (v7?) tar format on which pg_basebackup is depending. It is unlikely in the real world but I think it is quite common in developping environment. The tablespace directory path in my dev environment was 110 chacters-long. As small as 10 bytes but it's quite annoying to chip off that number of bytes from the path.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Tue, 9 Nov 2021 12:51:15 +0900, Michael Paquier wrote in > On Mon, Nov 08, 2021 at 05:55:16PM +0900, Kyotaro Horiguchi wrote: > > I have quickly looked at the patch set. > > > 0001: (I don't remember about this, though) I don't see how to make it > > work on Windows. Anyway the next step would be to write comments. > > Look at Utils.pm where we have dir_symlink, then. symlink() does not > work on WIN32, so we have a wrapper that uses junction points. FWIW, > I don't like much the behavior you are enforcing in init_from_backup > when coldly copying a source path, but I have not looked enough at the > patch set to have a strong opinion about this part, either. Thanks for the info. If we can handle symlink on Windows, we don't need to have a cold copy. > > 0002: I didn't see it in details and didn't check if it finds the > > issue but it actually scceeds with the fix. The change to > > poll_query_until is removed since it doesn't seem actually used. > > +# Create tablespace > +my $dropme_ts_master1 = PostgreSQL::Test::Utils::tempdir(); > +$dropme_ts_master1 = > PostgreSQL::Test::Utils::perl2host($dropme_ts_master1); > +my $dropme_ts_master2 = PostgreSQL::Test::Utils::tempdir(); > +$dropme_ts_master2 = > PostgreSQL::Test::Utils::perl2host($dropme_ts_master2); > +my $source_ts_master = PostgreSQL::Test::Utils::tempdir(); > +$source_ts_master = > PostgreSQL::Test::Utils::perl2host($source_ts_master); > +my $target_ts_master = PostgreSQL::Test::Utils::tempdir(); > +$target_ts_master = > PostgreSQL::Test::Utils::perl2host($target_ts_master); > > Rather than creating N temporary directories, it would be simpler to > create only one, and have subdirs in it for the rest? It seems to me > that it would make debugging much easier. The uses of perl2host() > seem sufficient. Thanks for the suggestion. My eyeballs got hopping around looking that part so I gave up looking there in more detail:p I agree to that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Mon, Nov 08, 2021 at 05:55:16PM +0900, Kyotaro Horiguchi wrote: I have quickly looked at the patch set. > 0001: (I don't remember about this, though) I don't see how to make it > work on Windows. Anyway the next step would be to write comments. Look at Utils.pm where we have dir_symlink, then. symlink() does not work on WIN32, so we have a wrapper that uses junction points. FWIW, I don't like much the behavior you are enforcing in init_from_backup when coldly copying a source path, but I have not looked enough at the patch set to have a strong opinion about this part, either. > 0002: I didn't see it in details and didn't check if it finds the > issue but it actually scceeds with the fix. The change to > poll_query_until is removed since it doesn't seem actually used. +# Create tablespace +my $dropme_ts_master1 = PostgreSQL::Test::Utils::tempdir(); +$dropme_ts_master1 = PostgreSQL::Test::Utils::perl2host($dropme_ts_master1); +my $dropme_ts_master2 = PostgreSQL::Test::Utils::tempdir(); +$dropme_ts_master2 = PostgreSQL::Test::Utils::perl2host($dropme_ts_master2); +my $source_ts_master = PostgreSQL::Test::Utils::tempdir(); +$source_ts_master = PostgreSQL::Test::Utils::perl2host($source_ts_master); +my $target_ts_master = PostgreSQL::Test::Utils::tempdir(); +$target_ts_master = PostgreSQL::Test::Utils::perl2host($target_ts_master); Rather than creating N temporary directories, it would be simpler to create only one, and have subdirs in it for the rest? It seems to me that it would make debugging much easier. The uses of perl2host() seem sufficient. -- Michael signature.asc Description: PGP signature
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Thu, 4 Nov 2021 13:34:33 +0100, Daniel Gustafsson wrote in > This patch again fails to apply (seemingly from the Perl namespace work on the > testcode), and needs a few updates as per the above review. Rebased the latest patch removing some of the chages. 0001: (I don't remember about this, though) I don't see how to make it work on Windows. Anyway the next step would be to write comments. 0002: I didin't see it in details and didn't check if it finds the issue but it actually scceeds with the fix. The change to poll_query_until is removed since it doesn't seem actually used. 0003: The fix. I didn't touch this. 0004: Removed at all. I agree to Tom. (And I faintly remember that I said something like that.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From aa6b0b94e42550f23c4cecfa23ea1face7d71bc6 Mon Sep 17 00:00:00 2001 From: Asim R P Date: Mon, 8 Nov 2021 17:32:30 +0900 Subject: [PATCH v13 1/3] Support node initialization from backup with tablespaces User defined tablespaces appear as symlinks in in the backup. This commit tweaks recursive copy subroutine to allow for symlinks specific to tablespaces. --- src/test/perl/PostgreSQL/Test/Cluster.pm | 29 +++- .../perl/PostgreSQL/Test/RecursiveCopy.pm | 45 --- 2 files changed, 66 insertions(+), 8 deletions(-) diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index 9467a199c8..19a667ebe4 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -634,6 +634,32 @@ sub backup_fs_cold return; } +sub _srcsymlink +{ + my ($srcpath, $destpath) = @_; + + croak "Cannot operate on symlink \"$srcpath\"" + if ($srcpath !~ qr{/(pg_tblspc/[0-9]+)$}); + + # We have mapped tablespaces. Copy them individually + my $tmpdir = PostgreSQL::Test::Utils::tempdir(); + my $dstrealdir = PostgreSQL::Test::Utils::perl2host($tmpdir); + my $srcrealdir = readlink($srcpath); + + opendir(my $dh, $srcrealdir); + while (my $entry = (readdir $dh)) + { + next if ($entry eq '.' or $entry eq '..'); + my $spath = "$srcrealdir/$entry"; + my $dpath = "$dstrealdir/$entry"; + PostgreSQL::Test::RecursiveCopy::copypath($spath, $dpath); + } + closedir $dh; + + symlink $dstrealdir, $destpath; + + return 1; +} # Common sub of backup_fs_hot and backup_fs_cold sub _backup_fs @@ -743,7 +769,8 @@ sub init_from_backup else { rmdir($data_path); - PostgreSQL::Test::RecursiveCopy::copypath($backup_path, $data_path); + PostgreSQL::Test::RecursiveCopy::copypath($backup_path, $data_path, +srcsymlinkfn => \&_srcsymlink); } chmod(0700, $data_path); diff --git a/src/test/perl/PostgreSQL/Test/RecursiveCopy.pm b/src/test/perl/PostgreSQL/Test/RecursiveCopy.pm index dd320a605e..2a636cef84 100644 --- a/src/test/perl/PostgreSQL/Test/RecursiveCopy.pm +++ b/src/test/perl/PostgreSQL/Test/RecursiveCopy.pm @@ -49,6 +49,11 @@ This subroutine will be called for each entry in the source directory with its relative path as only parameter; if the subroutine returns true the entry is copied, otherwise the file is skipped. +If the B parameter is given, it must be a subroutine reference. +This subroutine will be called when the source directory is a symlink. It +determines the mechanism that copies files from the source directory to the +destination directory. + On failure the target directory may be in some incomplete state; no cleanup is attempted. @@ -68,6 +73,7 @@ sub copypath { my ($base_src_dir, $base_dest_dir, %params) = @_; my $filterfn; + my $srcsymlinkfn; if (defined $params{filterfn}) { @@ -82,31 +88,55 @@ sub copypath $filterfn = sub { return 1; }; } + if (defined $params{srcsymlinkfn}) + { + croak "if specified, srcsymlinkfn must be a subroutine reference" + unless defined(ref $params{srcsymlinkfn}) + and (ref $params{srcsymlinkfn} eq 'CODE'); + + $srcsymlinkfn = $params{srcsymlinkfn}; + } + else + { + $srcsymlinkfn = undef; + } + # Complain if original path is bogus, because _copypath_recurse won't. croak "\"$base_src_dir\" does not exist" if !-e $base_src_dir; # Start recursive copy from current directory - return _copypath_recurse($base_src_dir, $base_dest_dir, "", $filterfn); + return _copypath_recurse($base_src_dir, $base_dest_dir, "", $filterfn, $srcsymlinkfn); } # Recursive private guts of copypath sub _copypath_recurse { - my ($base_src_dir, $base_dest_dir, $curr_path, $filterfn) = @_; + my ($base_src_dir, $base_dest_dir, $curr_path, $filterfn, + $srcsymlinkfn) = @_; my $srcpath = "$base_src_dir/$curr_path"; my $destpath = "$base_dest_dir/$curr_path"; # invoke the filter and skip all further operation if it returns false return 1 unless &$filterfn($curr_path); - # Check for symlink -- needed only on source dir - # (note: this will fall through quietly if file is already gone) - croak "Cannot operate on symlink \"$srcpath\"" if -l $srcpath; - # Abort if destina
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
> On 24 Sep 2021, at 20:14, Tom Lane wrote: > > Robert Haas writes: >> The commit message for 0001 is not clear enough for me to understand >> what problem it's supposed to be fixing. The code comments aren't >> really either. They make it sound like there's some problem with >> copying symlinks but mostly they just talk about callbacks, which >> doesn't really help me understand what problem we'd have if we just >> didn't commit this (or reverted it later). > >> I am not really convinced by Álvaro's claim that 0004 is a "fix"; I >> think I'd call it an improvement. But either way I agree that could >> just be committed. > >> I haven't analyzed 0002 and 0003 yet. > > I took a quick look through this: > > * I don't like 0001 either, though it seems like the issue is mostly > documentation. sub _srcsymlink should have a comment explaining > what it's doing and why. The documentation of copypath's new parameter > seems like gobbledegook too --- I suppose it should read more like > "By default, copypath fails if a source item is a symlink. But if > B is provided, that subroutine is called to process any > symlink." > > * I'm allergic to 0002's completely undocumented changes to > poll_query_until, especially since I don't see anything in the > patch that actually uses them. Can't we just drop these diffs > in PostgresNode.pm? BTW, the last error message in the patch, > talking about a 5-second timeout, seems wrong. With or without > these changes, poll_query_until's default timeout is 180 sec. > The actual test case might be okay other than that nit and a > comment typo or two. > > * 0003 might actually be okay. I've not read it line-by-line, > but it seems like it's implementing a sane solution and it's > adequately commented. > > * I'm inclined to reject 0004 out of hand, because I don't > agree with what it's doing. The purpose of the rmgrdesc > functions is to show you what is in the WAL records, and > everywhere else we interpret that as "show the verbatim, > numeric field contents". heapdesc.c, for example, doesn't > attempt to look up the name of the table being operated on. > 0004 isn't adhering to that style, and aside from being > inconsistent I'm afraid that it's adding failure modes > we don't want. This patch again fails to apply (seemingly from the Perl namespace work on the testcode), and needs a few updates as per the above review. -- Daniel Gustafsson https://vmware.com/
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Robert Haas writes: > The commit message for 0001 is not clear enough for me to understand > what problem it's supposed to be fixing. The code comments aren't > really either. They make it sound like there's some problem with > copying symlinks but mostly they just talk about callbacks, which > doesn't really help me understand what problem we'd have if we just > didn't commit this (or reverted it later). > I am not really convinced by Álvaro's claim that 0004 is a "fix"; I > think I'd call it an improvement. But either way I agree that could > just be committed. > I haven't analyzed 0002 and 0003 yet. I took a quick look through this: * I don't like 0001 either, though it seems like the issue is mostly documentation. sub _srcsymlink should have a comment explaining what it's doing and why. The documentation of copypath's new parameter seems like gobbledegook too --- I suppose it should read more like "By default, copypath fails if a source item is a symlink. But if B is provided, that subroutine is called to process any symlink." * I'm allergic to 0002's completely undocumented changes to poll_query_until, especially since I don't see anything in the patch that actually uses them. Can't we just drop these diffs in PostgresNode.pm? BTW, the last error message in the patch, talking about a 5-second timeout, seems wrong. With or without these changes, poll_query_until's default timeout is 180 sec. The actual test case might be okay other than that nit and a comment typo or two. * 0003 might actually be okay. I've not read it line-by-line, but it seems like it's implementing a sane solution and it's adequately commented. * I'm inclined to reject 0004 out of hand, because I don't agree with what it's doing. The purpose of the rmgrdesc functions is to show you what is in the WAL records, and everywhere else we interpret that as "show the verbatim, numeric field contents". heapdesc.c, for example, doesn't attempt to look up the name of the table being operated on. 0004 isn't adhering to that style, and aside from being inconsistent I'm afraid that it's adding failure modes we don't want. regards, tom lane
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Wed, Aug 11, 2021 at 3:59 AM Paul Guo wrote: > Thanks for reviewing. Let me explain a bit. The patch series includes > four patches. > > 0001 and 0002 are test changes for the fix (0003). >- 0001 is the test framework change that's needed by 0002. >- 0002 is the test for the code fix (0003). > 0003 is the code change and the commit message explains the issue in detail. > 0004 as said is a small enhancement which is a bit independent of the > previous patches. > > Basically the issue is that without the fix crash recovery might fail > relevant to tablespace. > Here is the log after I run the tests in 0001/0002 without the 0003 fix. I do understand all of this, but I (or whoever might commit this) needs to also be able to understand specifically what each patch is doing. -- Robert Haas EDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Wed, Aug 11, 2021 at 4:56 AM Robert Haas wrote: > > On Thu, Aug 5, 2021 at 6:20 AM Paul Guo wrote: > > Rebased. > > The commit message for 0001 is not clear enough for me to understand > what problem it's supposed to be fixing. The code comments aren't > really either. They make it sound like there's some problem with > copying symlinks but mostly they just talk about callbacks, which > doesn't really help me understand what problem we'd have if we just > didn't commit this (or reverted it later). Thanks for reviewing. Let me explain a bit. The patch series includes four patches. 0001 and 0002 are test changes for the fix (0003). - 0001 is the test framework change that's needed by 0002. - 0002 is the test for the code fix (0003). 0003 is the code change and the commit message explains the issue in detail. 0004 as said is a small enhancement which is a bit independent of the previous patches. Basically the issue is that without the fix crash recovery might fail relevant to tablespace. Here is the log after I run the tests in 0001/0002 without the 0003 fix. 2021-08-04 10:00:42.231 CST [875] FATAL: could not create directory "pg_tblspc/16385/PG_15_202107261/16390": No such file or directory 2021-08-04 10:00:42.231 CST [875] CONTEXT: WAL redo at 0/3001320 for Database/CREATE: copy dir base/1 to pg_tblspc/16385/PG_15_202107261/16390 > > I am not really convinced by Álvaro's claim that 0004 is a "fix"; I > think I'd call it an improvement. But either way I agree that could > just be committed. > > I haven't analyzed 0002 and 0003 yet. > > -- > Robert Haas > EDB: http://www.enterprisedb.com > > -- Paul Guo (Vmware)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Thu, Aug 5, 2021 at 6:20 AM Paul Guo wrote: > Rebased. The commit message for 0001 is not clear enough for me to understand what problem it's supposed to be fixing. The code comments aren't really either. They make it sound like there's some problem with copying symlinks but mostly they just talk about callbacks, which doesn't really help me understand what problem we'd have if we just didn't commit this (or reverted it later). I am not really convinced by Álvaro's claim that 0004 is a "fix"; I think I'd call it an improvement. But either way I agree that could just be committed. I haven't analyzed 0002 and 0003 yet. -- Robert Haas EDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Rebased. v12-0002-Tests-to-replay-create-database-operation-on-sta.patch Description: v12-0002-Tests-to-replay-create-database-operation-on-sta.patch v12-0001-Support-node-initialization-from-backup-with-tab.patch Description: v12-0001-Support-node-initialization-from-backup-with-tab.patch v12-0003-Fix-replay-of-create-database-records-on-standby.patch Description: v12-0003-Fix-replay-of-create-database-records-on-standby.patch v12-0004-Fix-database-create-drop-wal-description.patch Description: v12-0004-Fix-database-create-drop-wal-description.patch
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Tue, Mar 30, 2021 at 12:12 PM Paul Guo wrote: > On 2021/3/27, 10:23 PM, "Alvaro Herrera" wrote: > > >Hmm, can you post a rebased set, where the points under discussion > > are marked in XXX comments explaining what the issue is? This thread > is > >long and old ago that it's pretty hard to navigate the whole thing in > >order to find out exactly what is being questioned. > > OK. Attached are the rebased version that includes the change I discussed > in my previous reply. Also added POD documentation change for > RecursiveCopy, > and modified the patch to use the backup_options introduced in > 081876d75ea15c3bd2ee5ba64a794fd8ea46d794 for tablespace mapping. > > >I think 0004 can be pushed without further ado, since it's a clear and > >simple fix. 0001 needs a comment about the new parameter in > >RecursiveCopy's POD documentation. > > Yeah, 0004 is no any risky. One concern seemed to be the compatibility of > some > WAL dump/analysis tools(?). I have no idea about this. But if we do not > backport > 0004 we do not seem to need to worry about this. > > >As I understand, this is a backpatchable bug-fix. > > Yes. > > Thanks. > > Patch does not apply successfully, http://cfbot.cputube.org/patch_33_2161.log Can you please rebase the patch. -- Ibrar Ahmed
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2021/3/27, 10:23 PM, "Alvaro Herrera" wrote: >Hmm, can you post a rebased set, where the points under discussion > are marked in XXX comments explaining what the issue is? This thread is >long and old ago that it's pretty hard to navigate the whole thing in >order to find out exactly what is being questioned. OK. Attached are the rebased version that includes the change I discussed in my previous reply. Also added POD documentation change for RecursiveCopy, and modified the patch to use the backup_options introduced in 081876d75ea15c3bd2ee5ba64a794fd8ea46d794 for tablespace mapping. >I think 0004 can be pushed without further ado, since it's a clear and >simple fix. 0001 needs a comment about the new parameter in >RecursiveCopy's POD documentation. Yeah, 0004 is no any risky. One concern seemed to be the compatibility of some WAL dump/analysis tools(?). I have no idea about this. But if we do not backport 0004 we do not seem to need to worry about this. >As I understand, this is a backpatchable bug-fix. Yes. Thanks. v11-0001-Support-node-initialization-from-backup-with-tab.patch Description: v11-0001-Support-node-initialization-from-backup-with-tab.patch v11-0002-Tests-to-replay-create-database-operation-on-sta.patch Description: v11-0002-Tests-to-replay-create-database-operation-on-sta.patch v11-0003-Fix-replay-of-create-database-records-on-standby.patch Description: v11-0003-Fix-replay-of-create-database-records-on-standby.patch v11-0004-Fix-database-create-drop-wal-description.patch Description: v11-0004-Fix-database-create-drop-wal-description.patch
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2021-Jan-27, Paul Guo wrote: > Here is a git diff against the previous patch. I’ll send out the new > rebased patches after the consensus is reached. Hmm, can you post a rebased set, where the points under discussion are marked in XXX comments explaining what the issue is? This thread is long and old ago that it's pretty hard to navigate the whole thing in order to find out exactly what is being questioned. I think 0004 can be pushed without further ado, since it's a clear and simple fix. 0001 needs a comment about the new parameter in RecursiveCopy's POD documentation. As I understand, this is a backpatchable bug-fix. -- Álvaro Herrera39°49'30"S 73°17'W
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Thanks for the review, please see the replies below. > On Jan 5, 2021, at 9:07 AM, Kyotaro Horiguchi wrote: > > At Wed, 8 Jul 2020 12:56:44 +, Paul Guo wrote in >> On 2020/01/15 19:18, Paul Guo wrote: >> I further fixed the last test failure (due to a small bug in the test, not >> in code). Attached are the new patch series. Let's see the CI pipeline >> result. >> >> Thanks for updating the patches! >> >> I started reading the 0003 patch. >> >> The approach that the 0003 patch uses is not the perfect solution. >> If the standby crashes after tblspc_redo() removes the directory and before >> its subsequent COMMIT record is replayed, PANIC error would occur since >> there can be some unresolved missing directory entries when we reach the >> consistent state. The problem would very rarely happen, though... >> Just idea; calling XLogFlush() to update the minimum recovery point just >> before tblspc_redo() performs destroy_tablespace_directories() may be >> safe and helpful for the problem? > > It seems to me that what the current patch does is too complex. What > we need to do here is to remember every invalid operation then forget > it when the prerequisite object is dropped. > > When a table space is dropped before consistency is established, we > don't need to care what has been performed inside the tablespace. In > this perspective, it is enough to remember tablespace ids when failed > to do something inside it due to the absence of the tablespace and > then forget it when we remove it. We could remember individual > database id to show them in error messages, but I'm not sure it's > useful. The reason log_invalid_page records block numbers is to allow > the machinery handle partial table truncations, but this is not the > case since dropping tablespace cannot leave some of containing > databases. > > As the result, we won't see an unresolved invalid operations in a > dropped tablespace. > > Am I missing something? Yes, removing the database id from the hash key in the log/forget code should be usually fine, but the previous code does stricter/safer checking. Consider the scenario: CREATE DATABASE newdb1 TEMPLATE template_db1; CREATE DATABASE newdb2 TEMPLATE template_db2; <- in case the template_db2 database directory is missing abnormally somehow. DROP DATABASE template_db1; The previous code could detect this but if we remove the database id in the code, this bad scenario is skipped. > > > dbase_redo: > + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode))) > + { > +XLogRecordMissingDir(xlrec->tablespace_id, InvalidOid, parent_path); > > This means "record the belonging table space directory if it is not > found OR it is not a directory". The former can be valid but the > latter is unconditionally can not (I don't think we bother considering > symlinks there). Again this is a safer check, in the case the parent_path is a file for example somehow, we should panic finally for the case and let the user checks and then does recovery again. > > +/* > + * Source directory may be missing. E.g. the template database used > + * for creating this database may have been dropped, due to reasons > + * noted above. Moving a database from one tablespace may also be a > + * partner in the crime. > + */ > +if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode))) > +{ > + XLogLogMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, > src_path); > > This is a part of *creation* of the target directory. Lack of the > source directory cannot be valid even if the source directory is > dropped afterwards in the WAL stream and we can allow that if the > *target* tablespace is dropped afterwards. As the result, as I > mentioned above, we don't need to record about the database directory. > > By the way the name XLogLogMiss.. is somewhat confusing. How about > XLogReportMissingDir (named after report_invalid_page). Agree with you. Also your words remind me that we should skip the checking if the consistency point is reached. Here is a git diff against the previous patch. I’ll send out the new rebased patches after the consensus is reached. diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 7ade385965..c8fe3fe228 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -90,7 +90,7 @@ typedef struct xl_missing_dir static HTAB *missing_dir_tab = NULL; void -XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path) +XLogReportMissingDir(Oid spcNode, Oid dbNode, char *path) { xl_missing_dir_key key; bool found; @@ -103,16 +103,6 @@ XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path) */ Assert(OidIsValid(spcNode)); - if (reachedConsistency) - { - if (dbNode == InvalidOid) - elog(PANIC, "cannot find directory %s (tablespace %d)", -
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Wed, 8 Jul 2020 12:56:44 +, Paul Guo wrote in > On 2020/01/15 19:18, Paul Guo wrote: > I further fixed the last test failure (due to a small bug in the test, not in > code). Attached are the new patch series. Let's see the CI pipeline result. > > Thanks for updating the patches! > > I started reading the 0003 patch. > > The approach that the 0003 patch uses is not the perfect solution. > If the standby crashes after tblspc_redo() removes the directory and before > its subsequent COMMIT record is replayed, PANIC error would occur since > there can be some unresolved missing directory entries when we reach the > consistent state. The problem would very rarely happen, though... > Just idea; calling XLogFlush() to update the minimum recovery point just > before tblspc_redo() performs destroy_tablespace_directories() may be > safe and helpful for the problem? It seems to me that what the current patch does is too complex. What we need to do here is to remember every invalid operation then forget it when the prerequisite object is dropped. When a table space is dropped before consistency is established, we don't need to care what has been performed inside the tablespace. In this perspective, it is enough to remember tablespace ids when failed to do something inside it due to the absence of the tablespace and then forget it when we remove it. We could remember individual database id to show them in error messages, but I'm not sure it's useful. The reason log_invalid_page records block numbers is to allow the machinery handle partial table truncations, but this is not the case since dropping tablespace cannot leave some of containing databases. As the result, we won't see an unresolved invalid operations in a dropped tablespace. Am I missing something? dbase_redo: + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode))) + { +XLogRecordMissingDir(xlrec->tablespace_id, InvalidOid, parent_path); This means "record the belonging table space directory if it is not found OR it is not a directory". The former can be valid but the latter is unconditionally can not (I don't think we bother considering symlinks there). +/* + * Source directory may be missing. E.g. the template database used + * for creating this database may have been dropped, due to reasons + * noted above. Moving a database from one tablespace may also be a + * partner in the crime. + */ +if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode))) +{ + XLogLogMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, src_path); This is a part of *creation* of the target directory. Lack of the source directory cannot be valid even if the source directory is dropped afterwards in the WAL stream and we can allow that if the *target* tablespace is dropped afterwards. As the result, as I mentioned above, we don't need to record about the database directory. By the way the name XLogLogMiss.. is somewhat confusing. How about XLogReportMissingDir (named after report_invalid_page). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Looks like my previous reply was held for moderation (maybe due to my new email address). I configured my pg account today using the new email address. I guess this email would be held for moderation. I’m now replying my previous reply email and attaching the new patch series. On Jul 6, 2020, at 10:18 AM, Paul Guo mailto:gu...@vmware.com>> wrote: Thanks for the review. I’m now re-picking up the work. I modified the code following the comments. Besides, I tweaked the test code a bit. There are several things I’m not 100% sure. Please see my replies below. On Jan 27, 2020, at 11:24 PM, Fujii Masao mailto:masao.fu...@oss.nttdata.com>> wrote: On 2020/01/15 19:18, Paul Guo wrote: I further fixed the last test failure (due to a small bug in the test, not in code). Attached are the new patch series. Let's see the CI pipeline result. Thanks for updating the patches! I started reading the 0003 patch. The approach that the 0003 patch uses is not the perfect solution. If the standby crashes after tblspc_redo() removes the directory and before its subsequent COMMIT record is replayed, PANIC error would occur since there can be some unresolved missing directory entries when we reach the consistent state. The problem would very rarely happen, though... Just idea; calling XLogFlush() to update the minimum recovery point just before tblspc_redo() performs destroy_tablespace_directories() may be safe and helpful for the problem? Yes looks like an issue. My understanding is the below scenario. XLogLogMissingDir() XLogFlush() in redo (e.g. in a commit redo). <- create a minimum recovery point (we call it LSN_A). tblspc_redo()->XLogForgetMissingDir() <- If we panic immediately after we remove the directory in tblspc_redo() <- when we do replay during crash-recovery, we will check consistency at LSN_A and thus PANIC inXLogCheckMissingDirs() commit We should add a XLogFlush() in tblspc_redo(). This brings several other questions to my minds also. 1. Should we call XLogFlush() in dbase_redo() for XLOG_DBASE_DROP also? It calls both XLogDropDatabase() and XLogForgetMissingDir, which seem to have this issue also? 2. xact_redo_abort() calls DropRelationFiles() also. Why do not we call XLogFlush() there? - appendStringInfo(buf, "copy dir %u/%u to %u/%u", - xlrec->src_tablespace_id, xlrec->src_db_id, - xlrec->tablespace_id, xlrec->db_id); + dbpath1 = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id); + dbpath2 = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id); + appendStringInfo(buf, "copy dir %s to %s", dbpath1, dbpath2); + pfree(dbpath2); + pfree(dbpath1); If the patch is for the bug fix and would be back-ported, the above change would lead to change pg_waldump's output for CREATE/DROP DATABASE between minor versions. IMO it's better to avoid such change and separate the above as a separate patch only for master. I know we do not want wal format between minor releases, but does wal description string change between minor releases affect users? Anyone I’ll extract this part into a separate patch in the series since this change is actually independent of the other changes.. - appendStringInfo(buf, " %u/%u", - xlrec->tablespace_ids[i], xlrec->db_id); + { + dbpath1 = GetDatabasePath(xlrec->db_id, xlrec->tablespace_ids[i]); + appendStringInfo(buf, "%s", dbpath1); + pfree(dbpath1); + } Same as above. BTW, the above "%s" should be " %s", i.e., a space character needs to be appended to the head of "%s”. OK + get_parent_directory(parent_path); + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode))) + { + XLogLogMissingDir(xlrec->tablespace_id, InvalidOid, dst_path); The third argument of XLogLogMissingDir() should be parent_path instead of dst_path? The argument is for debug message printing so both should be fine, but admittedly we are logging for the tablespace directory so parent_path might be better. + if (hash_search(missing_dir_tab, &key, HASH_REMOVE, NULL) == NULL) + elog(DEBUG2, "dir %s tablespace %d database %d is not missing", + path, spcNode, dbNode); I think that this elog() is useless and rather confusing. OK. Modified. + XLogForgetMissingDir(xlrec->ts_id, InvalidOid, ""); The third argument should be set to the actual path instead of an empty string. Otherwise XLogForgetMissingDir() may emit a confusing DEBUG2 message. Or the third argument of XLogForgetMissingDir() should be removed and the path in the DEBUG2 message should be calculated from the spcNode and dbNode in the hash entry in XLogForgetMissingDir(). I’m now removing the third argument. Use GetDatabasePath() to get the path if database did I snot InvalidOid. +#include "common/file_perm.h" This seems not necessary. Right. v10-0001-Support-node-initialization-from-backup-with-tab.patch Description: v10-0001-Support-node-initialization-from-backup-with-tab.patch v10-0002-Tests-to-replay-create-database-operation-on-sta.patch Description: v10-0002-Tests-to-repla
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
> On 25 Mar 2020, at 06:52, Fujii Masao wrote: > > On 2020/01/28 0:24, Fujii Masao wrote: >> On 2020/01/15 19:18, Paul Guo wrote: >>> I further fixed the last test failure (due to a small bug in the test, not >>> in code). Attached are the new patch series. Let's see the CI pipeline >>> result. >> Thanks for updating the patches! >> I started reading the 0003 patch. > > I marked this patch as Waiting on Author in CF because there is no update > since my last review comments. Could you mark it as Needs Review again > if you post the updated version of the patch. This thread has been stalled since effectively January, so I'm marking this patch Returned with Feedback. Feel free to open a new entry once the review comments have been addressed. cheers ./daniel
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2020/01/28 0:24, Fujii Masao wrote: On 2020/01/15 19:18, Paul Guo wrote: I further fixed the last test failure (due to a small bug in the test, not in code). Attached are the new patch series. Let's see the CI pipeline result. Thanks for updating the patches! I started reading the 0003 patch. I marked this patch as Waiting on Author in CF because there is no update since my last review comments. Could you mark it as Needs Review again if you post the updated version of the patch. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2020/01/15 19:18, Paul Guo wrote: I further fixed the last test failure (due to a small bug in the test, not in code). Attached are the new patch series. Let's see the CI pipeline result. Thanks for updating the patches! I started reading the 0003 patch. The approach that the 0003 patch uses is not the perfect solution. If the standby crashes after tblspc_redo() removes the directory and before its subsequent COMMIT record is replayed, PANIC error would occur since there can be some unresolved missing directory entries when we reach the consistent state. The problem would very rarely happen, though... Just idea; calling XLogFlush() to update the minimum recovery point just before tblspc_redo() performs destroy_tablespace_directories() may be safe and helpful for the problem? - appendStringInfo(buf, "copy dir %u/%u to %u/%u", -xlrec->src_tablespace_id, xlrec->src_db_id, -xlrec->tablespace_id, xlrec->db_id); + dbpath1 = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id); + dbpath2 = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id); + appendStringInfo(buf, "copy dir %s to %s", dbpath1, dbpath2); + pfree(dbpath2); + pfree(dbpath1); If the patch is for the bug fix and would be back-ported, the above change would lead to change pg_waldump's output for CREATE/DROP DATABASE between minor versions. IMO it's better to avoid such change and separate the above as a separate patch only for master. - appendStringInfo(buf, " %u/%u", -xlrec->tablespace_ids[i], xlrec->db_id); + { + dbpath1 = GetDatabasePath(xlrec->db_id, xlrec->tablespace_ids[i]); + appendStringInfo(buf, "%s", dbpath1); + pfree(dbpath1); + } Same as above. BTW, the above "%s" should be " %s", i.e., a space character needs to be appended to the head of "%s". + get_parent_directory(parent_path); + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode))) + { + XLogLogMissingDir(xlrec->tablespace_id, InvalidOid, dst_path); The third argument of XLogLogMissingDir() should be parent_path instead of dst_path? + if (hash_search(missing_dir_tab, &key, HASH_REMOVE, NULL) == NULL) + elog(DEBUG2, "dir %s tablespace %d database %d is not missing", +path, spcNode, dbNode); I think that this elog() is useless and rather confusing. + XLogForgetMissingDir(xlrec->ts_id, InvalidOid, ""); The third argument should be set to the actual path instead of an empty string. Otherwise XLogForgetMissingDir() may emit a confusing DEBUG2 message. Or the third argument of XLogForgetMissingDir() should be removed and the path in the DEBUG2 message should be calculated from the spcNode and dbNode in the hash entry in XLogForgetMissingDir(). +#include "common/file_perm.h" This seems not necessary. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
I further fixed the last test failure (due to a small bug in the test, not in code). Attached are the new patch series. Let's see the CI pipeline result. v9-0001-Support-node-initialization-from-backup-with-tabl.patch Description: Binary data v9-0003-Fix-replay-of-create-database-records-on-standby.patch Description: Binary data v9-0002-Tests-to-replay-create-database-operation-on-stan.patch Description: Binary data
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Fri, Jan 10, 2020 at 9:43 PM Alvaro Herrera wrote: > On 2020-Jan-09, Alvaro Herrera wrote: > > > I looked at this a little while and was bothered by the perl changes; it > > seems out of place to have RecursiveCopy be thinking about tablespaces, > > which is way out of its league. So I rewrote that to use a callback: > > the PostgresNode code passes a callback that's in charge to handle the > > case of a symlink. Things look much more in place with that. I didn't > > verify that all places that should use this are filled. > > > > In 0002 I found adding a new function unnecessary: we can keep backwards > > compat by checking 'ref' of the third argument. With that we don't have > > to add a new function. (POD changes pending.) > > I forgot to add that something in these changes is broken (probably the > symlink handling callback) so the tests fail, but I couldn't stay away > from my daughter's birthday long enough to figure out what or how. I'm > on something else today, so if one of you can research and submit fixed > versions, that'd be great. > > Thanks, > I spent some time on this before getting off work today. With below fix, the 4th test is now ok but the 5th (last one) hangs due to panic. (gdb) bt #0 0x003397e32625 in raise () from /lib64/libc.so.6 #1 0x003397e33e05 in abort () from /lib64/libc.so.6 #2 0x00a90506 in errfinish (dummy=0) at elog.c:590 #3 0x00a92b4b in elog_finish (elevel=22, fmt=0xb2d580 "cannot find directory %s tablespace %d database %d") at elog.c:1465 #4 0x0057aa0a in XLogLogMissingDir (spcNode=16384, dbNode=0, path=0x1885100 "pg_tblspc/16384/PG_13_202001091/16389") at xlogutils.c:104 #5 0x0065e92e in dbase_redo (record=0x1841568) at dbcommands.c:2225 #6 0x0056ac94 in StartupXLOG () at xlog.c:7200 diff --git a/src/include/commands/dbcommands.h b/src/include/commands/dbcommands.h index b71b400e700..f8f6d5ffd03 100644 --- a/src/include/commands/dbcommands.h +++ b/src/include/commands/dbcommands.h @@ -19,8 +19,6 @@ #include "lib/stringinfo.h" #include "nodes/parsenodes.h" -extern void CheckMissingDirs4DbaseRedo(void); - extern Oid createdb(ParseState *pstate, const CreatedbStmt *stmt); extern void dropdb(const char *dbname, bool missing_ok, bool force); extern void DropDatabase(ParseState *pstate, DropdbStmt *stmt); diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index e6e7ea505d9..4eef8bb1985 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -615,11 +615,11 @@ sub _srcsymlink my $srcrealdir = readlink($srcpath); opendir(my $dh, $srcrealdir); - while (readdir $dh) + while (my $entry = (readdir $dh)) { - next if (/^\.\.?$/); - my $spath = "$srcrealdir/$_"; - my $dpath = "$dstrealdir/$_"; + next if ($entry eq '.' or $entry eq '..'); + my $spath = "$srcrealdir/$entry"; + my $dpath = "$dstrealdir/$entry"; RecursiveCopy::copypath($spath, $dpath); } closedir $dh;