Hello postgres hackers,

Recently my colleagues and I encountered an issue: a standby cannot recover after an unclean shutdown, and the cause is related to tablespaces. The standby re-replays some xlog records that need tablespace directories (e.g. create a database with a tablespace), but those tablespace directories have already been removed by the previous replay.
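To make the failure mode concrete, here is a small Python sketch (not PostgreSQL code). It assumes that redo of the database creation boils down to a single non-recursive mkdir of the per-database directory, and it models the tablespace entry under pg_tblspc as a plain directory rather than a symlink. The function name replay_create_db is hypothetical; the OIDs and directory names are taken from the log.

```python
import os
import shutil
import tempfile


def replay_create_db(datadir, tblspc_oid, version_dir, db_oid):
    """Hypothetical stand-in for the mkdir step of CREATE DATABASE redo."""
    path = os.path.join(datadir, "pg_tblspc", tblspc_oid, version_dir, db_oid)
    os.mkdir(path)  # non-recursive: raises FileNotFoundError if a parent is gone
    return path


datadir = tempfile.mkdtemp()
tblspc_entry = os.path.join(datadir, "pg_tblspc", "65546")

# State after CREATE TABLESPACE has been replayed (simplified: no symlink).
os.makedirs(os.path.join(tblspc_entry, "PG_12_201904072"))

# First pass over the WAL: all three operations replay fine.
db_path = replay_create_db(datadir, "65546", "PG_12_201904072", "65547")
shutil.rmtree(db_path)       # drop database
shutil.rmtree(tblspc_entry)  # drop tablespace

# Unclean shutdown: the redo point was never advanced, so replay starts
# again from the database creation record.
try:
    replay_create_db(datadir, "65546", "PG_12_201904072", "65547")
    replay_failed = False
except FileNotFoundError:
    replay_failed = True

print("second replay failed:", replay_failed)
shutil.rmtree(datadir)
```

The second call fails for the same reason as the FATAL in the standby log: the parent directories of the database directory no longer exist.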
In detail, the standby finishes replaying the operations below normally, but because of the unclean shutdown the redo LSN in pg_control is not updated and still points before the 'create db with tablespace' xlog record. Since the tablespace directories have already been removed, replaying the database-create WAL record reports an error.

create db with tablespace
drop database
drop tablespace

Here is the log on the standby.

2019-04-17 14:52:14.926 CST [23029] LOG: starting PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
2019-04-17 14:52:14.927 CST [23029] LOG: listening on IPv4 address "192.168.35.130", port 5432
2019-04-17 14:52:14.929 CST [23029] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-04-17 14:52:14.943 CST [23030] LOG: database system was interrupted while in recovery at log time 2019-04-17 14:48:27 CST
2019-04-17 14:52:14.943 CST [23030] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2019-04-17 14:52:14.949 CST [23030] LOG: entering standby mode
2019-04-17 14:52:14.950 CST [23030] LOG: redo starts at 0/30105B8
2019-04-17 14:52:14.951 CST [23030] FATAL: could not create directory "pg_tblspc/65546/PG_12_201904072/65547": No such file or directory
2019-04-17 14:52:14.951 CST [23030] CONTEXT: WAL redo at 0/3011650 for Database/CREATE: copy dir 1663/1 to 65546/65547
2019-04-17 14:52:14.951 CST [23029] LOG: startup process (PID 23030) exited with exit code 1
2019-04-17 14:52:14.951 CST [23029] LOG: terminating any other active server processes
2019-04-17 14:52:14.953 CST [23029] LOG: database system is shut down

Steps to reproduce:

1. Set up a master and a standby.

2. On both sides, run: mkdir /tmp/some_isolation2_pg_basebackup_tablespace

3. Run the SQL:

   drop tablespace if exists some_isolation2_pg_basebackup_tablespace;
   create tablespace some_isolation2_pg_basebackup_tablespace location '/tmp/some_isolation2_pg_basebackup_tablespace';

4. Cleanly shut down and restart both postgres instances.

5. Run the following SQL:

   drop database if exists some_database_with_tablespace;
   create database some_database_with_tablespace tablespace some_isolation2_pg_basebackup_tablespace;
   drop database some_database_with_tablespace;
   drop tablespace some_isolation2_pg_basebackup_tablespace;
   \! pkill -9 postgres; ssh host70 pkill -9 postgres

Note that an immediate shutdown via pg_ctl should also be able to reproduce the issue, and that the above steps probably do not reproduce it 100% of the time.

I created an initial patch for this issue (see the attachment). The idea is to re-create those directories recursively. The above issue exists in dbase_redo(), but TablespaceCreateDbspace() (used when redoing relation file creation) is probably buggy too, so I modified that function as well. Even if there is no bug in that function, using a simple pg_mkdir_p() seems cleaner.

While reading TablespaceCreateDbspace(), I found that this issue seems to have been considered already, though insufficiently. Frankly, this solution (directory recreation) does not seem perfect, given that this should really have been the responsibility of tablespace creation (and tablespace creation does more, such as symlink creation). Also, I'm not sure whether we need to use the invalid page mechanism (see xlogutils.c).

Another solution: we already create a checkpoint on createdb/movedb/dropdb/droptablespace, so maybe we should force the standby to create a restartpoint for this special kind of checkpoint WAL record. That means setting a flag in the checkpoint WAL record and letting the checkpoint redo code create a restartpoint if that flag is set. This solution seems safer.

Thanks,
Paul
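For illustration only, the directory-recreation idea can be sketched in Python (this is not the attached patch). It assumes that recursive creation behaves like os.makedirs, which is roughly what pg_mkdir_p() does for a path. The function name replay_create_db_recursive is hypothetical, and the tablespace entry is again modeled as a plain directory.

```python
import os
import shutil
import tempfile


def replay_create_db_recursive(datadir, tblspc_oid, version_dir, db_oid):
    """Hypothetical redo step that recreates any missing parent directories."""
    path = os.path.join(datadir, "pg_tblspc", tblspc_oid, version_dir, db_oid)
    os.makedirs(path, exist_ok=True)  # creates every missing parent, like mkdir -p
    return path


# Data directory in the state left behind by "drop tablespace":
# pg_tblspc exists, but the entry for tablespace 65546 is gone.
datadir = tempfile.mkdtemp()
os.mkdir(os.path.join(datadir, "pg_tblspc"))
tblspc_entry = os.path.join(datadir, "pg_tblspc", "65546")

db_path = replay_create_db_recursive(datadir, "65546", "PG_12_201904072", "65547")

created = os.path.isdir(db_path)           # replay no longer fails
is_symlink = os.path.islink(tblspc_entry)  # but the entry is a plain directory

print("database directory created:", created)
print("tablespace entry is a symlink:", is_symlink)
shutil.rmtree(datadir)
```

The replay gets past the database creation record, but the recreated tablespace entry is an ordinary directory, not the symlink that real tablespace creation would set up. That is the imperfection mentioned above.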
0001-Recursively-create-tablespace-directories-if-those-a.patch