On 2015-01-26 22:03:03 +0100, Andres Freund wrote:
> Attached is a patch trying to this. Doesn't look too bad and lead me to
> discover missing recovery conflicts during a AD ST.
>
> But: It doesn't actually work on standbys, because lock.c prevents any
> stronger lock than RowExclusive from being acquired. And we need need a
> lock that can conflict with WAL replay of DBASE_CREATE, to handle base
> backups that are executed on the primary. Those obviously can't detect
> whether any standby is currently doing a base backup...
>
> I currently don't have a good idea how to mangle lock.c to allow
> this. I've played with doing it like in the second patch, but that
> doesn't actually work because of some asserts around ProcSleep - leading
> to locks on database objects not working in the startup process (despite
> already being used).
>
> The easiest thing would be to just use a lwlock instead of a heavyweight
> lock - but those aren't canceleable...
As Stephen noticed on irc I forgot to attach those. Caused by the
generic HS issue mentioned in [email protected].
Attached now.
As mentioned above, this isn't really isn't ready (e.g. it'll Assert() on a
standby due to the HS issue).
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
>From e5d46c73955e7647912c2625e31027a6fef7c880 Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Mon, 26 Jan 2015 21:03:35 +0100
Subject: [PATCH] Prevent ALTER DATABASE SET TABLESPACE from running
concurrently with a base backup.
The combination can cause a corrupt base backup.
---
src/backend/access/transam/xlog.c | 33 +++++++++++++++++++++++++++++
src/backend/commands/dbcommands.c | 41 ++++++++++++++++++++++++++++++++++++
src/backend/replication/basebackup.c | 11 ++++++++++
src/include/access/xlog.h | 1 +
4 files changed, 86 insertions(+)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 629a457..1daa10d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -53,6 +53,7 @@
#include "storage/ipc.h"
#include "storage/large_object.h"
#include "storage/latch.h"
+#include "storage/lmgr.h"
#include "storage/pmsignal.h"
#include "storage/predicate.h"
#include "storage/proc.h"
@@ -9329,6 +9330,14 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
else
XLogCtl->Insert.nonExclusiveBackups++;
XLogCtl->Insert.forcePageWrites = true;
+
+ /*
+ * Acquire lock on pg datababase just before releasing the WAL insert lock
+ * and entering the ENSURE_ERROR_CLEANUP block. That way it's sufficient
+ * to release it in the error cleanup callback.
+ */
+ LockRelationOid(DatabaseRelationId, ShareLock);
+
WALInsertLockRelease();
/* Ensure we release forcePageWrites if fail below */
@@ -9523,6 +9532,8 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
}
PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
+ UnlockRelationOid(DatabaseRelationId, ShareLock);
+
/*
* We're done. As a convenience, return the starting WAL location.
*/
@@ -9537,6 +9548,8 @@ pg_start_backup_callback(int code, Datum arg)
{
bool exclusive = DatumGetBool(arg);
+ UnlockRelationOid(DatabaseRelationId, ShareLock);
+
/* Update backup counters and forcePageWrites on failure */
WALInsertLockAcquireExclusive();
if (exclusive)
@@ -9937,6 +9950,26 @@ do_pg_abort_backup(void)
}
/*
+ * Is a (exclusive or nonexclusive) base backup running?
+ *
+ * Note that this does not check whether any standby of this node has a
+ * basebackup running, or whether any upstream master (if this is a standby)
+ * has one in progress
+ */
+bool
+LocalBaseBackupInProgress(void)
+{
+ bool ret;
+
+ WALInsertLockAcquire();
+ ret = XLogCtl->Insert.exclusiveBackup ||
+ XLogCtl->Insert.nonExclusiveBackups > 0;
+ WALInsertLockRelease();
+
+ return ret;
+}
+
+/*
* Get latest redo apply position.
*
* Exported to allow WALReceiver to read the pointer directly.
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 5e66961..6184c83 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1087,6 +1087,30 @@ movedb(const char *dbname, const char *tblspcname)
errmsg("cannot change the tablespace of the currently open database")));
/*
+ * Prevent SET TABLESPACE from running concurrently with a base
+ * backup. Without that check a base backup would potentially copy a
+ * partially removed source database; which WAL replay then would copy
+ * over the new database...
+ *
+ * Starting a base backup takes a SHARE lock on pg_database. In addition a
+ * streaming basebackup takes the same lock for the entirety of the copy
+ * of the data directory. That, combined with this check, prevents base
+ * backups from being taken at the same time a SET TABLESPACE is in
+ * progress.
+ *
+ * Note that this check here will not trigger if a standby currently has a
+ * base backup ongoing; instead WAL replay will have a recovery conflict
+ * when replaying the DBASE_CREATE record. That is only safe because
+ * standbys can only do nonexclusive base backups which hold the lock over
+ * the entire runtime - otherwise no lock would prevent replay after
+ * pg_start_backup().
+ */
+ if (LocalBaseBackupInProgress())
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot change a database's tablespace while a base backup is in progress")));
+
+ /*
* Get tablespace's oid
*/
dst_tblspcoid = get_tablespace_oid(tblspcname, false);
@@ -2052,6 +2076,17 @@ dbase_redo(XLogReaderState *record)
src_path = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id);
dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
+ if (InHotStandby)
+ {
+ /* Lock source database, do avoid concurrent hint bit writes et al. */
+ LockSharedObjectForSession(DatabaseRelationId, xlrec->src_db_id, 0, AccessExclusiveLock);
+ ResolveRecoveryConflictWithDatabase(xlrec->src_db_id);
+
+ /* Lock target database, it'll be overwritten in a second */
+ LockSharedObjectForSession(DatabaseRelationId, xlrec->db_id, 0, AccessExclusiveLock);
+ ResolveRecoveryConflictWithDatabase(xlrec->db_id);
+ }
+
/*
* Our theory for replaying a CREATE is to forcibly drop the target
* subdirectory if present, then re-copy the source data. This may be
@@ -2078,6 +2113,12 @@ dbase_redo(XLogReaderState *record)
* We don't need to copy subdirectories
*/
copydir(src_path, dst_path, false);
+
+ if (InHotStandby)
+ {
+ UnlockSharedObjectForSession(DatabaseRelationId, xlrec->src_db_id, 0, AccessExclusiveLock);
+ UnlockSharedObjectForSession(DatabaseRelationId, xlrec->db_id, 0, AccessExclusiveLock);
+ }
}
else if (info == XLOG_DBASE_DROP)
{
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 3058ce9..0e76497 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -19,6 +19,7 @@
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
#include "catalog/catalog.h"
+#include "catalog/pg_database.h"
#include "catalog/pg_type.h"
#include "lib/stringinfo.h"
#include "libpq/libpq.h"
@@ -32,6 +33,8 @@
#include "replication/walsender_private.h"
#include "storage/fd.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/lock.h"
#include "utils/builtins.h"
#include "utils/elog.h"
#include "utils/ps_status.h"
@@ -110,6 +113,8 @@ static void
base_backup_cleanup(int code, Datum arg)
{
do_pg_abort_backup();
+
+ UnlockRelationOid(DatabaseRelationId, ShareLock);
}
/*
@@ -134,6 +139,9 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
&labelfile);
+
+ LockRelationOid(DatabaseRelationId, ShareLock);
+
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
* us to abort the backup so we don't "leak" a backup counter. For this reason,
@@ -304,6 +312,9 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
}
PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ /* release lock again, before stop_backup, as that can error out */
+ UnlockRelationOid(DatabaseRelationId, ShareLock);
+
endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
if (opt->includewal)
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 138deaf..f3893f3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -253,6 +253,7 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
extern void do_pg_abort_backup(void);
+extern bool LocalBaseBackupInProgress(void);
/* File path names (all relative to $PGDATA) */
#define BACKUP_LABEL_FILE "backup_label"
--
2.0.0.rc2.4.g1dc51c6.dirty
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1daa10d..5c70567 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9336,7 +9336,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
* and entering the ENSURE_ERROR_CLEANUP block. That way it's sufficient
* to release it in the error cleanup callback.
*/
- LockRelationOid(DatabaseRelationId, ShareLock);
+ LockRelationOidForSession(DatabaseRelationId, ShareLock);
WALInsertLockRelease();
@@ -9532,7 +9532,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
}
PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
- UnlockRelationOid(DatabaseRelationId, ShareLock);
+ UnlockRelationOidForSession(DatabaseRelationId, ShareLock);
/*
* We're done. As a convenience, return the starting WAL location.
@@ -9548,7 +9548,7 @@ pg_start_backup_callback(int code, Datum arg)
{
bool exclusive = DatumGetBool(arg);
- UnlockRelationOid(DatabaseRelationId, ShareLock);
+ UnlockRelationOidForSession(DatabaseRelationId, ShareLock);
/* Update backup counters and forcePageWrites on failure */
WALInsertLockAcquireExclusive();
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 6184c83..d312e23 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -2085,6 +2085,9 @@ dbase_redo(XLogReaderState *record)
/* Lock target database, it'll be overwritten in a second */
LockSharedObjectForSession(DatabaseRelationId, xlrec->db_id, 0, AccessExclusiveLock);
ResolveRecoveryConflictWithDatabase(xlrec->db_id);
+
+ /* Lock pg_database, to conflict with base backups */
+ LockRelationOidForSession(DatabaseRelationId, RowExclusiveLock);
}
/*
@@ -2116,8 +2119,9 @@ dbase_redo(XLogReaderState *record)
if (InHotStandby)
{
- UnlockSharedObjectForSession(DatabaseRelationId, xlrec->src_db_id, 0, AccessExclusiveLock);
+ UnlockRelationOidForSession(DatabaseRelationId, RowExclusiveLock);
UnlockSharedObjectForSession(DatabaseRelationId, xlrec->db_id, 0, AccessExclusiveLock);
+ UnlockSharedObjectForSession(DatabaseRelationId, xlrec->src_db_id, 0, AccessExclusiveLock);
}
}
else if (info == XLOG_DBASE_DROP)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0e76497..481ba18 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -114,7 +114,7 @@ base_backup_cleanup(int code, Datum arg)
{
do_pg_abort_backup();
- UnlockRelationOid(DatabaseRelationId, ShareLock);
+ UnlockRelationOidForSession(DatabaseRelationId, ShareLock);
}
/*
@@ -140,7 +140,7 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
&labelfile);
- LockRelationOid(DatabaseRelationId, ShareLock);
+ LockRelationOidForSession(DatabaseRelationId, ShareLock);
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
@@ -313,7 +313,7 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
/* release lock again, before stop_backup, as that can error out */
- UnlockRelationOid(DatabaseRelationId, ShareLock);
+ UnlockRelationOidForSession(DatabaseRelationId, ShareLock);
endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index d13a167..c428b38 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -248,6 +248,37 @@ UnlockRelation(Relation relation, LOCKMODE lockmode)
}
/*
+ * LockRelationOidForSession
+ *
+ * Lock a relation in session mode, without requiring having it opened it in
+ * transaction local mode. That's likely only useful during WAL replay.
+ */
+void
+LockRelationOidForSession(Oid relid, LOCKMODE lockmode)
+{
+ LOCKTAG tag;
+
+ SetLocktagRelationOid(&tag, relid);
+
+ (void) LockAcquire(&tag, lockmode, true, false);
+}
+
+/*
+ * UnlockRelationOidForSession
+ *
+ * Unlock lock acquired by LockRelationOidForSession.
+ */
+void
+UnlockRelationOidForSession(Oid relid, LOCKMODE lockmode)
+{
+ LOCKTAG tag;
+
+ SetLocktagRelationOid(&tag, relid);
+
+ LockRelease(&tag, lockmode, true);
+}
+
+/*
* LockHasWaitersRelation
*
* This is a functiion to check if someone else is waiting on a
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 61c8d21..3ebbe9c 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -710,13 +710,16 @@ LockAcquireExtended(const LOCKTAG *locktag,
if (RecoveryInProgress() && !InRecovery &&
(locktag->locktag_type == LOCKTAG_OBJECT ||
locktag->locktag_type == LOCKTAG_RELATION) &&
- lockmode > RowExclusiveLock)
+ lockmode > RowExclusiveLock &&
+ !sessionLock)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot acquire lock mode %s on database objects while recovery is in progress",
lockMethodTable->lockModeNames[lockmode]),
errhint("Only RowExclusiveLock or less can be acquired on database objects during recovery.")));
+ Assert(!InRecovery || (sessionLock && dontWait));
+
#ifdef LOCK_DEBUG
if (LOCK_DEBUG_ENABLED(locktag))
elog(LOG, "LockAcquire: lock [%u,%u] %s",
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index f5d70e5..2397f11 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -50,6 +50,9 @@ extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
extern void UnlockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
+extern void LockRelationOidForSession(Oid relid, LOCKMODE lockmode);
+extern void UnlockRelationOidForSession(Oid relid, LOCKMODE lockmode);
+
/* Lock a relation for extension */
extern void LockRelationForExtension(Relation relation, LOCKMODE lockmode);
extern void UnlockRelationForExtension(Relation relation, LOCKMODE lockmode);
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers