Tom, * Tom Lane ([email protected]) wrote: > Don't see what. The main reason we've not yet attempted a global fix is > that the most straightforward way (take a new snapshot each time we > start a new SnapshotNow scan) seems too expensive. But CREATE DATABASE > is so expensive that the cost of an extra snapshot there ain't gonna > matter.
Patch attached. Passes the regression tests and our internal testing
shows that it eliminates the error we were seeing (and doesn't cause
blocking, which is even better).
We have a workaround in place for our build system (more-or-less
"don't do that" approach), but it'd really be great if this was
back-patched and in the next round of point releases. Glancing
through the branches, looks like it should apply pretty cleanly.
Thanks,
Stephen
colordiff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
new file mode 100644
index 1f6e02d..94e1a5d
*** a/src/backend/commands/dbcommands.c
--- b/src/backend/commands/dbcommands.c
*************** createdb(const CreatedbStmt *stmt)
*** 131,136 ****
--- 131,137 ----
int notherbackends;
int npreparedxacts;
createdb_failure_params fparms;
+ Snapshot snapshot;
/* Extract options from the statement node tree */
foreach(option, stmt->options)
*************** createdb(const CreatedbStmt *stmt)
*** 535,540 ****
--- 536,555 ----
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
/*
+ * Take a snapshot to use while scanning through pg_tablespace. As we
+ * create directories and copy files around, this process could take a
+ * while, so also register that snapshot.
+ *
+ * Traversing pg_tablespace with an MVCC snapshot is necessary to provide
+ * us with a consistent view of the tablespaces that exist. Using
+ * SnapshotNow here would risk seeing the same tablespace multiple times
+ * or, worse, not seeing a tablespace at all if its tuple is moved around
+ * by other concurrent operations (eg: due to the ACL on the tablespace
+ * being updated).
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ /*
* Once we start copying subdirectories, we need to be able to clean 'em
* up if we fail. Use an ENSURE block to make sure this happens. (This
* is not a 100% solution, because of the possibility of failure during
*************** createdb(const CreatedbStmt *stmt)
*** 551,557 ****
* each one to the new database.
*/
rel = heap_open(TableSpaceRelationId, AccessShareLock);
! scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid srctablespace = HeapTupleGetOid(tuple);
--- 566,572 ----
* each one to the new database.
*/
rel = heap_open(TableSpaceRelationId, AccessShareLock);
! scan = heap_beginscan(rel, snapshot, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Oid srctablespace = HeapTupleGetOid(tuple);
*************** createdb(const CreatedbStmt *stmt)
*** 656,661 ****
--- 671,679 ----
PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
PointerGetDatum(&fparms));
+ /* Free our snapshot */
+ UnregisterSnapshot(snapshot);
+
return dboid;
}
signature.asc
Description: Digital signature
