Tom, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Don't see what. The main reason we've not yet attempted a global fix is > that the most straightforward way (take a new snapshot each time we > start a new SnapshotNow scan) seems too expensive. But CREATE DATABASE > is so expensive that the cost of an extra snapshot there ain't gonna > matter.
Patch attached. Passes the regression tests and our internal testing shows that it eliminates the error we were seeing (and doesn't cause blocking, which is even better). We have a workaround in place for our build system (more-or-less "don't do that" approach), but it'd really be great if this was back-patched and in the next round of point releases. Glancing through the branches, looks like it should apply pretty cleanly. Thanks, Stephen
colordiff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c new file mode 100644 index 1f6e02d..94e1a5d *** a/src/backend/commands/dbcommands.c --- b/src/backend/commands/dbcommands.c *************** createdb(const CreatedbStmt *stmt) *** 131,136 **** --- 131,137 ---- int notherbackends; int npreparedxacts; createdb_failure_params fparms; + Snapshot snapshot; /* Extract options from the statement node tree */ foreach(option, stmt->options) *************** createdb(const CreatedbStmt *stmt) *** 535,540 **** --- 536,555 ---- RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT); /* + * Take a snapshot to use while scanning through pg_tablespace. As we + * create directories and copy files around, this process could take a + * while, so also register that snapshot. + * + * Traversing pg_tablespace with an MVCC snapshot is necessary to provide + * us with a consistent view of the tablespaces that exist. Using + * SnapshotNow here would risk seeing the same tablespace multiple times + * or, worse, not seeing a tablespace at all if its tuple is moved around + * by other concurrent operations (eg: due to the ACL on the tablespace + * being updated). + */ + snapshot = RegisterSnapshot(GetTransactionSnapshot()); + + /* * Once we start copying subdirectories, we need to be able to clean 'em * up if we fail. Use an ENSURE block to make sure this happens. (This * is not a 100% solution, because of the possibility of failure during *************** createdb(const CreatedbStmt *stmt) *** 551,557 **** * each one to the new database. */ rel = heap_open(TableSpaceRelationId, AccessShareLock); ! scan = heap_beginscan(rel, SnapshotNow, 0, NULL); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { Oid srctablespace = HeapTupleGetOid(tuple); --- 566,572 ---- * each one to the new database. */ rel = heap_open(TableSpaceRelationId, AccessShareLock); ! scan = heap_beginscan(rel, snapshot, 0, NULL); while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) { Oid srctablespace = HeapTupleGetOid(tuple); *************** createdb(const CreatedbStmt *stmt) *** 656,661 **** --- 671,679 ---- PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback, PointerGetDatum(&fparms)); + /* Free our snapshot */ + UnregisterSnapshot(snapshot); + return dboid; }
signature.asc
Description: Digital signature