Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Don't see what.  The main reason we've not yet attempted a global fix is
> that the most straightforward way (take a new snapshot each time we
> start a new SnapshotNow scan) seems too expensive.  But CREATE DATABASE
> is so expensive that the cost of an extra snapshot there ain't gonna
> matter.

  Patch attached.  Passes the regression tests and our internal testing
  shows that it eliminates the error we were seeing (and doesn't cause
  blocking, which is even better).

  We have a workaround in place for our build system (more-or-less
  "don't do that" approach), but it'd really be great if this was
  back-patched and in the next round of point releases.  Glancing
  through the branches, looks like it should apply pretty cleanly.

        Thanks,

                Stephen
colordiff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
new file mode 100644
index 1f6e02d..94e1a5d
*** a/src/backend/commands/dbcommands.c
--- b/src/backend/commands/dbcommands.c
*************** createdb(const CreatedbStmt *stmt)
*** 131,136 ****
--- 131,137 ----
  	int			notherbackends;
  	int			npreparedxacts;
  	createdb_failure_params fparms;
+ 	Snapshot	snapshot;
  
  	/* Extract options from the statement node tree */
  	foreach(option, stmt->options)
*************** createdb(const CreatedbStmt *stmt)
*** 535,540 ****
--- 536,555 ----
  	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
  
  	/*
+ 	 * Take a snapshot to use while scanning through pg_tablespace.  As we
+ 	 * create directories and copy files around, this process could take a
+ 	 * while, so also register that snapshot.
+ 	 *
+ 	 * Traversing pg_tablespace with an MVCC snapshot is necessary to provide
+ 	 * us with a consistent view of the tablespaces that exist.  Using
+ 	 * SnapshotNow here would risk seeing the same tablespace multiple times
+ 	 * or, worse, not seeing a tablespace at all if its tuple is moved around
+ 	 * by other concurrent operations (eg: due to the ACL on the tablespace
+ 	 * being updated).
+ 	 */
+ 	snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ 
+ 	/*
  	 * Once we start copying subdirectories, we need to be able to clean 'em
  	 * up if we fail.  Use an ENSURE block to make sure this happens.  (This
  	 * is not a 100% solution, because of the possibility of failure during
*************** createdb(const CreatedbStmt *stmt)
*** 551,557 ****
  		 * each one to the new database.
  		 */
  		rel = heap_open(TableSpaceRelationId, AccessShareLock);
! 		scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
  		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
  		{
  			Oid			srctablespace = HeapTupleGetOid(tuple);
--- 566,572 ----
  		 * each one to the new database.
  		 */
  		rel = heap_open(TableSpaceRelationId, AccessShareLock);
! 		scan = heap_beginscan(rel, snapshot, 0, NULL);
  		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
  		{
  			Oid			srctablespace = HeapTupleGetOid(tuple);
*************** createdb(const CreatedbStmt *stmt)
*** 656,661 ****
--- 671,679 ----
  	PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
  								PointerGetDatum(&fparms));
  
+ 	/* Free our snapshot */
+ 	UnregisterSnapshot(snapshot);
+ 
  	return dboid;
  }
  

Attachment: signature.asc
Description: Digital signature

Reply via email to