[HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans

2013-07-16 Thread Noah Misch
Consider this sequence of commands:

create type rowtype as (c int, d int);
create temp table t of rowtype;
\c -
drop type rowtype cascade;

Since the switch to MVCC catalog scans, it exhibits the following error about
10% of the time on my system:

CREATE TYPE
CREATE TABLE
You are now connected to database test as user nm.
ERROR:  XX000: cache lookup failed for relation 17009
LOCATION:  getRelationDescription, objectaddress.c:2186

With \c, in general, you may end up executing commands under the new session
before the old backend has finished exiting.  For this test case specifically,
the two backends' attempts to drop table t regularly overlap.  The old
backend would drop it within RemoveTempRelationsCallback(), and the new
backend would cascade from rowtype to drop it.  findDependentObjects() deals
with concurrent deletion attempts by acquiring a lock on each object it will
delete, then calling systable_recheck_tuple() to determine whether another
deleter was successful while the current backend was waiting for the lock.
systable_recheck_tuple() uses the scan snapshot, which really only works if
that snapshot is SnapshotNow or some other that changes its decision in
response to concurrent transaction commits.  The switch to MVCC snapshots left
this mutual exclusion protocol ineffective.

Let's fix this by having systable_recheck_tuple() acquire a fresh catalog MVCC
snapshot and recheck against that.  I believe it would also be fully safe to
use SnapshotNow here; however, I'm assuming we would otherwise manage to
remove SnapshotNow entirely.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 2bfe78a..2fecc88 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -362,7 +362,8 @@ systable_getnext(SysScanDesc sysscan)
  * systable_recheck_tuple --- recheck visibility of most-recently-fetched tuple
  *
  * This is useful to test whether an object was deleted while we waited to
- * acquire lock on it.
+ * acquire lock on it.  We recheck visibility in the style of SnapshotNow by
+ * checking against a fresh catalog snapshot.
  *
  * Note: we don't actually *need* the tuple to be passed in, but it's a
  * good crosscheck that the caller is interested in the right tuple.
@@ -370,30 +371,37 @@ systable_getnext(SysScanDesc sysscan)
 bool
 systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 {
+   Snapshotfreshsnap;
boolresult;
 
+   /*
+* For a scan using a non-MVCC snapshot like SnapshotSelf, we would 
simply
+* reuse the old snapshot.  So far, the only caller uses MVCC snapshots.
+*/
+   freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan-heap_rel));
+
if (sysscan-irel)
{
IndexScanDesc scan = sysscan-iscan;
 
+   Assert(IsMVCCSnapshot(scan-xs_snapshot));
Assert(tup == scan-xs_ctup);
Assert(BufferIsValid(scan-xs_cbuf));
/* must hold a buffer lock to call HeapTupleSatisfiesVisibility 
*/
LockBuffer(scan-xs_cbuf, BUFFER_LOCK_SHARE);
-   result = HeapTupleSatisfiesVisibility(tup, scan-xs_snapshot,
-   
  scan-xs_cbuf);
+   result = HeapTupleSatisfiesVisibility(tup, freshsnap, 
scan-xs_cbuf);
LockBuffer(scan-xs_cbuf, BUFFER_LOCK_UNLOCK);
}
else
{
HeapScanDesc scan = sysscan-scan;
 
+   Assert(IsMVCCSnapshot(scan-rs_snapshot));
Assert(tup == scan-rs_ctup);
Assert(BufferIsValid(scan-rs_cbuf));
/* must hold a buffer lock to call HeapTupleSatisfiesVisibility 
*/
LockBuffer(scan-rs_cbuf, BUFFER_LOCK_SHARE);
-   result = HeapTupleSatisfiesVisibility(tup, scan-rs_snapshot,
-   
  scan-rs_cbuf);
+   result = HeapTupleSatisfiesVisibility(tup, freshsnap, 
scan-rs_cbuf);
LockBuffer(scan-rs_cbuf, BUFFER_LOCK_UNLOCK);
}
return result;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans

2013-07-16 Thread Robert Haas
On Tue, Jul 16, 2013 at 9:50 AM, Noah Misch n...@leadboat.com wrote:
 Consider this sequence of commands:

 create type rowtype as (c int, d int);
 create temp table t of rowtype;
 \c -
 drop type rowtype cascade;

 Since the switch to MVCC catalog scans, it exhibits the following error about
 10% of the time on my system:

 CREATE TYPE
 CREATE TABLE
 You are now connected to database test as user nm.
 ERROR:  XX000: cache lookup failed for relation 17009
 LOCATION:  getRelationDescription, objectaddress.c:2186

 With \c, in general, you may end up executing commands under the new session
 before the old backend has finished exiting.  For this test case specifically,
 the two backends' attempts to drop table t regularly overlap.  The old
 backend would drop it within RemoveTempRelationsCallback(), and the new
 backend would cascade from rowtype to drop it.  findDependentObjects() deals
 with concurrent deletion attempts by acquiring a lock on each object it will
 delete, then calling systable_recheck_tuple() to determine whether another
 deleter was successful while the current backend was waiting for the lock.
 systable_recheck_tuple() uses the scan snapshot, which really only works if
 that snapshot is SnapshotNow or some other that changes its decision in
 response to concurrent transaction commits.  The switch to MVCC snapshots left
 this mutual exclusion protocol ineffective.

 Let's fix this by having systable_recheck_tuple() acquire a fresh catalog MVCC
 snapshot and recheck against that.  I believe it would also be fully safe to
 use SnapshotNow here; however, I'm assuming we would otherwise manage to
 remove SnapshotNow entirely.

I recommend reworking the header comment to avoid mention of
SnapshotNow, since if we get rid of SnapshotNow, the reference might
not be too clear to far-future hackers.

+   /*
+* For a scan using a non-MVCC snapshot like SnapshotSelf, we would 
simply
+* reuse the old snapshot.  So far, the only caller uses MVCC snapshots.
+*/
+   freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan-heap_rel));

This comment is not very clear, because it doesn't describe what the
code actually does, but rather speculates about what the code could do
if the intention of some future caller were different.  I recommend
adding Assert(IsMVCCSnapshot(scan-xs_snapshot)) and changing the
comment to something like this: For now, we don't handle the case of
a non-MVCC scan snapshot.  This is adequate for existing uses of this
function, but might need to be changed in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans

2013-07-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 16, 2013 at 9:50 AM, Noah Misch n...@leadboat.com wrote:
 Let's fix this by having systable_recheck_tuple() acquire a fresh catalog 
 MVCC
 snapshot and recheck against that.  I believe it would also be fully safe to
 use SnapshotNow here; however, I'm assuming we would otherwise manage to
 remove SnapshotNow entirely.

I agree with Robert's comments, and in addition suggest that this code
needs a comment about why it's safe to use the snapshot without doing
RegisterSnapshot or equivalent.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans

2013-07-16 Thread Andres Freund
On 2013-07-16 09:50:07 -0400, Noah Misch wrote:
 With \c, in general, you may end up executing commands under the new session
 before the old backend has finished exiting.  For this test case specifically,
 the two backends' attempts to drop table t regularly overlap.  The old
 backend would drop it within RemoveTempRelationsCallback(), and the new
 backend would cascade from rowtype to drop it.  findDependentObjects() deals
 with concurrent deletion attempts by acquiring a lock on each object it will
 delete, then calling systable_recheck_tuple() to determine whether another
 deleter was successful while the current backend was waiting for the lock.
 systable_recheck_tuple() uses the scan snapshot, which really only works if
 that snapshot is SnapshotNow or some other that changes its decision in
 response to concurrent transaction commits.  The switch to MVCC snapshots left
 this mutual exclusion protocol ineffective.

Nice catch.

I wonder though, isn't that code unsafe in other ways as well? What if
the pg_depend entry was rewritten inbetween? Consider somebody doing
something like REASSIGN OWNED concurrently with a DROP. The DROP
possibly will cascade to an entry which changed the owner already. And
the recheck will then report that the object doesn't exist anymore and
abort since it does a simple HeapTupleSatisfiesVisibility() and doesn't
follow the ctid chain if the tuple has been updated...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans

2013-07-16 Thread Noah Misch
On Tue, Jul 16, 2013 at 05:56:10PM +0200, Andres Freund wrote:
 On 2013-07-16 09:50:07 -0400, Noah Misch wrote:
  With \c, in general, you may end up executing commands under the new 
  session
  before the old backend has finished exiting.  For this test case 
  specifically,
  the two backends' attempts to drop table t regularly overlap.  The old
  backend would drop it within RemoveTempRelationsCallback(), and the new
  backend would cascade from rowtype to drop it.  findDependentObjects() 
  deals
  with concurrent deletion attempts by acquiring a lock on each object it will
  delete, then calling systable_recheck_tuple() to determine whether another
  deleter was successful while the current backend was waiting for the lock.
  systable_recheck_tuple() uses the scan snapshot, which really only works if
  that snapshot is SnapshotNow or some other that changes its decision in
  response to concurrent transaction commits.  The switch to MVCC snapshots 
  left
  this mutual exclusion protocol ineffective.
 
 Nice catch.
 
 I wonder though, isn't that code unsafe in other ways as well? What if
 the pg_depend entry was rewritten inbetween? Consider somebody doing
 something like REASSIGN OWNED concurrently with a DROP. The DROP
 possibly will cascade to an entry which changed the owner already. And
 the recheck will then report that the object doesn't exist anymore and
 abort since it does a simple HeapTupleSatisfiesVisibility() and doesn't
 follow the ctid chain if the tuple has been updated...

I'm not seeing a problem with that particular route.  Say we're examining a
pg_depend tuple where the referencing object is a table and the referenced
object is a role, the table's owner.  We're dropping the role and cascade to
the table.  If the REASSIGNED OWNED assigns the table to a different role,
then we are correct to treat the dependency as gone.  If it's the same role
(REASSIGNED OWNED BY alice TO alice, pointless but permitted), several of
the rename implementations short-circuit that case and don't change catalog
entries.  But even if that optimization were omitted, shdepChangeDep() will
have blocked against our previously-acquired deletion lock on the role.  Code
that adds or updates a dependency without locking both objects of the new
pg_depend tuple is buggy independently.

That being said, there may well be a related mechanism that can slip past the
locking here.  My brain turns to mush when I ponder findDependentObjects() too
thoroughly.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans

2013-07-16 Thread Noah Misch
On Tue, Jul 16, 2013 at 11:27:02AM -0400, Robert Haas wrote:
 I recommend reworking the header comment to avoid mention of
 SnapshotNow, since if we get rid of SnapshotNow, the reference might
 not be too clear to far-future hackers.
 
 + /*
 +  * For a scan using a non-MVCC snapshot like SnapshotSelf, we would 
 simply
 +  * reuse the old snapshot.  So far, the only caller uses MVCC snapshots.
 +  */
 + freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan-heap_rel));
 
 This comment is not very clear, because it doesn't describe what the
 code actually does, but rather speculates about what the code could do
 if the intention of some future caller were different.  I recommend
 adding Assert(IsMVCCSnapshot(scan-xs_snapshot)) and changing the
 comment to something like this: For now, we don't handle the case of
 a non-MVCC scan snapshot.  This is adequate for existing uses of this
 function, but might need to be changed in the future.

On Tue, Jul 16, 2013 at 11:35:48AM -0400, Tom Lane wrote:
 I agree with Robert's comments, and in addition suggest that this code
 needs a comment about why it's safe to use the snapshot without doing
 RegisterSnapshot or equivalent.

Committed with hopefully-better comments.  Thanks.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers