On Tue, Nov 29, 2011 at 11:50 AM, Bruce Momjian <br...@momjian.us> wrote:
> Just confirming we decided not to pursue this.
Doesn't sound like it.

I've been thinking a lot about the more general problem here - namely, that allowing catalog changes without an access exclusive lock is unsafe - and trying to come up with a good solution, because I'd really like to see us make this work. It strikes me that there are really two separate problems here.

1. If you are scanning a system catalog using SnapshotNow, and a commit happens at just the right moment, you might see two versions of the row, or zero, rather than one. You'll see two if you scan the old row version, then the concurrent transaction commits, and then you scan the new one. You'll see zero if you scan the new row (ignoring it, since it isn't committed yet), then the concurrent transaction commits, and then you scan the old row.

On a related note, if you scan several interrelated catalogs (e.g. when building a relcache entry) and a transaction that has made matching modifications to multiple catalogs commits midway through, you might see the old version of the data from one table and the new version of the data from some other table.

Using AccessExclusiveLock fixes the problem because we guarantee that anybody who wants to read those rows first takes some kind of lock, which they won't be able to do while the updater holds AccessExclusiveLock.

2. Other backends may have data in the relcache or catcaches which won't get invalidated until they do AcceptInvalidationMessages(). That will always happen no later than the next time they lock the relation, so if you are holding AccessExclusiveLock you should be OK: no one else holds any lock, and they'll have to take one before doing anything interesting. But if you are holding any weaker lock, there's nothing to force AcceptInvalidationMessages() to happen before those cache entries are reused.

In both cases, name lookups are an exception: we don't know what OID to lock until we've done the name lookup.

Random ideas for solving the first problem:

1-A.
Take a fresh MVCC snapshot for each catalog scan. I think we've rejected this before because of the cost involved, but maybe with Pavan's patch and some of the other improvements that are on the way we could make this cheap enough to be acceptable.

1-B. Don't allow transactions that have modified system catalog X to commit while we're scanning system catalog X: make the scans take some kind of shared lock and commits take an exclusive lock. This seems awfully bad for concurrency, though, not to mention the overhead of taking and releasing all those locks even when nothing conflicts.

1-C. Wrap a retry loop around every place where we scan a system catalog. If a transaction that has written rows in catalog X commits while we're scanning catalog X, discard the results of the first scan and declare a do-over (or maybe something more coarse-grained, or at the other extreme even more fine-grained). If we're doing several interrelated scans, the retry loop must surround the entire unit and enforce a do-over of the whole operation if commits occur in any of the catalogs before the scan completes.

Unfortunately, any of these seems likely to require a Lot of Work, probably more than can be done in time for 9.2. However, perhaps it would be feasible to home in on a limited subset of the cases (e.g. name lookups, which are mainly done in only a handful of places, and represent an existing bug) and implement the necessary code changes just for those cases. Then we could generalize that pattern to other cases as time and energy permit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
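[Editor's illustration] The SnapshotNow anomaly described in point 1 can be made concrete with a toy simulation. This is not PostgreSQL source; the row layout (`xmin`/`xmax` dicts), the `committed` set, and the function names are all invented for illustration. The key property being modeled is that SnapshotNow evaluates each tuple's visibility against the commit state *at the moment that tuple is examined*, so a commit landing between two tuple visits can make one logical row appear twice, or not at all:

```python
# Toy model: transaction 5 updates a row, producing an old version
# (deleted by xact 5) and a new version (inserted by xact 5).
# Under SnapshotNow semantics, visibility is rechecked per tuple.

def visible_under_snapshot_now(row, committed):
    """Visible iff the inserting xact has committed and the
    deleting xact (if any) has not."""
    if row["xmin"] not in committed:
        return False
    if row["xmax"] is not None and row["xmax"] in committed:
        return False
    return True

def scan(rows, committed, commit_between=None):
    """Scan tuples one at a time; optionally let a concurrent
    transaction commit right after the first tuple is examined."""
    seen = []
    for i, row in enumerate(rows):
        if visible_under_snapshot_now(row, committed):
            seen.append(row)
        if i == 0 and commit_between is not None:
            committed.add(commit_between)   # concurrent COMMIT lands here
    return seen

old = {"xmin": 1, "xmax": 5}     # pre-update version
new = {"xmin": 5, "xmax": None}  # post-update version

# Old version scanned first, xact 5 commits in between -> TWO versions seen.
assert len(scan([old, new], {1}, commit_between=5)) == 2
# New version scanned first, xact 5 commits in between -> ZERO versions seen.
assert len(scan([new, old], {1}, commit_between=5)) == 0
# No concurrent commit -> exactly one version, as expected.
assert len(scan([old, new], {1})) == 1
```

The same per-tuple recheck also explains the multi-catalog case: each catalog's scan sees whatever has committed by the time that scan reaches the relevant tuples, so interrelated scans can mix old and new data.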
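[Editor's illustration] The retry-loop idea in 1-C could be sketched as follows. The per-catalog "commit counter" here is an invented mechanism (a real design would need some shared-memory counter or equivalent that committing writers of catalog X bump); the point is only the shape of the loop: snapshot the counter, scan, and discard the results if the counter moved:

```python
# Sketch of 1-C (not PostgreSQL code): retry the catalog scan whenever
# a transaction that wrote the catalog commits mid-scan, since the
# results may then mix old and new row versions.

def scan_with_retry(read_commit_counter, do_scan, max_retries=100):
    """Repeat do_scan() until no relevant commit lands during the scan."""
    for _ in range(max_retries):
        before = read_commit_counter()
        result = do_scan()
        if read_commit_counter() == before:
            return result  # no concurrent commit: results are consistent
    raise RuntimeError("catalog kept changing; giving up")

# Simulate a commit landing during the first scan attempt only.
counter = {"commits": 0}
attempts = {"n": 0}

def scan_catalog():
    attempts["n"] += 1
    if attempts["n"] == 1:
        counter["commits"] += 1  # concurrent COMMIT hits mid-scan
    return ["row1", "row2"]

assert scan_with_retry(lambda: counter["commits"], scan_catalog) == ["row1", "row2"]
assert attempts["n"] == 2  # first scan was discarded and redone
```

For the "several interrelated scans" case, the loop would have to read the counters of every catalog involved before the first scan and recheck all of them after the last one, retrying the whole unit if any moved.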