On Tue, Nov 29, 2011 at 11:50 AM, Bruce Momjian <br...@momjian.us> wrote:
> Just confirming we decided not to pursue this.
Doesn't sound like it.

I've been thinking a lot about the more general problem here - namely, that allowing catalog changes without an access exclusive lock is unsafe - and trying to come up with a good solution, because I'd really like to see us make this work. It strikes me that there are really two separate problems here.

1. If you are scanning a system catalog using SnapshotNow, and a commit happens at just the right moment, you might see two versions of the row, or zero, rather than one. You'll see two if you scan the old row version, then the concurrent transaction commits, and then you scan the new one. You'll see zero if you scan the new row (ignoring it, since it isn't committed yet), then the concurrent transaction commits, and then you scan the old row.

On a related note, if you scan several interrelated catalogs (e.g. when building a relcache entry) and a transaction that has made matching modifications to multiple catalogs commits midway through, you might see the old version of the data from one table and the new version of the data from some other table.

Using AccessExclusiveLock fixes the problem because we guarantee that anybody who wants to read those rows first takes some kind of lock, which they won't be able to do while the updater holds AccessExclusiveLock.

2. Other backends may have data in the relcache or catcaches which won't get invalidated until they do AcceptInvalidationMessages(). That will always happen no later than the next time they lock the relation, so if you are holding AccessExclusiveLock you should be OK: no one else holds any lock, and they'll have to take one before doing anything interesting. But if you are holding any weaker lock, there's nothing to force AcceptInvalidationMessages() to happen before those cache entries are reused.

In both cases, name lookups are an exception: we don't know what OID to lock until we've done the name lookup.

Random ideas for solving the first problem:

1-A.
Take a fresh MVCC snapshot for each catalog scan. I think we've rejected this before because of the cost involved, but maybe with Pavan's patch and some of the other improvements that are on the way we could make this cheap enough to be acceptable.

1-B. Don't allow transactions that have modified system catalog X to commit while we're scanning system catalog X: make the scans take some kind of shared lock and commits take an exclusive lock. This seems awfully bad for concurrency, though, not to mention the overhead of taking and releasing all those locks even when nothing conflicts.

1-C. Wrap a retry loop around every place where we scan a system catalog. If a transaction that has written rows in catalog X commits while we're scanning catalog X, discard the results of the first scan and declare a do-over (or maybe something more coarse-grained, or at the other extreme even more fine-grained). If we're doing several interrelated scans, the retry loop must surround the entire unit and enforce a do-over of the whole operation if commits occur in any of the catalogs before the scan completes.

Unfortunately, any of these seems likely to require a Lot of Work, probably more than can be done in time for 9.2. However, perhaps it would be feasible to home in on a limited subset of the cases (e.g. name lookups, which are mainly done in only a handful of places, and represent an existing bug) and implement the necessary code changes just for those cases. Then we could generalize that pattern to other cases as time and energy permit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
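[Editor's illustration] The SnapshotNow anomaly described in point 1 can be made concrete with a toy simulation. This is not PostgreSQL source; the row layout (`xmin`/`xmax` dicts), the `committed` set, and the function names are all invented for illustration. The key property being modeled is that SnapshotNow evaluates each tuple's visibility against the commit state *at the moment that tuple is examined*, so a commit landing between two tuple visits can make one logical row appear twice, or not at all:

```python
# Toy model: transaction 5 updates a row, producing an old version
# (deleted by xact 5) and a new version (inserted by xact 5).
# Under SnapshotNow semantics, visibility is rechecked per tuple.

def visible_under_snapshot_now(row, committed):
    """Visible iff the inserting xact has committed and the
    deleting xact (if any) has not."""
    if row["xmin"] not in committed:
        return False
    if row["xmax"] is not None and row["xmax"] in committed:
        return False
    return True

def scan(rows, committed, commit_between=None):
    """Scan tuples one at a time; optionally let a concurrent
    transaction commit right after the first tuple is examined."""
    seen = []
    for i, row in enumerate(rows):
        if visible_under_snapshot_now(row, committed):
            seen.append(row)
        if i == 0 and commit_between is not None:
            committed.add(commit_between)   # concurrent COMMIT lands here
    return seen

old = {"xmin": 1, "xmax": 5}     # pre-update version
new = {"xmin": 5, "xmax": None}  # post-update version

# Old version scanned first, xact 5 commits in between -> TWO versions seen.
assert len(scan([old, new], {1}, commit_between=5)) == 2
# New version scanned first, xact 5 commits in between -> ZERO versions seen.
assert len(scan([new, old], {1}, commit_between=5)) == 0
# No concurrent commit -> exactly one version, as expected.
assert len(scan([old, new], {1})) == 1
```

The same per-tuple recheck also explains the multi-catalog case: each catalog's scan sees whatever has committed by the time that scan reaches the relevant tuples, so interrelated scans can mix old and new data.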
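[Editor's illustration] The retry-loop idea in 1-C could be sketched as follows. The per-catalog "commit counter" here is an invented mechanism (a real design would need some shared-memory counter or equivalent that committing writers of catalog X bump); the point is only the shape of the loop: snapshot the counter, scan, and discard the results if the counter moved:

```python
# Sketch of 1-C (not PostgreSQL code): retry the catalog scan whenever
# a transaction that wrote the catalog commits mid-scan, since the
# results may then mix old and new row versions.

def scan_with_retry(read_commit_counter, do_scan, max_retries=100):
    """Repeat do_scan() until no relevant commit lands during the scan."""
    for _ in range(max_retries):
        before = read_commit_counter()
        result = do_scan()
        if read_commit_counter() == before:
            return result  # no concurrent commit: results are consistent
    raise RuntimeError("catalog kept changing; giving up")

# Simulate a commit landing during the first scan attempt only.
counter = {"commits": 0}
attempts = {"n": 0}

def scan_catalog():
    attempts["n"] += 1
    if attempts["n"] == 1:
        counter["commits"] += 1  # concurrent COMMIT hits mid-scan
    return ["row1", "row2"]

assert scan_with_retry(lambda: counter["commits"], scan_catalog) == ["row1", "row2"]
assert attempts["n"] == 2  # first scan was discarded and redone
```

For the "several interrelated scans" case, the loop would have to read the counters of every catalog involved before the first scan and recheck all of them after the last one, retrying the whole unit if any moved.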