On 4 March 2014 21:37, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Tue, Mar 4, 2014 at 2:39 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> Your earlier claim that the dump is inconsistent just isn't accurate.
>>> We now have MVCC catalogs, so any dump is going to see a perfectly
>>> consistent set of data plus DDL. OK the catalogs may change AFTER the
>>> snapshot was taken for the dump, but then so can the data change -
>>> that's just MVCC.
>
>> Unfortunately, this isn't correct. The MVCC snapshots taken for
>> catalog scans are "instantaneous"; that is, we take a new, current
>> snapshot for each catalog scan. If all of the ruleutils.c stuff were
>> using the transaction snapshot rather than instantaneous snapshots,
>> this would be right. But as has been previously discussed, that's not
>> the case.
>
> Yeah. And that's *necessary* for catalog lookups in a normally
> functioning backend, because we have to see latest data (eg, it wouldn't
> do for a backend to fail to enforce a just-added CHECK constraint because
> it was committed after the backend's transaction started).
OK, thanks for explaining. A valuable point to note for us all.

> However, it seems possible that we could have a mode in which a read-only
> session did all its catalog fetches according to the transaction snapshot.
> That would get us to a situation where the backend-internal lookups that
> ruleutils relies on would give the same answers as queries done by
> pg_dump. Robert's work on getting rid of SnapshotNow has probably moved
> that much closer than it was before, but it's still not exactly a trivial
> patch.
>
> Meanwhile, Andres claimed upthread that none of the currently-proposed
> reduced-lock ALTER commands affect data that pg_dump is using ruleutils
> to fetch. If that's the case, then maybe this is a problem that we can
> punt till later. I've not gone through the list to verify it though.

So that returns us to solving the catalog consistency problem in pg_dump
and similar applications. We could

(1) change the lock taken by pg_dump to be ShareUpdateExclusive. As
discussed, this would be optional. (Trivial implementation)

The catalog accesses are all in a rather isolated piece of code in pg_dump
and run for a short period. That allows us to consider locking *always* at
ShareUpdateExclusive, but only for the period of catalog access, releasing
the higher-level lock before transaction end.

Since pg_dump is a client program, any action we take to resolve this
would need to be done in a user-accessible way. That is acceptable, since
there may be other user programs that wish or need to read a consistent
view of the definition of a table. This can be implemented in a few ways:

(2) Implement a server function that allows you to lock a table for a
short duration, e.g. pg_lock_catalog(Oid) and pg_unlock_catalog(Oid). We
can already do this in server-side code, so this is simply a matter of
exposing that same functionality to users.
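To make (2) concrete, here is a rough sketch of how pg_dump (or any client) might use the proposed pair. Note that pg_lock_catalog()/pg_unlock_catalog() are only the names suggested above, not functions that exist in core; the surrounding statements are standard SQL:

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Take the short-duration catalog lock before reading definitions
-- (pg_lock_catalog is the proposed function, not yet implemented)
SELECT pg_lock_catalog('my_table'::regclass);

-- Read the object definition via the usual ruleutils-backed functions;
-- under the lock these cannot race against a concurrent reduced-lock ALTER
SELECT pg_get_indexdef(indexrelid)
  FROM pg_index
 WHERE indrelid = 'my_table'::regclass;

-- Release as soon as catalog access is complete, before transaction end
SELECT pg_unlock_catalog('my_table'::regclass);

-- The data itself is still dumped under the transaction snapshot
COPY my_table TO STDOUT;
COMMIT;
```

The point of the sketch is the ordering: the catalog lock is held only around the definition reads, while the data copy continues to rely on the ordinary MVCC transaction snapshot.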
(3) A new variant of the LOCK command: LOCK CATALOG FOR tablename IN lock
mode MODE NOWAIT, which would then have a matching UNLOCK CATALOG FOR
tablename command. This is just a sugar-coated version of (2).

(4) Implement functions to suspend invalidation message handling for a
short period. That is basically the same as (2) in profile. My feeling is
that it sounds rather dangerous and not something I'd want to go near, now
or in the future.

We tried to avoid locking the catalog some years back, which is how we
went off down this MVCC catalog access route, which now seems to have been
something of a red-shifted herring.

ISTM that the user would need to specifically request a "consistent
catalog". Using (2) in pg_dump is pretty easy - patch attached. We can
solve this problem completely with about another hour of work, so I
suggest we implement (2) and be done.

Happy to document this in a new subsection of the docs describing how to
dump a consistent view of a database object from a user application.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
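PS For reference, the sugar-coated variant in (3) would read something like the following. This is proposed syntax only, implemented nowhere; the lock mode shown is just an example:

```sql
-- Proposed syntax, not implemented
LOCK CATALOG FOR my_table IN SHARE UPDATE EXCLUSIVE MODE NOWAIT;
-- ... read the table definition here ...
UNLOCK CATALOG FOR my_table;
```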
--
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers