Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

Tom Lane Fri, 17 Jun 2011 10:24:00 -0700

Robert Haas <robertmh...@gmail.com> writes:
> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I believe that this is fundamentally unavoidable so long as we use
>> SnapshotNow to read catalogs --- which is something we've talked about
>> changing, but it will require a pretty major R&D effort to make it
>> happen.


> Ouch.

> I wonder if we could avoid this anomaly by taking a throwaway MVCC
> snapshot at the beginning of each system catalog scan and using it
> just for the duration of that scan.  If nothing that has touched the
> catalog commits while the scan is open, then this is logically
> equivalent to SnapshotNow.  If something does commit in mid-scan, then
> we might not get the latest version of the row, but we should end up
> with exactly one.  If it's not the latest one, we'll do the rebuild
> again upon seeing the next sinval message; in the meantime, the
> version we're using mustn't be too intolerably bad or it was an error
> not to use AccessExclusiveLock in the first place.

Yeah, this seems like a possibly workable direction to explore.  I like
this better than what Simon is proposing, because it would fix the
generic issue for all types of catalog SnapshotNow scans.

> IIUC, the problem with this approach is not correctness but
> performance.  Taking snapshots is (currently) expensive.

Yeah.  After mulling it for awhile, what about this idea: we could
redefine SnapshotNow as a snapshot type that includes a list of
transactions-in-progress, somewhat like an MVCC snapshot, but we don't
fill that list from the PGPROC array.  Instead, while running a scan
with SnapshotNow, anytime we determine that a particular XID is
still-in-progress, we add that XID to the snapshot's list.
Subsequently, the SnapshotNow code assumes that XID to be
still-in-progress without consulting its actual state.  We reset the XID
list to empty when starting a new SnapshotNow scan.  (We might be able
to do so less often than that, like only when we do
AcceptInvalidationMessages, but it's not clear to me that there's any
real benefit in hanging onto the state longer.)

This costs no performance; if anything it should be faster than now,
because we'll be replacing expensive transaction state probes with
relatively-cheap searches of an XID array that should almost always
be quite short.

With this approach, we would have no serialization anomalies from single
transactions committing while a scan is in progress.  There could be
anomalies resulting from considering an earlier XID to be in-progress
while a later XID is considered committed (because we didn't observe
it until later).  So far as I can see offhand, the impact of that would
be that there might be multiple versions of a tuple that are considered
good, but never that there would be no version considered good (so long
as the other XIDs simply updated the tuple and didn't delete it).  I
think this would be all right, since the scan would just seize on the
first good version it finds.  As you argue above, if that's not good
enough for our purposes then the updater(s) should have taken a stronger
lock.

I am not, however, particularly pleased with the idea of trying to make
this work in 9.1.  I still think that we should back off the attempt
to reduce lock strength in 9.1, and take it up for 9.2.  We need to be
stabilizing 9.1 for release, not introducing new untested mechanisms in
it.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

Reply via email to