On 2016-05-10 08:09:02 -0400, Robert Haas wrote:
> On Tue, May 10, 2016 at 3:05 AM, Andres Freund <and...@anarazel.de> wrote:
> > The easy way to trigger this problem would be to have an oid wraparound
> > - but the WAL shows that that's not the case here. I've not figured
> > that one out entirely (and won't tonight). But I do see WAL records
> > like:
> > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693
> > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693
> > i.e. two NEXTOID records allocating the same range, which obviously
> > doesn't seem right. There's also every now and then close by ranges:
> > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 1/9A404DB8, prev 1/9A404270, desc: NEXTOID 3311455
> > rmgr: XLOG len (rec/tot): 4/ 30, tx: 7814505, lsn: 1/9A4EC888, prev 1/9A4EB9D0, desc: NEXTOID 3311461
> >
> > As far as I can see something like the above, or an oid wraparound, are
> > pretty much deadly for toast.
> >
> > Is anybody ready with a good defense for SatisfiesToast not doing any
> > actual liveliness checks?
>
> I assume that this was installed as a performance optimization, and I
> don't really see why it shouldn't be or be able to be made safe. I
> assume that the wraparound case was deemed safe because at that time
> the idea of 4 billion OIDs getting used with old transactions still
> active seemed inconceivable.

It's not super likely, yea. But you don't really need to "use" 4 billion
oids to get a wraparound. Once you have a significant number of values
in various toast tables, the oid counter progresses really rather fast,
without many writes. That's because the oid counter is global, but each
individual toast write (and other things) performs checks via
GetNewOidWithIndex().

I'm not sure why you think it's safe? Consider the following scenario:

    BEGIN;
    -- nextoid: 1
    INSERT toastval = chunk_id = 1;
    ROLLBACK;
    ...
    -- oid counter wraps around
    -- nextoid: 1
    INSERT toastval = chunk_id = 1;
    -- crash, losing all pending hint bits
    SELECT toastval;

The last SELECT might find either of the toasted data chunks, depending
on heap ordering. As they're not hinted as invalid due to the crash,
HeapTupleSatisfiesToast() will return both as visible.

To make that safe we'd at least have to make hint bit writes by the scan
in GetNewOidWithIndex() durable, and likely also disable the killtuples
optimization, to avoid a plain SELECT of the toast table making some
tuples unreachable, but not durably hinted. That seems fairly fragile.

I've a significant amount of doubt that toast reads are bottlenecked by
visibility routines.

> It seems to me that the real question
> here is how you're getting two calls to XLogPutNextOid() with the same
> value of ShmemVariableCache->nextOid, and the answer, as it seems to
> me, must be that LWLocks are broken.

There likely were a bunch of crashes in between; Jeff's test suite
triggers them at a high rate. That seems a lot more likely than an
lwlock bug that only materializes in the oid counter. Investigating.

Greetings,

Andres Freund
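
For illustration only, the rollback / wraparound / crash scenario in this message can be played through with a small Python toy. This is a sketch under loud assumptions, not PostgreSQL code: `ToastTuple`, `satisfies_toast`, `oid_in_use`, `simulate` and the four-value `OID_SPACE` are all hypothetical names invented here. `satisfies_toast` models a visibility check that trusts hint bits only, `oid_in_use` models a collision scan that hints aborted tuples as invalid, and hint bits are assumed to be lost on a crash.

```python
from dataclasses import dataclass

OID_SPACE = 4  # toy counter range (assumption); real OIDs are 32-bit


@dataclass
class ToastTuple:
    chunk_id: int
    data: str
    aborted: bool                 # real outcome of the inserting transaction
    hinted_invalid: bool = False  # hint bit; not durable in this model


def satisfies_toast(tup):
    """Toy visibility check: trusts the hint bit, never looks at `aborted`."""
    return not tup.hinted_invalid


def oid_in_use(heap, oid):
    """Toy collision scan: hints aborted tuples as invalid, then ignores them."""
    in_use = False
    for tup in heap:
        if tup.chunk_id != oid:
            continue
        if tup.aborted:
            tup.hinted_invalid = True  # set in memory only
        else:
            in_use = True
    return in_use


def simulate(crash):
    """Return the data visible for chunk_id 1 after the scenario has run."""
    heap = []
    next_oid = 1
    # BEGIN; INSERT with chunk_id 1; ROLLBACK;
    heap.append(ToastTuple(next_oid, "old", aborted=True))
    # Advance the global counter until it wraps back to 1.
    for _ in range(OID_SPACE):
        next_oid = next_oid % OID_SPACE + 1
    # The collision scan treats OID 1 as free because the old tuple aborted.
    assert next_oid == 1 and not oid_in_use(heap, next_oid)
    heap.append(ToastTuple(next_oid, "new", aborted=False))
    if crash:
        # Crash: every hint bit that was never written out is lost.
        for tup in heap:
            tup.hinted_invalid = False
    return [t.data for t in heap if t.chunk_id == 1 and satisfies_toast(t)]
```

In this model `simulate(crash=False)` sees only the new chunk, while `simulate(crash=True)` sees both the rolled-back and the new chunk, which is the ambiguity the SELECT in the scenario runs into.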