On 2016-05-10 13:17:52 -0700, Jeff Janes wrote:
> On Tue, May 10, 2016 at 9:19 AM, Andres Freund <and...@anarazel.de> wrote:
> > On 2016-05-10 08:09:02 -0400, Robert Haas wrote:
> >> On Tue, May 10, 2016 at 3:05 AM, Andres Freund <and...@anarazel.de> wrote:
> >> > The easy way to trigger this problem would be to have an oid wraparound
> >> > - but the WAL shows that that's not the case here.  I've not figured
> >> > that one out entirely (and won't tonight). But I do see WAL records
> >> > like:
> >> > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693
> >> > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693
> Were there any CHECKPOINT_SHUTDOWN records, or any other NEXTOID
> records, between those two records you show?

Yes, check 

I think the explanation about how the bug is occurring there makes sense.
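
For anyone wanting to check a WAL dump for the same pattern, here is a
minimal sketch that scans pg_xlogdump-style text for NEXTOID records
carrying the same value twice, and reports any CHECKPOINT_SHUTDOWN or
other NEXTOID records in between. It assumes one record per line in the
format quoted above; the function name find_repeated_nextoid is made up
for illustration and is not part of any PostgreSQL tool.

```python
import re

# Pulls the LSN and the leading word of the description out of one line.
# Assumes the "lsn: X/Y, ... desc: NAME rest" layout quoted in this thread.
RECORD_RE = re.compile(
    r"lsn: (?P<lsn>[0-9A-Fa-f]+/[0-9A-Fa-f]+),.*desc: (?P<op>\w+)\s*(?P<rest>.*)"
)


def find_repeated_nextoid(lines):
    """Return a list of (oid, first_lsn, second_lsn, between) tuples.

    'between' holds the NEXTOID and CHECKPOINT_SHUTDOWN records, as
    (lsn, op, oid_or_None) tuples, found between two NEXTOID records
    that log the same value.
    """
    events = []
    for line in lines:
        m = RECORD_RE.search(line)
        if m is None:
            continue  # not a record line, or an unexpected layout
        op = m.group("op")
        if op == "NEXTOID":
            fields = m.group("rest").split()
            if fields and fields[0].isdigit():
                events.append((m.group("lsn"), op, int(fields[0])))
        elif op == "CHECKPOINT_SHUTDOWN":
            events.append((m.group("lsn"), op, None))

    repeats = []
    last_seen = {}  # oid value -> index in events of its latest NEXTOID
    for i, (lsn, op, oid) in enumerate(events):
        if op != "NEXTOID":
            continue
        if oid in last_seen:
            j = last_seen[oid]
            repeats.append((oid, events[j][0], lsn, events[j + 1:i]))
        last_seen[oid] = i
    return repeats
```

Feeding it the lines of a saved dump (for example an open file object)
returns an empty list when no NEXTOID value repeats.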

> My current test harness updates the scalar count field on every
> iteration, but changes the (probably toasted) text_array field with a
> probability of only 1% each time.  Perhaps making that more likely (by
> changing line 186 of count.pl) would make it easier to trigger the
> bug.  I'll try that in my next iteration of tests.

So my current theory about why the whole thing is kinda hard to
reproduce is that "luck" determines how aggressively the toast table is
vacuumed, and how often it actually succeeds in being vacuumed. You also
need a good bit of bad luck for the hint bits set by GetNewOidWithIndex() to
not survive, given that shared_buffers is pretty small *and* checksums
are enabled.

I guess testing with a bigger shared_buffers setting and without checksums
will make it easier to hit the bug.



Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)