[HACKERS] atomic pin/unpin causing errors

Jeff Janes Fri, 29 Apr 2016 10:39:45 -0700

I've bisected the errors I was seeing, discussed in
http://www.postgresql.org/message-id/CAMkU=1xqehc0ok4d+tkjfq1nvuho37pyrkhjp6q8oxifmx7...@mail.gmail.com


It looks like they first appear in:

commit 48354581a49c30f5757c203415aa8412d85b0f70
Author: Andres Freund <[email protected]>
Date:   Sun Apr 10 20:12:32 2016 -0700

    Allow Pin/UnpinBuffer to operate in a lockfree manner.


I get the errors:

ERROR:  attempted to delete invisible tuple
STATEMENT:  update foo set count=count+1,text_array=$1 where text_array @> $2

And also:

ERROR:  unexpected chunk number 1 (expected 2) for toast value 85223889 in pg_toast_16424
STATEMENT:  update foo set count=count+1 where text_array @> $1

Once these errors start occurring, they happen often.  Usually the
"attempted to delete invisible tuple" happens first.

These errors show up after about 9 hours of run time.  The timing is
predictable enough that I don't think it is a purely stochastic race
condition.  It seems like some counter variable is overflowing.  But
it is not the ShmemVariableCache->nextXid counter, as I previously
speculated.  This test does not advance that counter fast enough for
it to wrap around within 9 hours of run time.  But I am at a loss as to
what other variable it might be.  Since the system goes through a crash and
recovery every few seconds, any backend-local counters or
shared-memory counters would get reset upon recovery.  Right?
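
As a rough sanity check on the overflow theory, the sketch below computes
how fast a counter of a given width would have to advance to wrap within
the observed 9 hours.  The counter widths and the helper name
required_rate are assumptions chosen for illustration; nothing here is
taken from the PostgreSQL source or from the test harness.

```python
# Back-of-the-envelope check: how fast must a counter advance to wrap
# within the observed time-to-failure?  The widths tried below are
# assumptions for illustration, not values read from PostgreSQL.

def required_rate(bits: int, hours: float) -> float:
    """Increments per second needed for a `bits`-wide counter to wrap in `hours`."""
    return (2 ** bits) / (hours * 3600.0)


if __name__ == "__main__":
    for bits in (16, 31, 32):
        rate = required_rate(bits, 9)
        print(f"{bits}-bit counter: {rate:,.2f} increments/sec")
```

Comparing these rates against how quickly the workload actually advances
a given counter shows whether that counter is a plausible suspect.  Any
candidate would also have to survive crash recovery for the theory to hold.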

I think the invisible tuple referred to might be a tuple in the toast
table, not in the parent table.

I don't see the problem with a cassert-enabled build, probably because it
is just too slow to ever reach the point where the problem occurs.

Any suggestions about where or how to look?  I don't know if the
"attempted to delete invisible tuple" is the bug itself, or is just
tripping over corruption left behind by someone else.

(This was all run using Teodor's test-enabling patch
gin_alone_cleanup-4.patch, so as not to change horses in midstream.
Now that a version of that patch has been committed, I will try to
repeat this in HEAD.)

Cheers,

Jeff


