> On Mar 15, 2021, at 11:57 AM, Tom Lane <[email protected]> wrote:
>
> Not sure that I believe the theory that this is from bad luck of
> concurrent autovacuum timing, though.  The fact that we're seeing
> this on just those two animals suggests strongly to me that it's
> architecture-correlated, instead.

I find it a little hard to see how amcheck is tripping over a toasted value
just in this one table, pg_statistic, and not in any of the others. The error
message says the problem is in attribute 27, which I believe means it is
stavalues2. The comment in the header for this catalog is intriguing:

    /*
     * Values in these arrays are values of the column's data type, or of some
     * related type such as an array element type.  We presently have to cheat
     * quite a bit to allow polymorphic arrays of this kind, but perhaps
     * someday it'll be a less bogus facility.
     */
    anyarray    stavalues1;
    anyarray    stavalues2;
    anyarray    stavalues3;
    anyarray    stavalues4;
    anyarray    stavalues5;

This is hard to duplicate in a test, because you can't normally create tables
with pseudo-type columns. However, if amcheck is walking the tuple and does
not correctly update the offset with the length of attribute 26, it may try to
read attribute 27 at the wrong offset, unsurprisingly leading to garbage,
perhaps a garbage toast pointer. The attached patch v7-0004 adds a check to
verify_heapam to see if the va_toastrelid matches the expected toast table oid
for the table we're reading. That check almost certainly should have been
included in the initial version of verify_heapam, so even if it does nothing to
help with this issue, I think it is worth committing.
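
To make the offset hypothesis concrete, here is a toy sketch in C. It is not
amcheck's code and does not use PostgreSQL's real varlena layout: the
ToyToastPointer struct, the one-byte tags, the toy_* helpers and the OID
values are all hypothetical, made up for illustration. It shows how adding
the wrong length for one attribute makes the next attribute's toast pointer
read as garbage, and how comparing va_toastrelid against the expected toast
table OID can catch that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;

/* Hypothetical, simplified stand-in for an on-disk toast pointer. */
typedef struct ToyToastPointer
{
    int32_t  va_rawsize;     /* original data size */
    uint32_t va_extsize;     /* externally stored size */
    Oid      va_valueid;     /* value's id within the toast table */
    Oid      va_toastrelid;  /* OID of the toast table holding the value */
} ToyToastPointer;

/* Toy attribute tags (assumption: one tag byte precedes every attribute). */
enum { TOY_INLINE = 0, TOY_EXTERNAL = 1 };

/* Append an inline attribute: tag, 1-byte length, data. Returns new offset. */
static size_t
toy_put_inline(uint8_t *buf, size_t off, const void *data, uint8_t len)
{
    buf[off++] = TOY_INLINE;
    buf[off++] = len;
    memcpy(buf + off, data, len);
    return off + len;
}

/* Append an external attribute: tag, then the pointer struct. */
static size_t
toy_put_external(uint8_t *buf, size_t off, const ToyToastPointer *ptr)
{
    buf[off++] = TOY_EXTERNAL;
    memcpy(buf + off, ptr, sizeof(*ptr));
    return off + sizeof(*ptr);
}

/* Total stored size of the attribute that starts at 'off'. */
static size_t
toy_attr_size(const uint8_t *buf, size_t off)
{
    if (buf[off] == TOY_EXTERNAL)
        return 1 + sizeof(ToyToastPointer);
    return 2 + (size_t) buf[off + 1];
}

/*
 * Offset of attribute 'attnum' (0-based), found by walking the preceding
 * attributes. 'skew' simulates a walker that miscounted an earlier length.
 */
static size_t
toy_attr_offset(const uint8_t *buf, int attnum, size_t skew)
{
    size_t off = 0;

    for (int i = 0; i < attnum; i++)
        off += toy_attr_size(buf, off);
    return off + skew;
}

/*
 * Treat the bytes at 'off' as an external attribute and report whether its
 * va_toastrelid matches the toast table expected for the relation. The tag
 * byte is deliberately not checked, to mimic garbage that happens to be
 * interpreted as a toast pointer.
 */
static bool
toy_toast_pointer_ok(const uint8_t *buf, size_t off, Oid expected_toastrelid)
{
    ToyToastPointer ptr;

    assert(buf != NULL);
    memcpy(&ptr, buf + off + 1, sizeof(ptr));
    return ptr.va_toastrelid == expected_toastrelid;
}
```

With a buffer holding one short inline attribute followed by one external
attribute, calling toy_toast_pointer_ok() at the offset from
toy_attr_offset(buf, 1, 0) accepts the pointer, while a nonzero skew makes
the same call read shifted bytes and reject it.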
It is unfortunate that the failing test only runs pg_amcheck after creating
numerous corruptions, as we can't know if pg_amcheck would have complained
about pg_statistic before the corruptions were created in other tables, or if
it only does so after. The attached patch v7-0003 adds a call to pg_amcheck
after all tables are created and populated, but before any corruptions are
caused. This should help narrow down what is happening, and doesn't hurt to
leave in place long-term.
I don't immediately see anything wrong with how pg_statistic uses a
pseudo-type, but it leads me to want to poke a bit more at pg_statistic on
hornet and tern, though I don't have any regression tests specifically for
doing so.
Patches v7-0001 and v7-0002 are just repeats of the patches posted previously.

v7-0001-Turning-off-autovacuum-during-corruption-tests.patch
v7-0002-Fixing-a-confusing-amcheck-corruption-message.patch
v7-0003-Extend-pg_amcheck-test-suite.patch
v7-0004-Add-extra-check-of-toast-pointer-in-amcheck.patch

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
