Testing partial-write crash-recovery in 9.4 (e12d7320ca494fd05134847e30)
with foreign keys, I found some btree index corruption.

28807 VACUUM 2014-05-21 15:33:46.878 PDT:ERROR:  right sibling 4044 of
block 460 is not next child 23513 of block 1264 in index "foo_p_id_idx"
28807 VACUUM 2014-05-21 15:33:46.878 PDT:STATEMENT:  VACUUM;
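
For anyone unfamiliar with what this message is complaining about, here is
a toy sketch of the invariant as I read it from the error text: the right
sibling of a page should be the next child listed in that page's parent.
This is not PostgreSQL's code. The function name and the dictionary-based
page model are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


def check_right_sibling(
    leaf: int,
    right_sibling: Dict[int, int],
    parent: int,
    children: Dict[int, List[int]],
) -> Optional[str]:
    """Return an error string if leaf's right sibling is not the
    parent's next child, else None. Toy model, not the real check."""
    downlinks = children[parent]      # child block numbers, left to right
    pos = downlinks.index(leaf)       # where the leaf sits in the parent
    if pos + 1 >= len(downlinks):
        return None                   # last child: nothing to compare here
    next_child = downlinks[pos + 1]
    sibling = right_sibling[leaf]
    if sibling != next_child:
        return (
            f"right sibling {sibling} of block {leaf} is not "
            f"next child {next_child} of block {parent}"
        )
    return None


# Block numbers taken from the log line above.
print(check_right_sibling(460, {460: 4044}, 1264, {1264: [460, 23513]}))
```

With the block numbers from the log, the leaf-level sibling pointer and the
parent's list of children disagree, which is the inconsistency VACUUM
reported.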

It took ~8 hours on 8 cores to encounter this problem.  This is a single
occurrence; it has not yet been reproduced.

I don't know whether the partial-write, crash-recovery, or foreign-key
parts of this test are important--it could be a more generic problem
that only happened to be observed here.  Nor do I know yet whether it
occurs in 9_3_STABLE.

Below are the testing harness and the data directory (massively bloated at
3.7GB once uncompressed).  It is currently in wrap-around shutdown, but
that is the effect of persistent vacuum failures, not the cause of them.
You can restart the data directory and it will repeat the above sibling
error once autovac kicks in.  I don't know if the bloat is due to the
vacuum failure or if it was already in progress before the failures started.
I've cranked up the logging on that front for future efforts.

I'm using some fast-forward code on the xid consumption so that freezing
occurs more often, and some people have expressed reservations that the
code might be imperfect, and I can't rule that out as the cause (but I've
never traced any other problems back to that code).  But it did make it
through 4 complete wraps before this problem was encountered, so if that is
the problem it must be probabilistic rather than deterministic.
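
To put a rough number on "probabilistic", here is a back-of-the-envelope
sketch. It assumes, purely for illustration, that each wrap independently
triggers the failure with some fixed probability; nothing in the test
establishes that, and the function name is hypothetical.

```python
def survival_probability(p_per_wrap: float, wraps: int) -> float:
    """Chance of completing `wraps` wraps with no failure, assuming each
    wrap fails independently with probability p_per_wrap (an assumption)."""
    return (1.0 - p_per_wrap) ** wraps


# A deterministic bug (p = 1) could not have survived even one wrap.
print(survival_probability(1.0, 4))
# A 50% per-wrap failure rate would still get through 4 wraps sometimes.
print(survival_probability(0.5, 4))
```

Under that assumption, four clean wraps rule out a failure that fires every
time but say little about one that fires only occasionally.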

https://drive.google.com/folderview?id=0Bzqrh1SO9FcENWd6ZXlwVWpxU0E&usp=sharing

Cheers,

Jeff
