On Fri, Jan 16, 2015 at 6:21 AM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> It looks very much like that a page has for some reason been moved to a
> different block number. And that's exactly what Peter found out in his
> investigation too; an index page was mysteriously copied to a different
> block with identical content.

What I found suspicious about that was that the spuriously identical pages were not physically adjacent, but logically adjacent (i.e. the good page, whose content had been spuriously copied into the bad page, had the bad page as its B-Tree right link). It also seems likely that that small catalog index on pg_class(oid) was well cached in shared_buffers. So I agree that it's unlikely that this is actually a hardware or filesystem problem.

Beyond that, if I had to guess, I'd say that the problem is more likely to be in the B-Tree code than it is in the buffer manager or whatever. The "logically adjacent" thing is probably not an artifact of the order in which the pages were accessed, since it appears there was a downlink to the bad page, and that downlink was not added recently. Also, this logical adjacency is unlikely to be mere coincidence: Postgres seemed to fairly consistently break this way.

Does anyone have a better developed sense of where the ultimate problem here is than I do? I guess I've never thought too much about how the system fails when a catalog index is this thoroughly broken.

--
Peter Geoghegan