Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

Peter Geoghegan Fri, 16 Jan 2015 16:11:29 -0800

On Fri, Jan 16, 2015 at 6:21 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
> It looks very much like that a page has for some reason been moved to a
> different block number. And that's exactly what Peter found out in his
> investigation too; an index page was mysteriously copied to a different
> block with identical content.


What I found suspicious about that was that the spuriously identical
pages were not physically adjacent, but logically adjacent (i.e. the
bad page was considered the B-Tree right link of the good page by the
good, spuriously-copied-by-bad page). It also seems likely that that
small catalog index on pg_class(oid) was well cached in
shared_buffers. So I agree that it's unlikely that this is actually a
hardware or filesystem problem. Beyond that, if I had to guess, I'd
say that the problem is more likely to be in the B-Tree code than it
is in the buffer manager or whatever (so the "logically adjacent"
thing is probably not an artifact of the order that the pages were
accessed, since it appears there was a downlink to the bad page. This
downlink was not added recently. Also, this logical adjacency is
unlikely to be mere coincidence - Postgres seemed to fairly
consistently break this way).

Does anyone have a better developed sense of where the ultimate
problem here is than I do? I guess I've never thought too much about
how the system fails when a catalog index is this thoroughly broken.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

Reply via email to