I wrote:
> What I speculate right at the moment is that we are not looking at index
> corruption at all, but at heap corruption: somehow, the first insertion
> into ctid (27806,2) got lost and the same ctid got re-used for the next
> inserted row.  We fixed one bug like this before ...

Further study makes this look more probable.  It seems that most of the
activity around the trouble spot consists of transactions inserting
exactly 13 rows into the table; for instance, here are traces of two
successive transactions:

Block   Item    XMIN            CMIN

27800   7       600921860       3
27800   8       600921860       7
27800   9       600921860       11
27800   10      600921860       15
27800   11      600921860       19
27800   12      600921860       23
27800   13      600921860       27
27800   14      600921860       31
27800   15      600921860       35
27800   16      600921860       39
27800   17      600921860       43
27800   18      600921860       47
27800   19      600921860       51

27800   20      600921870       3
27800   21      600921870       7
27800   22      600921870       11
27800   23      600921870       15
27800   24      600921870       19
27800   25      600921870       23
27800   26      600921870       27
27800   27      600921870       31
27800   28      600921870       35
27800   29      600921870       39
27800   30      600921870       43
27800   31      600921870       47
27800   32      600921870       51
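
Traces like these can be pulled at the SQL level with a query along
the following lines (a sketch --- "mytable" stands in for the affected
table; ctid, xmin, and cmin are system columns, and a plain seqscan
hands rows back in physical ctid order):

    -- ctid is (block,item); xmin/cmin are the inserting transaction
    -- ID and command ID
    SELECT ctid, xmin, cmin
    FROM mytable
    WHERE xmin = '600921860'
       OR xmin = '600921870';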

The pattern of CMIN values (3, 7, 11, ..., 51) is the same for all of
these transactions.  But look at the rows inserted by 600921856 and
600921858, the two transactions that seem to be involved with the
problem on page 27806:

27787   50      600921856       3
27795   41      600921858       3
27795   42      600921858       7
27795   43      600921858       11
27795   44      600921858       15
27795   45      600921858       19
27795   46      600921858       23
27795   47      600921858       27
27795   48      600921858       31
27795   49      600921858       35
27795   50      600921858       39
27795   51      600921858       43
27806   1       600921858       47
27806   2       600921856       15
27806   3       600921856       19
27806   4       600921856       23
27806   5       600921856       27
27806   6       600921856       31
27806   7       600921856       35
27806   8       600921856       39
27806   9       600921856       43
27806   10      600921856       47
27806   11      600921856       51
27806   12      600921858       51
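
Individual items on the suspect page can also be fetched directly: an
equality condition on ctid becomes a TID scan, so probes like the
following (again with "mytable" as a stand-in) work even when the
surrounding data is in doubt:

    -- fetch specific line pointers on page 27806 by TID
    SELECT ctid, xmin, cmin FROM mytable WHERE ctid = '(27806,1)';
    SELECT ctid, xmin, cmin FROM mytable WHERE ctid = '(27806,2)';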

(27787,50) and (27795,51) are both the last rows on their pages.  What's
evidently happened is that the two transactions filled those pages and
then both seized on 27806 as the next page to insert into.  I think that
600921856 tried to insert its CMIN 7 and 11 rows as items 1 and 2 on
that page, and then something wiped the page, then 600921858 inserted
its CMIN 47 row as item 1, and then 600921856 got control back and
finished inserting its rows.  Further study of the indexes shows that
there are two entries in each index pointing to each of (27806,1) and
(27806,2) --- but since the xxid values are different for the two
(27806,1) entries, those didn't show up as duplicates in my first look.
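
Without a page-level inspection tool, one rough SQL-level cross-check
for duplicate index entries is to compare an index scan's row count
against a seqscan's for the same key --- a sketch, assuming some
indexed column "id"; duplicate entries pointing at the same live tuple
make the index scan return that row twice, while the seqscan sees each
tuple exactly once:

    SET enable_seqscan = off;           -- force the index scan
    SELECT count(*) FROM mytable WHERE id = 42;

    SET enable_seqscan = on;            -- now force the heap scan
    SET enable_indexscan = off;
    SELECT count(*) FROM mytable WHERE id = 42;

    RESET enable_seqscan;
    RESET enable_indexscan;

A mismatch between the two counts points at duplicate (or missing)
index entries for that key.  (On 8.1 and later you would also want
enable_bitmapscan = off, since a bitmap scan visits each TID at most
once.)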

Given the apparent connection to vacuuming, this is looking a WHOLE
lot like the bug fixed in 8.0.3:

2005-05-07 17:32  tgl

        * src/backend/: access/heap/hio.c, commands/vacuumlazy.c
        (REL7_3_STABLE), access/heap/hio.c, access/nbtree/nbtpage.c,
        access/nbtree/nbtree.c, commands/vacuumlazy.c (REL7_4_STABLE),
        access/heap/hio.c, commands/vacuumlazy.c (REL7_2_STABLE),
        access/heap/hio.c, access/nbtree/nbtpage.c, access/nbtree/nbtree.c,
        commands/vacuumlazy.c (REL8_0_STABLE), access/heap/hio.c,
        access/nbtree/nbtpage.c, access/nbtree/nbtree.c,
        commands/vacuumlazy.c: Repair very-low-probability race condition
        between relation extension and VACUUM: in the interval between
        adding a new page to the relation and formatting it, it was
        possible for VACUUM to come along and decide it should format the
        page too.  Though not harmful in itself, this would cause data loss
        if a third transaction were able to insert tuples into the vacuumed
        page before the original extender got control back.

Are you *certain* this slave isn't running 8.0.2 or older?  If you can
verify that, then I guess we need to look for another mechanism that
could cause the same kind of thing.
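
For the record, the server will report its exact version if you run
this on the slave:

    SELECT version();
    -- e.g. "PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC ..."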

                        regards, tom lane
