Re: [HACKERS] BRIN desummarization writes junk WAL records

2017-04-07 Thread Alvaro Herrera
Tom Lane wrote:

> The proximate cause of the exception seems to be that
> brinSetHeapBlockItemptr is being passed pagesPerRange = 0,
> which is problematic since HEAPBLK_TO_REVMAP_INDEX tries to
> divide by that.  Looking one level down, the bogus value
> seems to be coming out of an xl_brin_desummarize WAL record:
> 
> (gdb) f 1
> #1  0x00478cdc in brin_xlog_desummarize_page (record=0x2403ac8)
> at brin_xlog.c:274
> 274 brinSetHeapBlockItemptr(buffer, xlrec->pagesPerRange, 
> xlrec->heapBlk, iptr);
> (gdb) p *xlrec
> $1 = {pagesPerRange = 0, heapBlk = 0, regOffset = 1}
> 
> This is, perhaps, not unrelated to the fact that
> brinRevmapDesummarizeRange doesn't seem to be bothering to fill
> that field of the record.

Absolutely.

> BTW, is it actually sensible that xl_brin_desummarize's heapBlk
> is declared OffsetNumber and not BlockNumber?  If there's a reason
> why that's correct, the field name seems damn misleading.

Nah, just an oversight (against which the compiler doesn't protect.)

Fixed both problems.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BRIN desummarization writes junk WAL records

2017-04-06 Thread Tom Lane
I am seeing the database fail to restart after a crash during the
regression tests, due to a divide-by-zero fault in BRIN WAL replay.

Core was generated by `postgres: startup'.
Program terminated with signal 8, Arithmetic exception.
#0  brinSetHeapBlockItemptr (buf=, pagesPerRange=0, 
heapBlk=0, tid=...) at brin_revmap.c:169
169 iptr += HEAPBLK_TO_REVMAP_INDEX(pagesPerRange, heapBlk);
(gdb) bt
#0  brinSetHeapBlockItemptr (buf=, pagesPerRange=0, 
heapBlk=0, tid=...) at brin_revmap.c:169
#1  0x00478cdc in brin_xlog_desummarize_page (record=0x2403ac8)
at brin_xlog.c:274
#2  brin_redo (record=0x2403ac8) at brin_xlog.c:320
#3  0x00513174 in StartupXLOG () at xlog.c:7171
#4  0x006dea91 in StartupProcessMain () at startup.c:217
#5  0x0052214a in AuxiliaryProcessMain (argc=2, argv=0x7fff4bb8d1f0)
at bootstrap.c:425
#6  0x006d98b7 in StartChildProcess (type=StartupProcess)
at postmaster.c:5256
#7  0x006ddae6 in PostmasterMain (argc=3, argv=)
at postmaster.c:1329
#8  0x00658038 in main (argc=3, argv=0x2402b20) at main.c:228

The proximate cause of the exception seems to be that
brinSetHeapBlockItemptr is being passed pagesPerRange = 0,
which is problematic since HEAPBLK_TO_REVMAP_INDEX tries to
divide by that.  Looking one level down, the bogus value
seems to be coming out of an xl_brin_desummarize WAL record:

(gdb) f 1
#1  0x00478cdc in brin_xlog_desummarize_page (record=0x2403ac8)
at brin_xlog.c:274
274 brinSetHeapBlockItemptr(buffer, xlrec->pagesPerRange, 
xlrec->heapBlk, iptr);
(gdb) p *xlrec
$1 = {pagesPerRange = 0, heapBlk = 0, regOffset = 1}

This is, perhaps, not unrelated to the fact that
brinRevmapDesummarizeRange doesn't seem to be bothering to fill
that field of the record.

BTW, is it actually sensible that xl_brin_desummarize's heapBlk
is declared OffsetNumber and not BlockNumber?  If there's a reason
why that's correct, the field name seems damn misleading.
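
As a rough illustration of why the declared type matters: OffsetNumber
is 16 bits wide and BlockNumber is 32 bits wide, so a block number
stored in an OffsetNumber field is silently truncated once it exceeds
65535. The struct and function below are hypothetical stand-ins, not
the actual xl_brin_desummarize definition.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;	/* 16 bits wide */
typedef uint32_t BlockNumber;	/* 32 bits wide */

static_assert(sizeof(OffsetNumber) < sizeof(BlockNumber),
			  "the narrow type cannot hold every block number");

/* Hypothetical record whose block field uses the narrow type. */
typedef struct
{
	OffsetNumber heapBlk;
} narrow_record;

/*
 * Store a block number in the narrow field and read it back.  The
 * conversion is well defined in C (reduction modulo 65536), so it
 * compiles without complaint and simply drops the high 16 bits.
 */
static BlockNumber
roundtrip_through_narrow_field(BlockNumber blk)
{
	narrow_record rec;

	rec.heapBlk = (OffsetNumber) blk;
	return rec.heapBlk;
}
```

Block 100 survives the round trip, but block 70000 comes back as 4464
(70000 - 65536), so replay would touch the wrong range for any table
larger than 65535 blocks.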

regards, tom lane

