Re: [HACKERS] 9.3.9 and pg_multixact corruption
On Mon, Sep 28, 2015 at 11:10:52AM -0400, Robert Haas wrote: > On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreich >wrote: > > OTOH, a unit test for multixact.c that exercises the code including > > wraparounds sounds like a desirable thing regardless of the fact that it > > could have caught this miscompilation earlier than 6 months into > > production. > > Definitely. +1. Catching compiler bugs is an important role of "make check". That suite has a notable gap if it passes despite a compiler bug ruining the binary. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
On 09/28/2015 08:10 AM, Robert Haas wrote: > -1 on that idea. I really don't think that we should categorically > decide we don't support higher optimization levels. If the compiler > has a bug, then the compiler manufacturer should fix it, and it's not > our fault. If the compiler doesn't have a bug and our stuff is > blowing up, then we have a bug and should fix it. I suppose there > could be some grey area but hopefully not too much. Or it's PILBChAK. I know Sun-CC used to warn that -O3 was unsuitable for most programs because it could change behavior. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
On Mon, Sep 28, 2015 at 2:34 PM, Josh Berkuswrote: > On 09/28/2015 08:10 AM, Robert Haas wrote: >> -1 on that idea. I really don't think that we should categorically >> decide we don't support higher optimization levels. If the compiler >> has a bug, then the compiler manufacturer should fix it, and it's not >> our fault. If the compiler doesn't have a bug and our stuff is >> blowing up, then we have a bug and should fix it. I suppose there >> could be some grey area but hopefully not too much. > > Or it's PILBChAK. I know Sun-CC used to warn that -O3 was unsuitable for > most programs because it could change behavior. I'm attempting to decipher this acronym. Problem is located between chair and keyboard? But I don't see how that applies here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreichwrote: > I think the intention was to make configure complain if there's a -O > 2 > in CFLAGS. -1 on that idea. I really don't think that we should categorically decide we don't support higher optimization levels. If the compiler has a bug, then the compiler manufacturer should fix it, and it's not our fault. If the compiler doesn't have a bug and our stuff is blowing up, then we have a bug and should fix it. I suppose there could be some grey area but hopefully not too much. > OTOH, a unit test for multixact.c that exercises the code including > wraparounds sounds like a desirable thing regardless of the fact that it > could have caught this miscompilation earlier than 6 months into > production. Definitely. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
On 25/09 09.37, Andreas Seltenreich wrote: > [ adding Bjorn Munch to Cc ] Oh. I am on the -hackers list but usually just scan for any subject mentioning Solaris and this one did not. :-) > Jim Nasby writes: > > On 9/20/15 9:23 AM, Christoph Berg wrote: > >> a short update here: the customer updated the compiler to a newer > >> version, is now compiling using -O2 instead of -O3, and the code > >> generated now looks sane, so this turned out to be a compiler issue. > >> (Though it's unclear if the upgrade fixed it, or the different -O > >> level.) > > > > Do we officially not support anything > -O2? If so it'd be nice if > > configure threw at least a warning (if not an error that you had to > > explicitly over-ride). > > At least the solaris binaries distributed via postgresql.org[1] have > been compiled with -xO3 according to pg_config. And their code for > multixact.c looks inconspicuous. To recap the data points: > > | compiler | flags | multixact.o | > |---+---+-| > | Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18 | -xO3 | bad | > | Sun C 5.13 SunOS_Sparc 2014/10/20 | -xO2 | good| > | Sun C 5.8 Patch 121015-04 2007/01/10 | -xO3 | good| All the binaries I have compiled for distribution via postgresql.org have been built with Sun Studio 11, aka Sun C 5.8 as listed on the bottom here. Except the 9.5 Alpha where I finally had to upgrade to version 12.1 (aka Sun C 5.10). I am also pretty sure they have always been compiled with -O3. At least that's what the build script is set to and I don't think that has even been changed. - Bjorn Munch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
[ adding Bjorn Munch to Cc ] Jim Nasby writes: > On 9/20/15 9:23 AM, Christoph Berg wrote: >> a short update here: the customer updated the compiler to a newer >> version, is now compiling using -O2 instead of -O3, and the code >> generated now looks sane, so this turned out to be a compiler issue. >> (Though it's unclear if the upgrade fixed it, or the different -O >> level.) > > Do we officially not support anything > -O2? If so it'd be nice if > configure threw at least a warning (if not an error that you had to > explicitly over-ride). At least the solaris binaries distributed via postgresql.org[1] have been compiled with -xO3 according to pg_config. And their code for multixact.c looks inconspicuous. To recap the data points: | compiler | flags | multixact.o | |---+---+-| | Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18 | -xO3 | bad | | Sun C 5.13 SunOS_Sparc 2014/10/20 | -xO2 | good| | Sun C 5.8 Patch 121015-04 2007/01/10 | -xO3 | good| regards, Andreas Footnotes: [1] http://www.postgresql.org/download/solaris/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
Alvaro Herrera writes: > Jim Nasby wrote: >> Do we officially not support anything > -O2? If so it'd be nice if configure >> threw at least a warning (if not an error that you had to explicitly >> over-ride). > > Keep in mind this is Sun OS C -- not one of the most popular compilers > in the world. I don't know what you suggest: have a test program that > configure runs and detects whether the compiler does the wrong thing? > It doesn't seem a sane idea to maintain test cases for all known > compiler bugs ... I think the intention was to make configure complain if there's a -O > 2 in CFLAGS. OTOH, a unit test for multixact.c that exercises the code including wraparounds sounds like a desirable thing regardless of the fact that it could have caught this miscompilation earlier than 6 months into production. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
On 9/20/15 9:23 AM, Christoph Berg wrote: a short update here: the customer updated the compiler to a newer version, is now compiling using -O2 instead of -O3, and the code generated now looks sane, so this turned out to be a compiler issue. (Though it's unclear if the upgrade fixed it, or the different -O level.) Do we officially not support anything > -O2? If so it'd be nice if configure threw at least a warning (if not an error that you had to explicitly over-ride). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
Jim Nasby wrote: > On 9/20/15 9:23 AM, Christoph Berg wrote: > >a short update here: the customer updated the compiler to a newer > >version, is now compiling using -O2 instead of -O3, and the code > >generated now looks sane, so this turned out to be a compiler issue. > >(Though it's unclear if the upgrade fixed it, or the different -O > >level.) > > Do we officially not support anything > -O2? If so it'd be nice if configure > threw at least a warning (if not an error that you had to explicitly > over-ride). Keep in mind this is Sun OS C -- not one of the most popular compilers in the world. I don't know what you suggest: have a test program that configure runs and detects whether the compiler does the wrong thing? It doesn't seem a sane idea to maintain test cases for all known compiler bugs ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
Re: Andreas Seltenreich 2015-09-13 <87si6i1875@credativ.de> > I managed disassemble RecordNewMultiXact from the core dump using a > cross-binutils, and it reveals that the compiler[1] appears to have > indeed generated a signed division here. I'm attaching a piece of C > code that does the same computation as the assembly (I think), as well > as the disassembly itself. > > Footnotes: > [1] Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18, 64-bit Hi, a short update here: the customer updated the compiler to a newer version, is now compiling using -O2 instead of -O3, and the code generated now looks sane, so this turned out to be a compiler issue. (Though it's unclear if the upgrade fixed it, or the different -O level.) Thanks to all who provided feedback, it was very valuable in actually tracking down the root of the issue. Christoph -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
Thomas Munro writes: > In various places we have int pageno = offset / (uint32) 1636, expanded > from this macro (which calls the offset an xid): It appears to depend on the context it is expanded in, as some of the code must have gotten the segment number right: ,[ ls -sh pg_multixact/members/ ] | 256K 97E0 | [...] | 256K A03B | 24K A03C = -5FC4 |0 5FC4 ` > I don't really see how any uint32 value could produce such a pageno via > that macro. Even if called in an environment where (xid) is accidentally > an int, the int / unsigned expression would convert it to unsigned first > (unless (xid) is a bigger type like int64_t: by the rules of int promotion > you'd get signed division in that case, hmm...). But it's always called > with a MultiXactOffset AKA uint32 variable. I managed disassemble RecordNewMultiXact from the core dump using a cross-binutils, and it reveals that the compiler[1] appears to have indeed generated a signed division here. I'm attaching a piece of C code that does the same computation as the assembly (I think), as well as the disassembly itself. regards, Andreas Footnotes: [1] Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18, 64-bit #include #include #include uint32_t offset2page(uint32_t offset) { uint64_t l0, i3, i5, i4, l6_1, l6_2, o2, o4, o1, o3; l0 = offset; i3 = 0x5fc3e800ULL; i5 = i3 ^ -375; i4 = (int32_t)l0; l6_1 = i4 * i5; o2 = l6_1 >> 32; o4 = l0 + o2; o1 = ((int32_t)o4) >> 10; o3 = ((int32_t)l0) >> 31; l6_2 = o1 - o3; return l6_2; } int main(int argc, char *argv[]) { uint32_t page = offset2page(atol(argv[1])); printf("page: %d\n", page); printf("segment: %04X\n", (int32_t)page/32); return 0; } RecordNewMultiXact: 100112be8: 07 00 04 01 sethi %hi(0x100400), %g3 100112bec: 9d e3 bf 30 save %sp, -208, %sp 100112bf0: 82 10 e2 bd or %g3, 0x2bd, %g1 100112bf4: 90 10 20 0e mov 0xe, %o0 100112bf8: a5 36 20 00 srl %i0, 0, %l2 100112bfc: 9f 28 70 0c sllx %g1, 0xc, %o7 100112c00: a2 0c a7 ff and %l2, 0x7ff, %l1 100112c04: b9 3e a0 00 sra %i2, 0, %i4 100112c08: b4 03 ed d8 add %o7, 0xdd8, %i2 100112c0c: b1 36 60 00 srl %i1, 0, %i0 100112c10: f0 23 a8 af st %i0, [ %sp + 0x8af ] 100112c14: b2 10 00 1b mov %i3, %i1 100112c18: 40 08 1d 66 call 0x10031a1b0 ; LWLockAcquire 100112c1c: 92 10 20 00 clr %o1 100112c20: 93 34 a0 0b srl %l2, 0xb, %o1 100112c24: 90 10 00 1a mov %i2, %o0 100112c28: 97 34 a0 00 srl %l2, 0, %o3 100112c2c: 7f ff f6 bd call 0x100110720 ; SimpleLruReadPage 100112c30: 94 10 20 01 mov 1, %o2 100112c34: fa 5e a0 00 ldx [ %i2 ], %i5 100112c38: ab 3a 20 00 sra %o0, 0, %l5 100112c3c: 96 10 20 01 mov 1, %o3 100112c40: a9 2d 70 03 sllx %l5, 3, %l4 100112c44: e0 5f 60 08 ldx [ %i5 + 8 ], %l0 100112c48: a7 3c 60 00 sra %l1, 0, %l3 100112c4c: ad 2c f0 02 sllx %l3, 2, %l6 100112c50: da 5c 00 14 ldx [ %l0 + %l4 ], %o5 100112c54: f0 23 40 16 st %i0, [ %o5 + %l6 ] 100112c58: d8 5e a0 00 ldx [ %i2 ], %o4 100112c5c: d4 5b 20 18 ldx [ %o4 + 0x18 ], %o2 100112c60: d6 2a 80 15 stb %o3, [ %o2 + %l5 ] 100112c64: 40 08 1f 2d call 0x10031a918 ; LWLockRelease 100112c68: 90 10 20 0e mov 0xe, %o0 100112c6c: 90 10 20 0f mov 0xf, %o0 100112c70: 40 08 1d 50 call 0x10031a1b0 ; LWLockAcquire 100112c74: 92 10 20 00 clr %o1 100112c78: 80 a7 20 00 cmp %i4, 0 100112c7c: 04 40 00 7c ble,pn %icc, 0x100112e6c 100112c80: 90 07 3f ff add %i4, -1, %o0 100112c84: e0 03 a8 af ld [ %sp + 0x8af ], %l0 100112c88: 37 17 f0 fa sethi %hi(0x5fc3e800), %i3 100112c8c: 86 10 3f ff mov -1, %g3 100112c90: ba 1e fe 89 xor %i3, -375, %i5 100112c94: d0 23 a8 b7 st %o0, [ %sp + 0x8b7 ] 100112c98: a2 10 26 64 mov 0x664, %l1 100112c9c: c0 23 a8 b3 clr [ %sp + 0x8b3 ] 100112ca0: 37 10 1e 0b sethi %hi(0x40782c00), %i3 100112ca4: b9 3c 20 00 sra %l0, 0, %i4 100112ca8: ac 4f 00 1d mulx %i4, %i5, %l6 100112cac: 95 35 b0 20 srlx %l6, 0x20, %o2 100112cb0: ba 10 26 63 mov 0x663, %i5 100112cb4: 98 04 00 0a add %l0, %o2, %o4 100112cb8: b8 10 20 01 mov 1, %i4 100112cbc: 93 3b 20 0a sra %o4, 0xa, %o1 100112cc0: 97 3c 20 1f sra %l0, 0x1f, %o3 100112cc4: ac 22 40 0b sub %o1, %o3, %l6 100112cc8: 89 2d a0 04 sll %l6, 4, %g4 100112ccc: 82 01 00 16 add %g4, %l6, %g1 100112cd0: b1 28 60 02 sll %g1, 2, %i0 100112cd4: 84 26 00 01 sub %i0, %g1, %g2 100112cd8: f0 03 a8 af ld [ %sp + 0x8af ], %i0 100112cdc: 9f 28 a0 03 sll %g2, 3, %o7 100112ce0: 90 05 80 0f add %l6,
Re: [HACKERS] 9.3.9 and pg_multixact corruption
...and here it is ;) --On 10. September 2015 19:45:46 -0300 Alvaro Herrerawrote: > Anyway, can you please request pg_controldata to be run on the failed > cluster and paste it here? pg_control version number:937 Catalog version number: 201306121 Database system identifier: 5995776571405068134 Database cluster state: in archive recovery pg_control last modified: Di 08 Sep 2015 14:58:36 CEST Latest checkpoint location: 1A52/3CFAF758 Prior checkpoint location:1A52/3CFAF758 Latest checkpoint's REDO location:1A52/2313FEF8 Latest checkpoint's REDO WAL file:00011A520023 Latest checkpoint's TimeLineID: 1 Latest checkpoint's PrevTimeLineID: 1 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/2896610102 Latest checkpoint's NextOID: 261892 Latest checkpoint's NextMultiXactId: 1068223816 Latest checkpoint's NextMultiOffset: 2147460090 Latest checkpoint's oldestXID:2693040605 Latest checkpoint's oldestXID's DB: 16400 Latest checkpoint's oldestActiveXID: 0 Latest checkpoint's oldestMultiXid: 1012219584 Latest checkpoint's oldestMulti's DB: 16400 Time of latest checkpoint:Di 08 Sep 2015 00:47:01 CEST Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 1A52/2313FEF8 Min recovery ending loc's timeline: 1 Backup start location:1A52/2313FEF8 Backup end location: 0/0 End-of-backup record required:no Current wal_level setting:archive Current max_connections setting: 500 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment:16777216 Maximum length of identifiers:64 Maximum columns in an index: 32 Maximum size of a TOAST chunk:1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 -- Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
Re: Bernd Helmle 2015-09-10 <7E3C7F8D210AC9A423E96F3A@eje.local> > 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file > "pg_multixact/members/5FC4" to offset 4294950912: Invalid argument. > 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595 > offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) > 2015-09-08 11:40:59 CEST [27045] LOG: startup process (PID 27047) exited > with exit code 1 > 2015-09-08 11:40:59 CEST [27045] LOG: aborting startup due to startup > process failure > > Some side notes: > > An additional recovery from a base backup and archive recovery yield to the > same error, as soon as the affected tuple was touched with a DELETE. The > affected table was fully dumpable via pg_dump, though. A few more words here: the archive recovery was a pitr to 00:45, so well before the problem, and the cluster was initially working well, but crashed shortly after with the same mxid 1068235595 message. The crash was triggered from a delete on a different table (which was related schema-wise, but iirc neither of these tables has any FKs). We then rewound the system to a zfs snapshot taken when the archive recovery had finished (db shut down cleanly), and put it up again, when it again crashed with mxid 1068235595, this time on a third table. The original crash and the first post-recovery crash happened a few minutes after pg_start_backup(), though the next crash was without that. (While the archive recovery was running, I had pg_resetxlog the original cluster. It was possible to isolate the ctid of an affected tuple, but it wasn't possible to DELETE it, yielding an error message similar to the above, but the database would continue. I then zeroed the bad block using dd (zero_damaged_pages didn't help), only to find that at least one more tuple in that table was affected (with a different mxid).) Christoph -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
On 2015-09-11 14:25:39 +0200, Christoph Berg wrote: > A few more words here: the archive recovery was a pitr to 00:45, so > well before the problem, and the cluster was initially working well, > but crashed shortly after with the same mxid 1068235595 message. The > crash was triggered from a delete on a different table (which was > related schema-wise, but iirc neither of these tables has any FKs). > > We then rewound the system to a zfs snapshot taken when the archive > recovery had finished (db shut down cleanly), and put it up again, > when it again crashed with mxid 1068235595, this time on a third > table. > > The original crash and the first post-recovery crash happened a few > minutes after pg_start_backup(), though the next crash was without > that. Do you still have access to that data? That'd make investigation of both the issue and fixes/workaround far easier. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
Bernd Helmle wrote: > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 > instance. > > The database crashed with the following log messages: > > 2015-09-08 00:49:16 CEST [2912] PANIC: could not access status of > transaction 1068235595 > 2015-09-08 00:49:16 CEST [2912] DETAIL: Could not open file > "pg_multixact/members/5FC4": No such file or directory. > 2015-09-08 00:49:16 CEST [2912] STATEMENT: delete from StockTransfer > where oid = $1 and tanum = $2 I wonder if these bogus page and offset numbers are just SlruReportIOError being confused because pg_multixact/members is so weird (I don't think it should be the case, since this stuff is using page numbers only, not anything related to how each page is layed out). Anyway, can you please request pg_controldata to be run on the failed cluster and paste it here? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
On Fri, Sep 11, 2015 at 10:45 AM, Alvaro Herrerawrote: > Bernd Helmle wrote: > > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 > > instance. > > > > The database crashed with the following log messages: > > > > 2015-09-08 00:49:16 CEST [2912] PANIC: could not access status of > > transaction 1068235595 > > 2015-09-08 00:49:16 CEST [2912] DETAIL: Could not open file > > "pg_multixact/members/5FC4": No such file or directory. > > 2015-09-08 00:49:16 CEST [2912] STATEMENT: delete from StockTransfer > > where oid = $1 and tanum = $2 > > I wonder if these bogus page and offset numbers are just > SlruReportIOError being confused because pg_multixact/members is so > weird (I don't think it should be the case, since this stuff is using > page numbers only, not anything related to how each page is layed out). > But SlruReportIOError uses the same macro to build the filename as SlruReadPhysicalPage and other functions, namely SlruFileName which uses sprintf with %04X (unsigned integer uppercase hex) and gives it segno (which is always an int), so I don't think the problem is in error reporting only. Assuming default block size, to get 5FC4 from SlruFileName you need segno == -41020. We have int segno = pageno / 32 (that's SLRU_PAGES_PER_SEGMENT), so to get segno == -41020 you need pageno between -1312640 and -1312609 (whose bit patterns reinterpreted as unsigned are 4293654656 and 4293654687). In various places we have int pageno = offset / (uint32) 1636, expanded from this macro (which calls the offset an xid): #define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE) I don't really see how any uint32 value could produce such a pageno via that macro. Even if called in an environment where (xid) is accidentally an int, the int / unsigned expression would convert it to unsigned first (unless (xid) is a bigger type like int64_t: by the rules of int promotion you'd get signed division in that case, hmm...). But it's always called with a MultiXactOffset AKA uint32 variable. So via that route, there is no MultiXactOffset value that can't be mapped to a segment in the range "", "14078". Famously, it wraps after that. Maybe the negative pageno came from somewhere else. Where? Inside SLRU code we can see pageno = shared->page_number[slotno]... maybe the SLRU slots got corrupted somehow? -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] 9.3.9 and pg_multixact corruption
Bernd Helmle wrote: > 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file > "pg_multixact/members/5FC4" to offset 4294950912: Invalid argument. > 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595 > offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) I just noticed that this offset number 2147483648 is exactly 2^31. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
On Fri, Sep 11, 2015 at 11:51 AM, Thomas Munro < thomas.mu...@enterprisedb.com> wrote: > On Fri, Sep 11, 2015 at 10:45 AM, Alvaro Herrera> wrote: > >> Bernd Helmle wrote: >> > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 >> > instance. >> > >> > The database crashed with the following log messages: >> > >> > 2015-09-08 00:49:16 CEST [2912] PANIC: could not access status of >> > transaction 1068235595 >> > 2015-09-08 00:49:16 CEST [2912] DETAIL: Could not open file >> > "pg_multixact/members/5FC4": No such file or directory. >> > 2015-09-08 00:49:16 CEST [2912] STATEMENT: delete from StockTransfer >> > where oid = $1 and tanum = $2 >> >> I wonder if these bogus page and offset numbers are just >> SlruReportIOError being confused because pg_multixact/members is so >> weird (I don't think it should be the case, since this stuff is using >> page numbers only, not anything related to how each page is layed out). >> > > But SlruReportIOError uses the same macro to build the filename as > SlruReadPhysicalPage and other functions, namely SlruFileName which uses > sprintf with %04X (unsigned integer uppercase hex) and gives it segno > (which is always an int), so I don't think the problem is in error > reporting only. > > Assuming default block size, to get 5FC4 from SlruFileName you need > segno == -41020. > Oops, I meant to attach the proviso "Assuming default block size" to the assumption further down that MULTIXACT_MEMBERS_PER_PAGE == 1636. > We have int segno = pageno / 32 (that's SLRU_PAGES_PER_SEGMENT), so to get > segno == -41020 you need pageno between -1312640 and -1312609 (whose bit > patterns reinterpreted as unsigned are 4293654656 and 4293654687). > > In various places we have int pageno = offset / (uint32) 1636, expanded > from this macro (which calls the offset an xid): > > #define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) > MULTIXACT_MEMBERS_PER_PAGE) > I don't really see how any uint32 value could produce such a pageno via > that macro. Even if called in an environment where (xid) is accidentally > an int, the int / unsigned expression would convert it to unsigned first > (unless (xid) is a bigger type like int64_t: by the rules of int promotion > you'd get signed division in that case, hmm...). But it's always called > with a MultiXactOffset AKA uint32 variable. > > So via that route, there is no MultiXactOffset value that can't be mapped > to a segment in the range "", "14078". Famously, it wraps after that. > > Maybe the negative pageno came from somewhere else. Where? Inside SLRU > code we can see pageno = shared->page_number[slotno]... maybe the SLRU > slots got corrupted somehow? > -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] 9.3.9 and pg_multixact corruption
Bernd Helmle wrote: > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 > instance. > > 2015-09-08 11:40:59 CEST [27047] FATAL: could not access status of > transaction 1068235595 > 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file > "pg_multixact/members/5FC4" to offset 4294950912: Invalid argument. > 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595 > offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) I think the math to compute segment number and byte offset of the member might be bogus here. The file names in pg_multixact/members is supposed to go to 14078 (hex) in a 8kB-BLCKSZ build, and of course it goes a bit higher in builds with smaller page sizes, but nowhere as high as 5FC4. And the offset is way too close to 2^32 (exactly 16384 less than 2^32, to be precise.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.3.9 and pg_multixact corruption
A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 instance. The database crashed with the following log messages: 2015-09-08 00:49:16 CEST [2912] PANIC: could not access status of transaction 1068235595 2015-09-08 00:49:16 CEST [2912] DETAIL: Could not open file "pg_multixact/members/5FC4": No such file or directory. 2015-09-08 00:49:16 CEST [2912] STATEMENT: delete from StockTransfer where oid = $1 and tanum = $2 When they called us later, it turned out that the crash happened during a base backup, leaving a backup_label behind which prevented the database coming up again with a invalid checkpoint location. However, removing the backup_label still didn't let the database through recovery, it failed again with the former error, this time during recovery: 2015-09-08 11:40:04 CEST [27047] LOG: database system was interrupted while in recovery at 2015-09-08 11:19:44 CEST 2015-09-08 11:40:04 CEST [27047] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery. 2015-09-08 11:40:04 CEST [27047] LOG: database system was not properly shut down; automatic recovery in progress 2015-09-08 11:40:05 CEST [27047] LOG: redo starts at 1A52/2313FEF8 2015-09-08 11:40:47 CEST [27082] FATAL: the database system is starting up 2015-09-08 11:40:59 CEST [27047] FATAL: could not access status of transaction 1068235595 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file "pg_multixact/members/5FC4" to offset 4294950912: Invalid argument. 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595 offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) 2015-09-08 11:40:59 CEST [27045] LOG: startup process (PID 27047) exited with exit code 1 2015-09-08 11:40:59 CEST [27045] LOG: aborting startup due to startup process failure Some side notes: An additional recovery from a base backup and archive recovery yield to the same error, as soon as the affected tuple was touched with a DELETE. The affected table was fully dumpable via pg_dump, though. We also have a core dump, but no direct access to the machine. If there's more information required (and i believe it is), let me know where to dig deeper. I also would like to request a backtrace from the existing core dump, but in the absence of a sparc64 machine here we need to ask the customer to get one. -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers