
Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-10-04 Thread Noah Misch

On Mon, Sep 28, 2015 at 11:10:52AM -0400, Robert Haas wrote:
> On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreich 
>  wrote:
> > OTOH, a unit test for multixact.c that exercises the code including
> > wraparounds sounds like a desirable thing regardless of the fact that it
> > could have caught this miscompilation earlier than 6 months into
> > production.
> 
> Definitely.

+1.  Catching compiler bugs is an important role of "make check".  That suite
has a notable gap if it passes despite a compiler bug ruining the binary.
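A wraparound test of the kind discussed does not need a running server, because it can drive the offset-to-page arithmetic directly. The following is a minimal sketch, not PostgreSQL's regression infrastructure. It re-declares the `MXOffsetToMemberPage` arithmetic quoted in this thread locally and assumes the 8kB-BLCKSZ values given there (1636 members per page, 32 pages per segment, last segment 14078 hex). The helper names `member_page`, `member_segment` and `check_member_mapping` are hypothetical.

```c
#include <stdint.h>

/* Local copies of the constants, assuming an 8kB BLCKSZ build. */
#define MULTIXACT_MEMBERS_PER_PAGE 1636
#define SLRU_PAGES_PER_SEGMENT     32
#define MAX_MEMBER_SEGMENT         0x14078  /* last segment before wraparound */

typedef uint32_t MultiXactOffset;

/* Same arithmetic as MXOffsetToMemberPage(): unsigned 32-bit division. */
static int member_page(MultiXactOffset offset)
{
    return (int) (offset / (uint32_t) MULTIXACT_MEMBERS_PER_PAGE);
}

static int member_segment(MultiXactOffset offset)
{
    return member_page(offset) / SLRU_PAGES_PER_SEGMENT;
}

/*
 * Walk offsets around two boundaries: 2^31, where a signed division would
 * go negative, and 2^32, where offsets wrap to 0.  Count every mapping that
 * is out of range or goes backwards.  Returns 0 when everything is sane.
 */
static int check_member_mapping(void)
{
    static const uint64_t starts[] = { 0x80000000ULL - 5000, 0x100000000ULL - 5000 };
    int failures = 0;

    for (int i = 0; i < 2; i++)
    {
        int prev_page = -1;

        for (uint64_t o = starts[i]; o < starts[i] + 10000; o++)
        {
            MultiXactOffset offset = (MultiXactOffset) o;   /* wraps past 2^32 */
            int page = member_page(offset);
            int seg = member_segment(offset);

            if (page < 0 || seg < 0 || seg > MAX_MEMBER_SEGMENT)
                failures++;
            /* pages must not decrease, except where the offset wraps to 0 */
            if (offset != 0 && prev_page >= 0 && page < prev_page)
                failures++;
            prev_page = page;
        }
    }
    return failures;
}
```

A binary miscompiled to use signed division would return negative pages for offsets at or above 2^31, so `check_member_mapping()` would report failures. A correct build returns 0.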


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-28 Thread Josh Berkus

On 09/28/2015 08:10 AM, Robert Haas wrote:
> -1 on that idea.  I really don't think that we should categorically
> decide we don't support higher optimization levels.  If the compiler
> has a bug, then the compiler manufacturer should fix it, and it's not
> our fault.  If the compiler doesn't have a bug and our stuff is
> blowing up, then we have a bug and should fix it.  I suppose there
> could be some grey area but hopefully not too much.

Or it's PILBChAK. I know Sun-CC used to warn that -O3 was unsuitable for
most programs because it could change behavior.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-28 Thread Robert Haas

On Mon, Sep 28, 2015 at 2:34 PM, Josh Berkus  wrote:
> On 09/28/2015 08:10 AM, Robert Haas wrote:
>> -1 on that idea.  I really don't think that we should categorically
>> decide we don't support higher optimization levels.  If the compiler
>> has a bug, then the compiler manufacturer should fix it, and it's not
>> our fault.  If the compiler doesn't have a bug and our stuff is
>> blowing up, then we have a bug and should fix it.  I suppose there
>> could be some grey area but hopefully not too much.
>
> Or it's PILBChAK. I know Sun-CC used to warn that -O3 was unsuitable for
> most programs because it could change behavior.

I'm attempting to decipher this acronym.  Problem is located between
chair and keyboard?  But I don't see how that applies here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-28 Thread Robert Haas

On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreich
 wrote:
> I think the intention was to make configure complain if there's a -O > 2
> in CFLAGS.

-1 on that idea.  I really don't think that we should categorically
decide we don't support higher optimization levels.  If the compiler
has a bug, then the compiler manufacturer should fix it, and it's not
our fault.  If the compiler doesn't have a bug and our stuff is
blowing up, then we have a bug and should fix it.  I suppose there
could be some grey area but hopefully not too much.

> OTOH, a unit test for multixact.c that exercises the code including
> wraparounds sounds like a desirable thing regardless of the fact that it
> could have caught this miscompilation earlier than 6 months into
> production.

Definitely.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-25 Thread Bjorn Munch

On 25/09 09.37, Andreas Seltenreich wrote:
> [ adding Bjorn Munch to Cc ]

Oh. I am on the -hackers list but usually just scan for any subject
mentioning Solaris and this one did not. :-)

> Jim Nasby writes:
> > On 9/20/15 9:23 AM, Christoph Berg wrote:
> >> a short update here: the customer updated the compiler to a newer
> >> version, is now compiling using -O2 instead of -O3, and the code
> >> generated now looks sane, so this turned out to be a compiler issue.
> >> (Though it's unclear if the upgrade fixed it, or the different -O
> >> level.)
> >
> > Do we officially not support anything > -O2? If so it'd be nice if
> > configure threw at least a warning (if not an error that you had to
> > explicitly over-ride).
> 
> At least the solaris binaries distributed via postgresql.org[1] have
> been compiled with -xO3 according to pg_config.  And their code for
> multixact.c looks inconspicuous.  To recap the data points:
> 
> | compiler                                          | flags | multixact.o |
> |---------------------------------------------------+-------+-------------|
> | Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18 | -xO3  | bad         |
> | Sun C 5.13 SunOS_Sparc 2014/10/20                 | -xO2  | good        |
> | Sun C 5.8 Patch 121015-04 2007/01/10              | -xO3  | good        |

All the binaries I have compiled for distribution via postgresql.org
have been built with Sun Studio 11, aka Sun C 5.8 as listed on the
bottom here, except the 9.5 Alpha, for which I finally had to upgrade to
version 12.1 (aka Sun C 5.10).

I am also pretty sure they have always been compiled with -O3. At
least that's what the build script is set to, and I don't think that
has ever been changed.

- Bjorn Munch



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-25 Thread Andreas Seltenreich

[ adding Bjorn Munch to Cc ]

Jim Nasby writes:
> On 9/20/15 9:23 AM, Christoph Berg wrote:
>> a short update here: the customer updated the compiler to a newer
>> version, is now compiling using -O2 instead of -O3, and the code
>> generated now looks sane, so this turned out to be a compiler issue.
>> (Though it's unclear if the upgrade fixed it, or the different -O
>> level.)
>
> Do we officially not support anything > -O2? If so it'd be nice if
> configure threw at least a warning (if not an error that you had to
> explicitly over-ride).

At least the solaris binaries distributed via postgresql.org[1] have
been compiled with -xO3 according to pg_config.  And their code for
multixact.c looks inconspicuous.  To recap the data points:

| compiler                                          | flags | multixact.o |
|---------------------------------------------------+-------+-------------|
| Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18 | -xO3  | bad         |
| Sun C 5.13 SunOS_Sparc 2014/10/20                 | -xO2  | good        |
| Sun C 5.8 Patch 121015-04 2007/01/10              | -xO3  | good        |

regards,
Andreas

Footnotes: 
[1]  http://www.postgresql.org/download/solaris/



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-25 Thread Andreas Seltenreich

Alvaro Herrera writes:

> Jim Nasby wrote:
>> Do we officially not support anything > -O2? If so it'd be nice if configure
>> threw at least a warning (if not an error that you had to explicitly
>> over-ride).
>
> Keep in mind this is Sun OS C -- not one of the most popular compilers
> in the world.  I don't know what you suggest: have a test program that
> configure runs and detects whether the compiler does the wrong thing?
> It doesn't seem a sane idea to maintain test cases for all known
> compiler bugs ...

I think the intention was to make configure complain if there's a -O > 2
in CFLAGS.

OTOH, a unit test for multixact.c that exercises the code including
wraparounds sounds like a desirable thing regardless of the fact that it
could have caught this miscompilation earlier than 6 months into
production.

regards,
Andreas


Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-24 Thread Jim Nasby


On 9/20/15 9:23 AM, Christoph Berg wrote:
> a short update here: the customer updated the compiler to a newer
> version, is now compiling using -O2 instead of -O3, and the code
> generated now looks sane, so this turned out to be a compiler issue.
> (Though it's unclear if the upgrade fixed it, or the different -O
> level.)


Do we officially not support anything > -O2? If so it'd be nice if 
configure threw at least a warning (if not an error that you had to 
explicitly over-ride).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-24 Thread Alvaro Herrera

Jim Nasby wrote:
> On 9/20/15 9:23 AM, Christoph Berg wrote:
> >a short update here: the customer updated the compiler to a newer
> >version, is now compiling using -O2 instead of -O3, and the code
> >generated now looks sane, so this turned out to be a compiler issue.
> >(Though it's unclear if the upgrade fixed it, or the different -O
> >level.)
> 
> Do we officially not support anything > -O2? If so it'd be nice if configure
> threw at least a warning (if not an error that you had to explicitly
> over-ride).

Keep in mind this is Sun OS C -- not one of the most popular compilers
in the world.  I don't know what you suggest: have a test program that
configure runs and detects whether the compiler does the wrong thing?
It doesn't seem a sane idea to maintain test cases for all known
compiler bugs ...
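For illustration only, a probe of the kind Álvaro describes might look like the sketch below. It divides 2^31 by the members-per-page constant (1636, the 8kB-BLCKSZ value used in this thread) through a `volatile` variable, so the compiler cannot compute the result at compile time, and it reports whether the unsigned division came out right. The function name is hypothetical. This also shows Álvaro's objection: a probe like this only catches the one bug it encodes.

```c
#include <stdint.h>

/*
 * Hypothetical configure-style probe.  Returns 1 if unsigned 32-bit division
 * of 2^31 by 1636 gives the mathematically correct quotient, 0 otherwise.
 * The volatile read keeps the compiler from folding the division at compile
 * time, so the instructions it emits for the division are actually executed.
 */
int unsigned_division_is_sane(void)
{
    volatile uint32_t offset = UINT32_C(2147483648);   /* 2^31 */
    uint32_t page = offset / (uint32_t) 1636;

    /* 1636 * 1312642 = 2147482312 <= 2^31 < 1636 * 1312643 = 2147483948 */
    return page == UINT32_C(1312642);
}
```

A configure script would compile this with the user's CFLAGS, call it from a small `main()`, and warn if it returns 0.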

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-20 Thread Christoph Berg

Re: Andreas Seltenreich 2015-09-13 <87si6i1875@credativ.de>
> I managed to disassemble RecordNewMultiXact from the core dump using a
> cross-binutils, and it reveals that the compiler[1] appears to have
> indeed generated a signed division here.  I'm attaching a piece of C
> code that does the same computation as the assembly (I think), as well
> as the disassembly itself.
> 
> Footnotes: 
> [1]  Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18, 64-bit

Hi,

a short update here: the customer updated the compiler to a newer
version, is now compiling using -O2 instead of -O3, and the code
generated now looks sane, so this turned out to be a compiler issue.
(Though it's unclear if the upgrade fixed it, or the different -O
level.)

Thanks to all who provided feedback, it was very valuable in actually
tracking down the root of the issue.

Christoph



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-13 Thread Andreas Seltenreich

Thomas Munro writes:

> In various places we have int pageno = offset / (uint32) 1636, expanded
> from this macro (which calls the offset an xid):

It appears to depend on the context it is expanded in, as some of the
code must have gotten the segment number right:

,----[ ls -sh pg_multixact/members/ ]
| 256K 97E0
| [...]
| 256K A03B
|  24K A03C = -5FC4
|    0 5FC4
`----
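The annotation `A03C = -5FC4` can be checked with 16-bit arithmetic: 0xA03C and 0x5FC4 add up to 0x10000, so each is the other's negation modulo 2^16. The sketch below also shows that A03C is the segment an unsigned division assigns to offset 2^31, which is the offset in the failing redo record. It assumes the thread's 8kB-BLCKSZ constants (1636 members per page, 32 pages per segment). Both function names are illustrative.

```c
#include <stdint.h>

/* Segment an unsigned division assigns to a members offset (8kB BLCKSZ assumed). */
uint32_t correct_member_segment(uint32_t offset)
{
    uint32_t page = offset / UINT32_C(1636);   /* members per page */
    return page / 32;                          /* pages per segment */
}

/* Two's-complement negation of a segment number, truncated to 16 bits. */
uint16_t negate16(uint16_t segment)
{
    return (uint16_t) (0x10000u - segment);
}
```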

> I don't really see how any uint32 value could produce such a pageno via
> that macro.  Even if called in an environment where (xid) is accidentally
> an int, the int / unsigned expression would convert it to unsigned first
> (unless (xid) is a bigger type like int64_t: by the rules of int promotion
> you'd get signed division in that case, hmm...).  But it's always called
> with a MultiXactOffset AKA uint32 variable.

I managed to disassemble RecordNewMultiXact from the core dump using a
cross-binutils, and it reveals that the compiler[1] appears to have
indeed generated a signed division here.  I'm attaching a piece of C
code that does the same computation as the assembly (I think), as well
as the disassembly itself.

regards,
Andreas

Footnotes: 
[1]  Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18, 64-bit

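#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * C rendering of the page-number computation in the RecordNewMultiXact
 * disassembly below.  Variables are named after the SPARC registers.
 */
uint32_t offset2page(uint32_t offset)
{
  uint64_t l0, i3, i5, i4, l6_1, l6_2, o2, o4, o1, o3;
  l0 = offset;
  i3 = 0x5fc3e800ULL;          /* sethi %hi(0x5fc3e800), %i3 */
  i5 = i3 ^ -375;              /* xor: 0xFFFFFFFFA03C1689; low word is ceil(2^42 / 1636) */
  i4 = (int32_t)l0;            /* sra %l0, 0: the offset is sign-extended here */

  l6_1 = i4 * i5;              /* mulx */
  o2 = l6_1 >> 32;             /* srlx: high half of the product */
  o4 = l0 + o2;
  o1 = ((int32_t)o4) >> 10;    /* sra: arithmetic shift by 10 */
  o3 = ((int32_t)l0) >> 31;    /* -1 if bit 31 of the offset is set, else 0 */
  l6_2 = o1 - o3;              /* i.e. (int32_t)offset / 1636, rounded toward zero */
  return l6_2;
}

int main(int argc, char *argv[])
{
  uint32_t page;

  if (argc < 2)
  {
    fprintf(stderr, "usage: %s offset\n", argv[0]);
    return 1;
  }
  page = offset2page((uint32_t) strtoul(argv[1], NULL, 10));
  printf("page: %d\n", (int32_t)page);
  printf("segment: %04X\n", (unsigned int)((int32_t)page/32));
  return 0;
}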

RecordNewMultiXact:
   100112be8:   07 00 04 01 sethi  %hi(0x100400), %g3
   100112bec:   9d e3 bf 30 save  %sp, -208, %sp
   100112bf0:   82 10 e2 bd or  %g3, 0x2bd, %g1
   100112bf4:   90 10 20 0e mov  0xe, %o0
   100112bf8:   a5 36 20 00 srl  %i0, 0, %l2
   100112bfc:   9f 28 70 0c sllx  %g1, 0xc, %o7
   100112c00:   a2 0c a7 ff and  %l2, 0x7ff, %l1
   100112c04:   b9 3e a0 00 sra  %i2, 0, %i4
   100112c08:   b4 03 ed d8 add  %o7, 0xdd8, %i2
   100112c0c:   b1 36 60 00 srl  %i1, 0, %i0
   100112c10:   f0 23 a8 af st  %i0, [ %sp + 0x8af ]
   100112c14:   b2 10 00 1b mov  %i3, %i1
   100112c18:   40 08 1d 66 call  0x10031a1b0 ; LWLockAcquire
   100112c1c:   92 10 20 00 clr  %o1
   100112c20:   93 34 a0 0b srl  %l2, 0xb, %o1
   100112c24:   90 10 00 1a mov  %i2, %o0
   100112c28:   97 34 a0 00 srl  %l2, 0, %o3
   100112c2c:   7f ff f6 bd call  0x100110720 ; SimpleLruReadPage
   100112c30:   94 10 20 01 mov  1, %o2
   100112c34:   fa 5e a0 00 ldx  [ %i2 ], %i5
   100112c38:   ab 3a 20 00 sra  %o0, 0, %l5
   100112c3c:   96 10 20 01 mov  1, %o3
   100112c40:   a9 2d 70 03 sllx  %l5, 3, %l4
   100112c44:   e0 5f 60 08 ldx  [ %i5 + 8 ], %l0
   100112c48:   a7 3c 60 00 sra  %l1, 0, %l3
   100112c4c:   ad 2c f0 02 sllx  %l3, 2, %l6
   100112c50:   da 5c 00 14 ldx  [ %l0 + %l4 ], %o5
   100112c54:   f0 23 40 16 st  %i0, [ %o5 + %l6 ]
   100112c58:   d8 5e a0 00 ldx  [ %i2 ], %o4
   100112c5c:   d4 5b 20 18 ldx  [ %o4 + 0x18 ], %o2
   100112c60:   d6 2a 80 15 stb  %o3, [ %o2 + %l5 ]
   100112c64:   40 08 1f 2d call  0x10031a918 ; LWLockRelease
   100112c68:   90 10 20 0e mov  0xe, %o0
   100112c6c:   90 10 20 0f mov  0xf, %o0
   100112c70:   40 08 1d 50 call  0x10031a1b0 ; LWLockAcquire
   100112c74:   92 10 20 00 clr  %o1
   100112c78:   80 a7 20 00 cmp  %i4, 0
   100112c7c:   04 40 00 7c ble,pn   %icc, 0x100112e6c
   100112c80:   90 07 3f ff add  %i4, -1, %o0
   100112c84:   e0 03 a8 af ld  [ %sp + 0x8af ], %l0
   100112c88:   37 17 f0 fa sethi  %hi(0x5fc3e800), %i3
   100112c8c:   86 10 3f ff mov  -1, %g3
   100112c90:   ba 1e fe 89 xor  %i3, -375, %i5
   100112c94:   d0 23 a8 b7 st  %o0, [ %sp + 0x8b7 ]
   100112c98:   a2 10 26 64 mov  0x664, %l1
   100112c9c:   c0 23 a8 b3 clr  [ %sp + 0x8b3 ]
   100112ca0:   37 10 1e 0b sethi  %hi(0x40782c00), %i3
   100112ca4:   b9 3c 20 00 sra  %l0, 0, %i4
   100112ca8:   ac 4f 00 1d mulx  %i4, %i5, %l6
   100112cac:   95 35 b0 20 srlx  %l6, 0x20, %o2
   100112cb0:   ba 10 26 63 mov  0x663, %i5
   100112cb4:   98 04 00 0a add  %l0, %o2, %o4
   100112cb8:   b8 10 20 01 mov  1, %i4
   100112cbc:   93 3b 20 0a sra  %o4, 0xa, %o1
   100112cc0:   97 3c 20 1f sra  %l0, 0x1f, %o3
   100112cc4:   ac 22 40 0b sub  %o1, %o3, %l6
   100112cc8:   89 2d a0 04 sll  %l6, 4, %g4
   100112ccc:   82 01 00 16 add  %g4, %l6, %g1
   100112cd0:   b1 28 60 02 sll  %g1, 2, %i0
   100112cd4:   84 26 00 01 sub  %i0, %g1, %g2
   100112cd8:   f0 03 a8 af ld  [ %sp + 0x8af ], %i0
   100112cdc:   9f 28 a0 03 sll  %g2, 3, %o7
   100112ce0:   90 05 80 0f add  %l6,

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-11 Thread Bernd Helmle

...and here it is ;)

--On 10. September 2015 19:45:46 -0300 Alvaro Herrera
 wrote:

> Anyway, can you please request pg_controldata to be run on the failed
> cluster and paste it here?

pg_control version number:            937
Catalog version number:               201306121
Database system identifier:           5995776571405068134
Database cluster state:               in archive recovery
pg_control last modified:             Di 08 Sep 2015 14:58:36 CEST
Latest checkpoint location:           1A52/3CFAF758
Prior checkpoint location:            1A52/3CFAF758
Latest checkpoint's REDO location:    1A52/2313FEF8
Latest checkpoint's REDO WAL file:    00011A520023
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/2896610102
Latest checkpoint's NextOID:          261892
Latest checkpoint's NextMultiXactId:  1068223816
Latest checkpoint's NextMultiOffset:  2147460090
Latest checkpoint's oldestXID:        2693040605
Latest checkpoint's oldestXID's DB:   16400
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1012219584
Latest checkpoint's oldestMulti's DB: 16400
Time of latest checkpoint:            Di 08 Sep 2015 00:47:01 CEST
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     1A52/2313FEF8
Min recovery ending loc's timeline:   1
Backup start location:                1A52/2313FEF8
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            archive
Current max_connections setting:      500
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

-- 

Bernd




Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-11 Thread Christoph Berg

Re: Bernd Helmle 2015-09-10 <7E3C7F8D210AC9A423E96F3A@eje.local>
> 2015-09-08 11:40:59 CEST [27047] DETAIL:  Could not seek in file
> "pg_multixact/members/5FC4" to offset 4294950912: Invalid argument.
> 2015-09-08 11:40:59 CEST [27047] CONTEXT:  xlog redo create mxid 1068235595
> offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) 
> 2015-09-08 11:40:59 CEST [27045] LOG:  startup process (PID 27047) exited
> with exit code 1
> 2015-09-08 11:40:59 CEST [27045] LOG:  aborting startup due to startup
> process failure
> 
> Some side notes:
> 
> An additional recovery from a base backup and archive recovery yielded the
> same error, as soon as the affected tuple was touched with a DELETE. The
> affected table was fully dumpable via pg_dump, though.

A few more words here: the archive recovery was a pitr to 00:45, so
well before the problem, and the cluster was initially working well,
but crashed shortly after with the same mxid 1068235595 message. The
crash was triggered from a delete on a different table (which was
related schema-wise, but iirc neither of these tables has any FKs).

We then rewound the system to a zfs snapshot taken when the archive
recovery had finished (db shut down cleanly), and put it up again,
when it again crashed with mxid 1068235595, this time on a third
table.

The original crash and the first post-recovery crash happened a few
minutes after pg_start_backup(), though the next crash was without
that.


(While the archive recovery was running, I had run pg_resetxlog on the
original cluster. It was possible to isolate the ctid of an affected
tuple, but it wasn't possible to DELETE it; that yielded an error message
similar to the above, but the database kept running. I then zeroed
the bad block using dd (zero_damaged_pages didn't help), only to find
that at least one more tuple in that table was affected (with a
different mxid).)

Christoph



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-11 Thread Andres Freund

On 2015-09-11 14:25:39 +0200, Christoph Berg wrote:
> A few more words here: the archive recovery was a pitr to 00:45, so
> well before the problem, and the cluster was initially working well,
> but crashed shortly after with the same mxid 1068235595 message. The
> crash was triggered from a delete on a different table (which was
> related schema-wise, but iirc neither of these tables has any FKs).
>
> We then rewound the system to a zfs snapshot taken when the archive
> recovery had finished (db shut down cleanly), and put it up again,
> when it again crashed with mxid 1068235595, this time on a third
> table.
>
> The original crash and the first post-recovery crash happened a few
> minutes after pg_start_backup(), though the next crash was without
> that.

Do you still have access to that data? That'd make investigation of both
the issue and fixes/workaround far easier.

Greetings,

Andres Freund



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Alvaro Herrera

Bernd Helmle wrote:
> A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
> instance.
> 
> The database crashed with the following log messages:
> 
> 2015-09-08 00:49:16 CEST [2912] PANIC:  could not access status of
> transaction 1068235595
> 2015-09-08 00:49:16 CEST [2912] DETAIL:  Could not open file
> "pg_multixact/members/5FC4": No such file or directory.
> 2015-09-08 00:49:16 CEST [2912] STATEMENT:  delete from StockTransfer
> where oid = $1 and tanum = $2 

I wonder if these bogus page and offset numbers are just
SlruReportIOError being confused because pg_multixact/members is so
weird (I don't think it should be the case, since this stuff is using
page numbers only, not anything related to how each page is laid out).

Anyway, can you please request pg_controldata to be run on the failed
cluster and paste it here?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Thomas Munro

On Fri, Sep 11, 2015 at 10:45 AM, Alvaro Herrera 
wrote:

> Bernd Helmle wrote:
> > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
> > instance.
> >
> > The database crashed with the following log messages:
> >
> > 2015-09-08 00:49:16 CEST [2912] PANIC:  could not access status of
> > transaction 1068235595
> > 2015-09-08 00:49:16 CEST [2912] DETAIL:  Could not open file
> > "pg_multixact/members/5FC4": No such file or directory.
> > 2015-09-08 00:49:16 CEST [2912] STATEMENT:  delete from StockTransfer
> > where oid = $1 and tanum = $2
>
> I wonder if these bogus page and offset numbers are just
> SlruReportIOError being confused because pg_multixact/members is so
> weird (I don't think it should be the case, since this stuff is using
> page numbers only, not anything related to how each page is laid out).
>

But SlruReportIOError uses the same macro to build the filename as
SlruReadPhysicalPage and other functions, namely SlruFileName which uses
sprintf with %04X (unsigned integer uppercase hex) and gives it segno
(which is always an int), so I don't think the problem is in error
reporting only.

Assuming default block size, to get 5FC4 from SlruFileName you need
segno == -41020.

We have int segno = pageno / 32 (that's SLRU_PAGES_PER_SEGMENT), so to get
segno == -41020 you need pageno between -1312640 and -1312609 (whose bit
patterns  reinterpreted as unsigned are 4293654656 and 4293654687).

In various places we have int pageno = offset / (uint32) 1636, expanded
from this macro (which calls the offset an xid):

#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)

I don't really see how any uint32 value could produce such a pageno via
that macro.  Even if called in an environment where (xid) is accidentally
an int, the int / unsigned expression would convert it to unsigned first
(unless (xid) is a bigger type like int64_t: by the rules of int promotion
you'd get signed division in that case, hmm...).  But it's always called
with a MultiXactOffset AKA uint32 variable.
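The conversion rules relied on here can be checked directly. Dividing an `int` by a `uint32_t` converts the `int` to unsigned first, so the bit pattern of 2^31 still yields page 1312642. Signed division happens only when the dividend has a wider signed type such as `int64_t`, because then the `uint32_t` divisor is converted to that type. The sketch below assumes the usual 32-bit `int` and the 8kB-BLCKSZ value of 1636 members per page. The two wrapper functions are illustrative.

```c
#include <stdint.h>

#define MULTIXACT_MEMBERS_PER_PAGE 1636   /* 8kB BLCKSZ, as in the thread */
typedef uint32_t TransactionId;

/* The macro as quoted in the thread. */
#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)

/* int / uint32_t: the int operand is converted to unsigned int. */
int64_t page_from_int(int32_t xid)
{
    return MXOffsetToMemberPage(xid);
}

/* int64_t / uint32_t: the uint32_t operand is converted to int64_t. */
int64_t page_from_int64(int64_t xid)
{
    return MXOffsetToMemberPage(xid);
}
```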

So via that route, there is no MultiXactOffset value that can't be mapped
to a segment in the range "", "14078".  Famously, it wraps after that.
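The upper bound of that range can be reproduced from the same constants. The sketch below assumes the 8kB-BLCKSZ value of 1636 members per page and 32 pages per segment, and formats segment numbers with `%04X` as described for SlruFileName. Under those assumptions, the largest `MultiXactOffset` lands in segment 0x14078 (82040). Both helper names are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed 8kB-BLCKSZ constants, as quoted in the thread. */
#define MEMBERS_PER_PAGE  UINT32_C(1636)
#define PAGES_PER_SEGMENT UINT32_C(32)

/* Segment number for a members offset, using unsigned division throughout. */
uint32_t member_segment_of(uint32_t offset)
{
    return offset / MEMBERS_PER_PAGE / PAGES_PER_SEGMENT;
}

/* Segment file name in the %04X style the thread describes for SlruFileName. */
void member_segment_name(uint32_t offset, char *buf, size_t len)
{
    snprintf(buf, len, "%04X", (unsigned int) member_segment_of(offset));
}
```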

Maybe the negative pageno came from somewhere else.  Where?  Inside SLRU
code we can see pageno = shared->page_number[slotno]... maybe the SLRU
slots got corrupted somehow?

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Alvaro Herrera

Bernd Helmle wrote:

> 2015-09-08 11:40:59 CEST [27047] DETAIL:  Could not seek in file
> "pg_multixact/members/5FC4" to offset 4294950912: Invalid argument.
> 2015-09-08 11:40:59 CEST [27047] CONTEXT:  xlog redo create mxid 1068235595
> offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) 

I just noticed that this offset number 2147483648 is exactly 2^31.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Thomas Munro

On Fri, Sep 11, 2015 at 11:51 AM, Thomas Munro <
thomas.mu...@enterprisedb.com> wrote:

> On Fri, Sep 11, 2015 at 10:45 AM, Alvaro Herrera
> wrote:
>
>> Bernd Helmle wrote:
>> > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
>> > instance.
>> >
>> > The database crashed with the following log messages:
>> >
>> > 2015-09-08 00:49:16 CEST [2912] PANIC:  could not access status of
>> > transaction 1068235595
>> > 2015-09-08 00:49:16 CEST [2912] DETAIL:  Could not open file
>> > "pg_multixact/members/5FC4": No such file or directory.
>> > 2015-09-08 00:49:16 CEST [2912] STATEMENT:  delete from StockTransfer
>> > where oid = $1 and tanum = $2
>>
>> I wonder if these bogus page and offset numbers are just
>> SlruReportIOError being confused because pg_multixact/members is so
>> weird (I don't think it should be the case, since this stuff is using
>> page numbers only, not anything related to how each page is laid out).
>>
>
> But SlruReportIOError uses the same macro to build the filename as
> SlruReadPhysicalPage and other functions, namely SlruFileName which uses
> sprintf with %04X (unsigned integer uppercase hex) and gives it segno
> (which is always an int), so I don't think the problem is in error
> reporting only.
>
> Assuming default block size, to get 5FC4 from SlruFileName you need
> segno == -41020.
>

Oops, I meant to attach the proviso "Assuming default block size" to the
assumption further down that MULTIXACT_MEMBERS_PER_PAGE == 1636.


> We have int segno = pageno / 32 (that's SLRU_PAGES_PER_SEGMENT), so to get
> segno == -41020 you need pageno between -1312640 and -1312609 (whose bit
> patterns  reinterpreted as unsigned are 4293654656 and 4293654687).
>
> In various places we have int pageno = offset / (uint32) 1636, expanded
> from this macro (which calls the offset an xid):
>
> #define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
>
> I don't really see how any uint32 value could produce such a pageno via
> that macro.  Even if called in an environment where (xid) is accidentally
> an int, the int / unsigned expression would convert it to unsigned first
> (unless (xid) is a bigger type like int64_t: by the rules of int promotion
> you'd get signed division in that case, hmm...).  But it's always called
> with a MultiXactOffset AKA uint32 variable.
>
> So via that route, there is no MultiXactOffset value that can't be mapped
> to a segment in the range "", "14078".  Famously, it wraps after that.
>
> Maybe the negative pageno came from somewhere else.  Where?  Inside SLRU
> code we can see pageno = shared->page_number[slotno]... maybe the SLRU
> slots got corrupted somehow?
>

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Alvaro Herrera

Bernd Helmle wrote:
> A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
> instance.
> 

> 2015-09-08 11:40:59 CEST [27047] FATAL:  could not access status of
> transaction 1068235595
> 2015-09-08 11:40:59 CEST [27047] DETAIL:  Could not seek in file
> "pg_multixact/members/5FC4" to offset 4294950912: Invalid argument.
> 2015-09-08 11:40:59 CEST [27047] CONTEXT:  xlog redo create mxid 1068235595
> offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) 

I think the math to compute segment number and byte offset of the member
might be bogus here.  The file names in pg_multixact/members are supposed
to go up to 14078 (hex) in an 8kB-BLCKSZ build, and of course they go a bit
higher in builds with smaller page sizes, but nowhere near as high as
5FC4.  And the offset is way too close to 2^32 (exactly 16384 less
than 2^32, to be precise.)
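The seek offset in the error message is consistent with a negative page number. The following sketch assumes that the page's position within its segment is taken as `pageno % 32` and multiplied by BLCKSZ (8192). It then takes the signed page number -1312642, which is what signed division of 2^31 by 1636 produces. C's `%` takes the sign of the dividend, so the position is -2 and the byte offset is -16384. Printed as an unsigned 32-bit value, that is 4294950912, exactly 16384 less than 2^32. With the correct page number 1312642, the position is 2 and the byte offset is 16384. The function name is illustrative.

```c
#include <stdint.h>

#define SLRU_PAGES_PER_SEGMENT 32
#define BLCKSZ                 8192

/*
 * Byte offset of a page inside its segment file, computed with int
 * arithmetic (an assumption about the SLRU read path, see text).  A
 * negative page number gives a negative remainder, hence a negative offset.
 */
int32_t page_byte_offset(int32_t pageno)
{
    int32_t rpageno = pageno % SLRU_PAGES_PER_SEGMENT;
    return rpageno * BLCKSZ;
}
```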

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Bernd Helmle

A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
instance.

The database crashed with the following log messages:

2015-09-08 00:49:16 CEST [2912] PANIC:  could not access status of
transaction 1068235595
2015-09-08 00:49:16 CEST [2912] DETAIL:  Could not open file
"pg_multixact/members/5FC4": No such file or directory.
2015-09-08 00:49:16 CEST [2912] STATEMENT:  delete from StockTransfer
where oid = $1 and tanum = $2 

When they called us later, it turned out that the crash happened during a
base backup, leaving a backup_label behind, which prevented the database
from coming up again, reporting an invalid checkpoint location. However,
removing the backup_label still didn't get the database through recovery;
it failed again with the earlier error, this time during recovery:

2015-09-08 11:40:04 CEST [27047] LOG:  database system was interrupted
while in recovery at 2015-09-08 11:19:44 CEST
2015-09-08 11:40:04 CEST [27047] HINT:  This probably means that some data
is corrupted and you will have to use the last backup for recovery.
2015-09-08 11:40:04 CEST [27047] LOG:  database system was not properly
shut down; automatic recovery in progress
2015-09-08 11:40:05 CEST [27047] LOG:  redo starts at 1A52/2313FEF8
2015-09-08 11:40:47 CEST [27082] FATAL:  the database system is starting up
2015-09-08 11:40:59 CEST [27047] FATAL:  could not access status of
transaction 1068235595
2015-09-08 11:40:59 CEST [27047] DETAIL:  Could not seek in file
"pg_multixact/members/5FC4" to offset 4294950912: Invalid argument.
2015-09-08 11:40:59 CEST [27047] CONTEXT:  xlog redo create mxid 1068235595
offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) 
2015-09-08 11:40:59 CEST [27045] LOG:  startup process (PID 27047) exited
with exit code 1
2015-09-08 11:40:59 CEST [27045] LOG:  aborting startup due to startup
process failure

Some side notes:

An additional recovery from a base backup and archive recovery yielded the
same error, as soon as the affected tuple was touched with a DELETE. The
affected table was fully dumpable via pg_dump, though.

We also have a core dump, but no direct access to the machine. If more
information is required (and I believe it is), let me know where to dig
deeper. I would also like to request a backtrace from the existing core
dump, but in the absence of a sparc64 machine here we need to ask the
customer to get one.

-- 
Thanks

Bernd

