Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-10-04 Thread Noah Misch
On Mon, Sep 28, 2015 at 11:10:52AM -0400, Robert Haas wrote: > On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreich > wrote: > > OTOH, a unit test for multixact.c that exercises the code including > > wraparounds sounds like a desirable thing regardless of the

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-28 Thread Josh Berkus
On 09/28/2015 08:10 AM, Robert Haas wrote: > -1 on that idea. I really don't think that we should categorically > decide we don't support higher optimization levels. If the compiler > has a bug, then the compiler manufacturer should fix it, and it's not > our fault. If the compiler doesn't have

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-28 Thread Robert Haas
On Mon, Sep 28, 2015 at 2:34 PM, Josh Berkus wrote: > On 09/28/2015 08:10 AM, Robert Haas wrote: >> -1 on that idea. I really don't think that we should categorically >> decide we don't support higher optimization levels. If the compiler >> has a bug, then the compiler

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-28 Thread Robert Haas
On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreich wrote: > I think the intention was to make configure complain if there's a -O > 2 in CFLAGS. -1 on that idea. I really don't think that we should categorically decide we don't support higher optimization

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-25 Thread Bjorn Munch
On 25/09 09.37, Andreas Seltenreich wrote: > [ adding Bjorn Munch to Cc ] Oh. I am on the -hackers list but usually just scan for any subject mentioning Solaris and this one did not. :-) > Jim Nasby writes: > > On 9/20/15 9:23 AM, Christoph Berg wrote: > >> a short update here: the customer

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-25 Thread Andreas Seltenreich
[ adding Bjorn Munch to Cc ] Jim Nasby writes: > On 9/20/15 9:23 AM, Christoph Berg wrote: >> a short update here: the customer updated the compiler to a newer >> version, is now compiling using -O2 instead of -O3, and the code >> generated now looks sane, so this turned out to be a compiler

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-25 Thread Andreas Seltenreich
Alvaro Herrera writes: > Jim Nasby wrote: >> Do we officially not support anything > -O2? If so it'd be nice if configure >> threw at least a warning (if not an error that you had to explicitly >> over-ride). > > Keep in mind this is Sun OS C -- not one of the most popular compilers > in the

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-24 Thread Jim Nasby
On 9/20/15 9:23 AM, Christoph Berg wrote: a short update here: the customer updated the compiler to a newer version, is now compiling using -O2 instead of -O3, and the code generated now looks sane, so this turned out to be a compiler issue. (Though it's unclear if the upgrade fixed it, or the

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-24 Thread Alvaro Herrera
Jim Nasby wrote: > On 9/20/15 9:23 AM, Christoph Berg wrote: > >a short update here: the customer updated the compiler to a newer > >version, is now compiling using -O2 instead of -O3, and the code > >generated now looks sane, so this turned out to be a compiler issue. > >(Though it's unclear if

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-20 Thread Christoph Berg
Re: Andreas Seltenreich 2015-09-13 <87si6i1875@credativ.de> > I managed to disassemble RecordNewMultiXact from the core dump using a > cross-binutils, and it reveals that the compiler[1] appears to have > indeed generated a signed division here. I'm attaching a piece of C > code that does the

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-13 Thread Andreas Seltenreich
Thomas Munro writes: > In various places we have int pageno = offset / (uint32) 1636, expanded > from this macro (which calls the offset an xid): It appears to depend on the context it is expanded in, as some of the code must have gotten the segment number right: ,[ ls -sh

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-11 Thread Bernd Helmle
...and here it is ;) --On 10. September 2015 19:45:46 -0300 Alvaro Herrera wrote: > Anyway, can you please request pg_controldata to be run on the failed > cluster and paste it here? pg_control version number: 937 Catalog version number:

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-11 Thread Christoph Berg
Re: Bernd Helmle 2015-09-10 <7E3C7F8D210AC9A423E96F3A@eje.local> > 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file > "pg_multixact/members/5FC4" to offset 4294950912: Invalid argument. > 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595 > offset

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-11 Thread Andres Freund
On 2015-09-11 14:25:39 +0200, Christoph Berg wrote: > A few more words here: the archive recovery was a pitr to 00:45, so > well before the problem, and the cluster was initially working well, > but crashed shortly after with the same mxid 1068235595 message. The > crash was triggered from a

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Alvaro Herrera
Bernd Helmle wrote: > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 > instance. > > The database crashed with the following log messages: > > 2015-09-08 00:49:16 CEST [2912] PANIC: could not access status of > transaction 1068235595 > 2015-09-08 00:49:16 CEST [2912]

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Thomas Munro
On Fri, Sep 11, 2015 at 10:45 AM, Alvaro Herrera wrote: > Bernd Helmle wrote: > > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 > > instance. > > > > The database crashed with the following log messages: > > > > 2015-09-08 00:49:16 CEST [2912]

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Alvaro Herrera
Bernd Helmle wrote: > 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file > "pg_multixact/members/5FC4" to offset 4294950912: Invalid argument. > 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595 > offset 2147483648 nmembers 2: 2896635220 (upd)

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Thomas Munro
On Fri, Sep 11, 2015 at 11:51 AM, Thomas Munro < thomas.mu...@enterprisedb.com> wrote: > On Fri, Sep 11, 2015 at 10:45 AM, Alvaro Herrera > wrote: > >> Bernd Helmle wrote: >> > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 >> > instance. >> >

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Alvaro Herrera
Bernd Helmle wrote: > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 > instance. > > 2015-09-08 11:40:59 CEST [27047] FATAL: could not access status of > transaction 1068235595 > 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file >

[HACKERS] 9.3.9 and pg_multixact corruption

2015-09-10 Thread Bernd Helmle
A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 instance. The database crashed with the following log messages: 2015-09-08 00:49:16 CEST [2912] PANIC: could not access status of transaction 1068235595 2015-09-08 00:49:16 CEST [2912] DETAIL: Could not open file