Re: What to do when dynamic shared memory control segment is corrupt

2018-06-19 Thread Sherrylyn Branchaw
Yeah, I'd like to know that too. The complaint about corrupt shared memory may be just an unrelated red herring, or it might be a separate effect of whatever the primary failure was ... but I think it was likely not the direct cause of the failure-to-restart. Anyway, I would not be afraid to try

Re: What to do when dynamic shared memory control segment is corrupt

2018-06-19 Thread Alban Hertroys
> On 18 Jun 2018, at 17:34, Sherrylyn Branchaw wrote: > > In the other case, the logs recorded > > LOG: all server processes terminated; reinitializing > LOG: dynamic shared memory control segment is corrupt > LOG: incomplete data in "postmaster.pid": found only 1 newlines while trying > t

Re: What to do when dynamic shared memory control segment is corrupt

2018-06-18 Thread Tom Lane
Sherrylyn Branchaw writes: >> Hm ... were these installations built with --enable-cassert? If not, >> an abort trap seems pretty odd. > The packages are installed directly from the yum repos for RHEL. I'm not > aware that --enable-cassert is being used, and we're certainly not > installing from

Re: What to do when dynamic shared memory control segment is corrupt

2018-06-18 Thread Sherrylyn Branchaw
> Hm ... were these installations built with --enable-cassert? If not, > an abort trap seems pretty odd. The packages are installed directly from the yum repos for RHEL. I'm not aware that --enable-cassert is being used, and we're certainly not installing from source. > Those "incomplete data" m

Re: What to do when dynamic shared memory control segment is corrupt

2018-06-18 Thread Peter Geoghegan
On Mon, Jun 18, 2018 at 1:03 PM, Tom Lane wrote: > Hm, I supposed that Sherrylyn would've noticed any PANIC entries in > the log. The TRAP message from an assertion failure could've escaped > notice though, even assuming that her logging setup captured it. Unhandled C++ exceptions end up calling

Re: What to do when dynamic shared memory control segment is corrupt

2018-06-18 Thread Tom Lane
Andres Freund writes: > On 2018-06-18 12:30:13 -0400, Tom Lane wrote: >> Sherrylyn Branchaw writes: >>> LOG: server process (PID 138529) was terminated by signal 6: Aborted >> Hm ... were these installations built with --enable-cassert? If not, >> an abort trap seems pretty odd. > PANIC does

Re: What to do when dynamic shared memory control segment is corrupt

2018-06-18 Thread Andres Freund
On 2018-06-18 12:30:13 -0400, Tom Lane wrote: > Sherrylyn Branchaw writes: > > We are using Postgres 9.6.8 (planning to upgrade to 9.6.9 soon) on RHEL 6.9. > > We recently experienced two similar outages on two different prod > > databases. The error messages from the logs were as follows: > > LOG

Re: What to do when dynamic shared memory control segment is corrupt

2018-06-18 Thread Tom Lane
Sherrylyn Branchaw writes: > We are using Postgres 9.6.8 (planning to upgrade to 9.6.9 soon) on RHEL 6.9. > We recently experienced two similar outages on two different prod > databases. The error messages from the logs were as follows: > LOG: server process (PID 138529) was terminated by signal