[GENERAL] standby database crash

2017-08-03 Thread Murtuza Zabuawala
++ Forwarding to pgsql-general group.

-- Forwarded message --
From: Seong Son (US) 
Date: Thu, Aug 3, 2017 at 12:48 AM
Subject: standby database crash
To: "pgadmin-supp...@lists.postgresql.org" <pgadmin-supp...@lists.postgresql.org>


Hello,

I’ve posted this to the legacy list but learned that there’s a new list, so
here it is.

I have a client who has streaming replication set up with the primary in one
city and the standby in another city, both identical servers running
PostgreSQL 9.6 on Windows Server 2012.

They have some network issues, which cause the connection from the primary
to the standby to drop sometimes.  Recently the standby crashed with the
following log, and it could not be restarted.

2017-07-18 09:21:13 UTC FATAL:  invalid memory alloc request size 4148830208
2017-07-18 09:21:14 UTC LOG:  startup process (PID 5608) exited with exit code 1
2017-07-18 09:21:14 UTC LOG:  terminating any other active server processes
2017-07-18 09:21:14 UTC LOG:  database system is shut down

The last entry from pg_xlogdump shows the following:

pg_xlogdump: FATAL:  error in WAL record at D5/D1BD5FD0: unexpected pageaddr
D1/E7BD6000 in log segment 00D500D1, offset 12410880
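
The numbers in that message can be cross-checked with a little arithmetic.
The Python sketch below assumes the default 8 kB WAL page size and 16 MB
segment size (neither is stated in the post), and the helper names are made
up for illustration.  It computes which page the reader needed after the
record at D5/D1BD5FD0 and where that page sits inside its segment.

```python
XLOG_BLCKSZ = 8192              # assumption: default WAL page size
XLOG_SEG_SIZE = 16 * 1024 ** 2  # assumption: default 16 MB WAL segments


def parse_lsn(text):
    """Turn an LSN such as 'D5/D1BD5FD0' into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(lsn):
    """Inverse of parse_lsn: 64-bit position back to 'hi/lo' hex form."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


record = parse_lsn("D5/D1BD5FD0")
# The record starts 48 bytes before a page boundary, so anything longer
# than that continues on the following page.
next_page = (record // XLOG_BLCKSZ + 1) * XLOG_BLCKSZ
offset_in_segment = next_page % XLOG_SEG_SIZE

# Page address that pg_xlogdump actually found in the page header.
found = parse_lsn("D1/E7BD6000")

print(format_lsn(next_page))   # page address the reader expected
print(offset_in_segment)       # byte offset of that page in its segment
print(found % XLOG_SEG_SIZE == offset_in_segment)
```

Under those assumptions the expected page is D5/D1BD6000 at offset 12410880,
which matches the offset in the error.  The address that was found has the
same offset within its segment but belongs to a much older position
(D1/E7...), which would be consistent with the page still holding contents
from an earlier use of the file.  That is one possible reading, not a
diagnosis.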



So my questions are: could an old WAL segment being resent over the network
cause a crash like this?  Shouldn’t PostgreSQL be able to handle out-of-order
WAL segments instead of just crashing?

And what would be the best way to recover the standby server?  Resyncing
the entire database seems too time-consuming.

Thanks in advance for any info.

-Seong


Re: [GENERAL] standby database crash

2017-08-01 Thread Michael Paquier
On Mon, Jul 31, 2017 at 11:15 PM, Seong Son (US) wrote:
> So my questions are, could an old WAL segment being resent through the
> network cause crash like this?  Shouldn’t Postgresql be able to handle out
> of order WAL segments instead of just crashing?

When the streaming connection between a standby and a primary is cut,
the WAL receiver would restart and try to stream from the beginning of
the last segment it was in the middle of. See RequestXLogStreaming in
walreceiverfuncs.c.
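
To illustrate the rounding described above, here is a minimal Python sketch.
It is not the server's C code: the function name streaming_restart_point is
hypothetical, the 16 MB segment size is an assumed default, and the LSN is
borrowed from the pg_xlogdump error in the original post purely as sample
input.

```python
XLOG_SEG_SIZE = 16 * 1024 ** 2  # assumption: default 16 MB WAL segments


def parse_lsn(text):
    """Turn an LSN such as 'D5/D1BD5FD0' into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(lsn):
    """64-bit position back to the usual 'hi/lo' hex notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def streaming_restart_point(lsn):
    """Round an LSN down to the start of the segment that contains it."""
    return lsn - (lsn % XLOG_SEG_SIZE)


last_position = parse_lsn("D5/D1BD5FD0")
print(format_lsn(streaming_restart_point(last_position)))
```

With these inputs the restart point is D5/D1000000, so after a dropped
connection the whole segment is requested again rather than only its
missing tail.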

> And what would be the best way to recover the standby server?  Resynching
> the entire database seems to be too time consuming.

You may want to check the validity of the WAL segment in question as well.
Corrupted data could come from it.
-- 
Michael
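
One rough way to check a segment file, sketched in Python under stated
assumptions: 8 kB WAL pages, 16 MB segments, a little-endian build, and a
page header that begins with xlp_magic (uint16), xlp_info (uint16), xlp_tli
(uint32) and xlp_pageaddr (uint64).  The function name first_bad_page is
hypothetical.  It only compares each page's recorded address with its
position in the file; it does not verify record checksums.

```python
import struct

XLOG_BLCKSZ = 8192              # assumption: default WAL page size
XLOG_SEG_SIZE = 16 * 1024 ** 2  # assumption: default 16 MB WAL segments
# Assumed leading header fields: magic, info, timeline, page address.
PAGE_HEADER = struct.Struct("<HHIQ")


def first_bad_page(segment_bytes, segment_start_lsn):
    """Return (offset, found_pageaddr) for the first page whose header
    address does not match its position, or None if every page matches.
    found_pageaddr is None when the data ends inside a page header."""
    for offset in range(0, len(segment_bytes), XLOG_BLCKSZ):
        header = segment_bytes[offset:offset + PAGE_HEADER.size]
        if len(header) < PAGE_HEADER.size:
            return offset, None
        _magic, _info, _tli, pageaddr = PAGE_HEADER.unpack(header)
        if pageaddr != segment_start_lsn + offset:
            return offset, pageaddr
    return None


def check_segment_file(path, segment_start_lsn):
    """Read a segment file from disk and run first_bad_page on it."""
    with open(path, "rb") as f:
        return first_bad_page(f.read(), segment_start_lsn)
```

A mismatch does not by itself prove corruption: in the segment a standby
was still receiving, pages past the end of the valid WAL may legitimately
hold zeros or older contents.  The result is mainly useful for comparing the
standby's copy of a segment against the primary's.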




[GENERAL] standby database crash

2017-07-31 Thread Seong Son (US)
I have a client who has streaming replication set up with the primary in one
city and the standby in another city, both identical servers running
PostgreSQL 9.6 on Windows Server 2012.

They have some network issues, which cause the connection from the primary
to the standby to drop sometimes.  Recently the standby crashed with the
following log, and it could not be restarted.

2017-07-18 09:21:13 UTC FATAL:  invalid memory alloc request size 4148830208
2017-07-18 09:21:14 UTC LOG:  startup process (PID 5608) exited with exit code 1
2017-07-18 09:21:14 UTC LOG:  terminating any other active server processes
2017-07-18 09:21:14 UTC LOG:  database system is shut down

The last entry from pg_xlogdump shows the following:

pg_xlogdump: FATAL:  error in WAL record at D5/D1BD5FD0: unexpected pageaddr
D1/E7BD6000 in log segment 00D500D1, offset 12410880

So my questions are: could an old WAL segment being resent over the network
cause a crash like this?  Shouldn't PostgreSQL be able to handle out-of-order
WAL segments instead of just crashing?

And what would be the best way to recover the standby server?  Resyncing the
entire database seems too time-consuming.

Thanks in advance for any info.

-Seong