Hi Greig Wise, If you are using CentOS/REDHAT - 07 , check this link " https://srinivasoguri.blogspot.in/2018/04/postgresql-crash-in-centosredhat-07.html "
On Fri, Apr 20, 2018 at 6:58 PM, Hannes Erven <han...@erven.at> wrote: > Hi Greig, > > > just last week I experienced the same situation as you on a 10.3 physical > replica (it even has checksums activated), and a few months ago on 9.6 . > We used the same resolution as you we, and so far we haven't noticed any > problems with data integrity on the replicas. > > > > The logs were as follows: > 2018-04-13 06:31:16.947 CEST [15603] FATAL: WAL-Receiver-Prozess wird > abgebrochen wegen Zeitüberschreitung > 2018-04-13 06:31:16.948 CEST [15213] FATAL: invalid memory alloc request > size 4280303616 > 2018-04-13 06:31:16.959 CEST [15212] LOG: Startprozess (PID 15213) > beendete mit Status 1 > 2018-04-13 06:31:16.959 CEST [15212] LOG: aktive Serverprozesse werden > abgebrochen > 2018-04-13 06:31:16.959 CEST [19838] user@db WARNUNG: Verbindung wird > abgebrochen wegen Absturz eines anderen Serverprozesses > 2018-04-13 06:31:16.959 CEST [19838] user@db DETAIL: Der Postmaster hat > diesen Serverprozess angewiesen, die aktuelle Transaktion zurückzurollen > und die Sitzung zu beenden, weil ein anderer Serverprozess abnormal beendet > wurde und möglicherweise das Shared Memory verfälscht hat. > 2018-04-13 06:31:16.959 CEST [19838] user@db TIPP: In einem Moment > sollten Sie wieder mit der Datenbank verbinden und Ihren Befehl wiederholen > können. > > > This replica then refused to start up: > 2018-04-13 09:25:15.941 CEST [1957] LOG: Standby-Modus eingeschaltet > 2018-04-13 09:25:15.947 CEST [1957] LOG: Redo beginnt bei 1C/69C0FF30 > 2018-04-13 09:25:15.951 CEST [1957] LOG: konsistenter > Wiederherstellungszustand erreicht bei 1C/69D9A9C0 > 2018-04-13 09:25:15.952 CEST [1956] LOG: Datenbanksystem ist bereit, um > lesende Verbindungen anzunehmen > 2018-04-13 09:25:15.953 CEST [1957] FATAL: invalid memory alloc request > size 4280303616 > 2018-04-13 09:25:15.954 CEST [1956] LOG: Startprozess (PID 1957) beendete > mit Status 1 > > > ... until the WAL files from the hot standby's pg_wal were manually > removed and re-downloaded from the primary. > > Unfortunately I did not collect hard evidence, but I think I saw the > primary's replication slot's restart point was set to a position /after/ > the standby's actual restart location. This time, the error was noticed > immediately and the required WAL was still present on the master. > > > A few months ago I experienced the same situation on a 9.6 cluster, but > that was not noticed for a long time, and - despite using a replication > slot! - the primary had already removed required segments. Fortunately I > could get them from a tape backup... > > > > Best regards, > > -hannes > > > > > Am 2018-04-18 um 18:16 schrieb greigwise: > >> Hello. I've had several instances where postgres on my physical replica >> under version 9.6.6 is crashing with messages like the following in the >> logs: >> >> 2018-04-18 05:43:26 UTC dbname 5acf5e4a.6918 dbuser DETAIL: The >> postmaster >> has commanded this server process to roll back the current transaction and >> exit, because another server process exited abnormally and possibly >> corrupted shared memory. >> 2018-04-18 05:43:26 UTC dbname 5acf5e4a.6918 dbuser HINT: In a moment you >> should be able to reconnect to the database and repeat your command. >> 2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser WARNING: terminating >> connection because of crash of another server process >> 2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser DETAIL: The >> postmaster >> has commanded this server process to roll back the current transaction and >> exit, because another server process exited abnormally and possibly >> corrupted shared memory. >> 2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser HINT: In a moment you >> should be able to reconnect to the database and repeat your command. >> 2018-04-18 05:43:27 UTC 5acf5e12.6819 LOG: database system is shut down >> >> When this happens, what I've found is that I can go into the pg_xlog >> directory on the replica, remove all the log files and the postgres will >> restart and things seem to come back up normally. >> >> So, the question is what's going on here... is the log maybe getting >> corrupt >> in transmission somehow? Should I be concerned about the viability of my >> replica after having restarted in the described fashion? >> >> Thanks, >> Greig Wise >> >> >> >> -- >> Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f184378 >> 0.html >> >> > >