Re: postgres on physical replica crashes

SRINIVASARAO OGURI Mon, 07 May 2018 03:32:45 -0700

Hi Greig Wise,

If you are using CentOS/REDHAT - 07 , check this link "
https://srinivasoguri.blogspot.in/2018/04/postgresql-crash-in-centosredhat-07.html
"


On Fri, Apr 20, 2018 at 6:58 PM, Hannes Erven <[email protected]> wrote:

> Hi Greig,
>
>
> just last week I experienced the same situation as you on a 10.3 physical
> replica (it even has checksums activated), and a few months ago on 9.6 .
> We used the same resolution as you we, and so far we haven't noticed any
> problems with data integrity on the replicas.
>
>
>
> The logs were as follows:
> 2018-04-13 06:31:16.947 CEST [15603] FATAL:  WAL-Receiver-Prozess wird
> abgebrochen wegen Zeitüberschreitung
> 2018-04-13 06:31:16.948 CEST [15213] FATAL:  invalid memory alloc request
> size 4280303616
> 2018-04-13 06:31:16.959 CEST [15212] LOG:  Startprozess (PID 15213)
> beendete mit Status 1
> 2018-04-13 06:31:16.959 CEST [15212] LOG:  aktive Serverprozesse werden
> abgebrochen
> 2018-04-13 06:31:16.959 CEST [19838] user@db WARNUNG:  Verbindung wird
> abgebrochen wegen Absturz eines anderen Serverprozesses
> 2018-04-13 06:31:16.959 CEST [19838] user@db DETAIL:  Der Postmaster hat
> diesen Serverprozess angewiesen, die aktuelle Transaktion zurückzurollen
> und die Sitzung zu beenden, weil ein anderer Serverprozess abnormal beendet
> wurde und möglicherweise das Shared Memory verfälscht hat.
> 2018-04-13 06:31:16.959 CEST [19838] user@db TIPP:  In einem Moment
> sollten Sie wieder mit der Datenbank verbinden und Ihren Befehl wiederholen
> können.
>
>
> This replica then refused to start up:
> 2018-04-13 09:25:15.941 CEST [1957] LOG:  Standby-Modus eingeschaltet
> 2018-04-13 09:25:15.947 CEST [1957] LOG:  Redo beginnt bei 1C/69C0FF30
> 2018-04-13 09:25:15.951 CEST [1957] LOG:  konsistenter
> Wiederherstellungszustand erreicht bei 1C/69D9A9C0
> 2018-04-13 09:25:15.952 CEST [1956] LOG:  Datenbanksystem ist bereit, um
> lesende Verbindungen anzunehmen
> 2018-04-13 09:25:15.953 CEST [1957] FATAL:  invalid memory alloc request
> size 4280303616
> 2018-04-13 09:25:15.954 CEST [1956] LOG:  Startprozess (PID 1957) beendete
> mit Status 1
>
>
> ... until the WAL files from the hot standby's pg_wal were manually
> removed and re-downloaded from the primary.
>
> Unfortunately I did not collect hard evidence, but I think I saw the
> primary's replication slot's restart point was set to a position /after/
> the standby's actual restart location. This time, the error was noticed
> immediately and the required WAL was still present on the master.
>
>
> A few months ago I experienced the same situation on a 9.6 cluster, but
> that was not noticed for a long time, and - despite using a replication
> slot! - the primary had already removed required segments. Fortunately I
> could get them from a tape backup...
>
>
>
> Best regards,
>
>         -hannes
>
>
>
>
> Am 2018-04-18 um 18:16 schrieb greigwise:
>
>> Hello.  I've had several instances where postgres on my physical replica
>> under version 9.6.6 is crashing with messages like the following in the
>> logs:
>>
>> 2018-04-18 05:43:26 UTC dbname 5acf5e4a.6918 dbuser DETAIL:  The
>> postmaster
>> has commanded this server process to roll back the current transaction and
>> exit, because another server process exited abnormally and possibly
>> corrupted shared memory.
>> 2018-04-18 05:43:26 UTC dbname 5acf5e4a.6918 dbuser HINT:  In a moment you
>> should be able to reconnect to the database and repeat your command.
>> 2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser WARNING:  terminating
>> connection because of crash of another server process
>> 2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser DETAIL:  The
>> postmaster
>> has commanded this server process to roll back the current transaction and
>> exit, because another server process exited abnormally and possibly
>> corrupted shared memory.
>> 2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser HINT:  In a moment you
>> should be able to reconnect to the database and repeat your command.
>> 2018-04-18 05:43:27 UTC  5acf5e12.6819  LOG:  database system is shut down
>>
>> When this happens, what I've found is that I can go into the pg_xlog
>> directory on the replica, remove all the log files and the postgres will
>> restart and things seem to come back up normally.
>>
>> So, the question is what's going on here... is the log maybe getting
>> corrupt
>> in transmission somehow?  Should I be concerned about the viability of my
>> replica after having restarted in the described fashion?
>>
>> Thanks,
>> Greig Wise
>>
>>
>>
>> --
>> Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f184378
>> 0.html
>>
>>
>
>

Re: postgres on physical replica crashes

Reply via email to