Thank you, Adrian, for the reply. I checked the postgres processes that were running around the time the OOM killer was invoked: there were many postgres processes with high CPU usage, all executing long-running SELECTs. I am not sure how to interpret the memory terms that appear in Linux dmesg or /var/log/messages, but I can see that the system ran out of memory and that postmaster invoked the OOM killer.
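
To make the memory terms in the dmesg "Killed process" line easier to read, here is a minimal sketch that extracts the four fields and converts them from kB to GiB. The sample line is the one quoted in this thread, with the process id left redacted as "nnnnn"; the function names are my own, not part of any tool. As I understand the fields: total-vm is the size of the process's virtual address space, anon-rss is resident private memory (heap and stack), file-rss is resident file-backed memory, and shmem-rss is resident shared memory, which for PostgreSQL would mostly be the shared buffers pages the process has touched.

```python
import re

# "Killed process" line as quoted in this thread (pid redacted as "nnnnn").
SAMPLE = (
    "[4331093.890225] Killed process nnnnn (postmaster) total-vm:18905944kB, "
    "anon-rss:1747460kB, file-rss:4kB, shmem-rss:838220kB"
)

# Matches the name:valuekB pairs the kernel prints for the killed process.
FIELD_RE = re.compile(r"(total-vm|anon-rss|file-rss|shmem-rss):(\d+)kB")


def parse_oom_kill_line(line):
    """Return {field_name: size_in_kB} for the memory fields found in the line."""
    return {name: int(kb) for name, kb in FIELD_RE.findall(line)}


def kb_to_gib(kb):
    """Convert kibibytes (as reported by the kernel) to GiB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    for name, kb in parse_oom_kill_line(SAMPLE).items():
        print(f"{name:10s} {kb:>10d} kB  ~ {kb_to_gib(kb):.2f} GiB")
```

Read this way, the killed process had roughly 18 GiB of address space mapped but well under 3 GiB actually resident, so on its own it probably does not explain the shortage; the combined anon-rss of all backends running at that time is likely the more useful number to look at.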

Regards
Vikas Sharma

On Tue, 12 Feb 2019 at 16:39, Adrian Klaver <adrian.kla...@aklaver.com> wrote:

> On 2/12/19 8:20 AM, Vikas Sharma wrote:
> > Hello All,
> >
> > I have a 4 node PostgreSQL 9.6 cluster with streaming replication. we
> > encounter today the Out of Memory Error on the Master which resulted in
> > All postres processes restarted and cluster recovered itself. Please
> > let me know the best way to diagnose this issue.
>
> For a start look back further in the Postgres log then the below. What
> is shown below is the effects of the OOM killer. What you need to look
> for is the statement that caused Postgres memory to increase to the
> point that the OOM killer was invoked.
>
> >
> > The error seen in the postgresql log:
> >
> > 2019-02-12 10:55:17 GMT LOG: terminating any other active server processes
> > 2019-02-12 10:55:17 GMT WARNING: terminating connection because of
> > crash of another server process
> > 2019-02-12 10:55:17 GMT DETAIL: The postmaster has commanded this
> > server process to roll back the current transaction and exit, because
> > another server process exited abnormally and possibly corrupted shared
> > memory.
> > 2019-02-12 10:55:17 GMT HINT: In a moment you should be able to
> > reconnect to the database and repeat your command.
> > 2019-02-12 10:55:17 GMT WARNING: terminating connection because of
> > crash of another server process
> > 2019-02-12 10:55:17 GMT DETAIL: The postmaster has commanded this
> > server process to roll back the current transaction and exit, because
> > another server process exited abnormally and possibly corrupted shared
> > memory.
> > 2019-02-12 10:55:17 GMT HINT: In a moment you should be able to
> > reconnect to the database and repeat your command.
> > 2019-02-12 10:55:17 GMT WARNING: terminating connection because of
> > crash of another server process
> > -----
> >
> > Error from dmesg on linux:
> > -----------------------------------
> > [4331093.885622] Out of memory: Kill process nnnnn (postmaster) score nn
> > or sacrifice child
> > [4331093.890225] Killed process nnnnn (postmaster) total-vm:18905944kB,
> > anon-rss:1747460kB, file-rss:4kB, shmem-rss:838220kB
> >
> > Thanks & Best Regards
> > Vikas Sharma
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>