Re: Properly handle OOM death?

Jeffrey Walton Mon, 13 Mar 2023 12:37:11 -0700

On Mon, Mar 13, 2023 at 1:21 PM Israel Brewster <ijbrews...@alaska.edu> wrote:
>
> I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more 
> memory constrained than I would like, such that every week or so the various 
> processes running on the machine will align badly and the OOM killer will 
> kick in, killing off postgresql, as per the following journalctl output:
>
> Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process 
> of this unit has been killed by the OOM killer.
> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with 
> result 'oom-kill'.
> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 
> 17h 48min 24.509s CPU time.
>
> And the service is no longer running.
>
> When this happens, I go in and restart the postgresql service, and everything 
> is happy again for the next week or two.
>
> Obviously this is not a good situation. Which leads to two questions:
>
> 1) is there some tweaking I can do in the postgresql config itself to prevent 
> the situation from occurring in the first place?
> 2) My first thought was to simply have systemd restart postgresql whenever it 
> is killed like this, which is easy enough. Then I looked at the default unit 
> file, and found these lines:
>
> # prevent OOM killer from choosing the postmaster (individual backends will
> # reset the score to 0)
> OOMScoreAdjust=-900
> # restarting automatically will prevent "pg_ctlcluster ... stop" from working,
> # so we disable it here. Also, the postmaster will restart by itself on most
> # problems anyway, so it is questionable if one wants to enable external
> # automatic restarts.
> #Restart=on-failure
>
> Which seems to imply that the OOM killer should only be killing off 
> individual backends, not the entire cluster to begin with - which should be 
> fine. And also that adding the restart=on-failure option is probably not the 
> greatest idea. Which makes me wonder what is really going on?
>


On a related note, we (a FOSS project) used to run a LAMP stack on a
Linux server at GoDaddy. The machine hosted a website and wiki. It was
very low-end. I think it had 512 MB or 1 GB of RAM and no swap file,
and there was no way to enable one (that was part of an upsell). We
paid about $2 a month for it.

The OOM killer took out MySQL several times a week, and the kills
corrupted the database on a regular basis. We had to run the database
repair tools daily. We eventually switched to Ionos for hosting. We got
a VM with more memory and a swap file for about $5 a month. No more OOM
kills.

If possible, you might want to add more memory (or a swap file) to the
machine. It will help sidestep the OOM problem.

You can also set vm.overcommit_memory = 2 (for example in
/etc/sysctl.conf) to stop Linux from overcommitting memory. The machine
will then act like a Solaris box rather than a Linux box: allocations
fail up front instead of the OOM killer firing later, which takes some
getting used to. Also see
https://serverfault.com/questions/606185/how-does-vm-overcommit-memory-work
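
Before flipping that setting, it may help to see how close the box
already is to the limit it would enforce. Here is a rough sketch, not
something from the original setup: it reads the CommitLimit and
Committed_AS fields from /proc/meminfo and reports the difference. The
function names are my own, and I am assuming a Linux kernel that
exposes both fields. Note that CommitLimit is always reported, but it
is only enforced when vm.overcommit_memory is 2.

```python
#!/usr/bin/env python3
"""Report how much commit headroom is left, per /proc/meminfo (Linux only)."""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are in kB; the unit suffix, if any, is dropped.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(info):
    """Return CommitLimit minus Committed_AS, in kB.

    A small or negative result suggests allocations would start failing
    if vm.overcommit_memory were set to 2.
    """
    return info["CommitLimit"] - info["Committed_AS"]


def main(path="/proc/meminfo"):
    try:
        with open(path) as f:
            info = parse_meminfo(f.read())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    print(f"CommitLimit:  {info['CommitLimit']} kB")
    print(f"Committed_AS: {info['Committed_AS']} kB")
    print(f"Headroom:     {commit_headroom_kb(info)} kB")


if __name__ == "__main__":
    main()
```

If the headroom is already near zero or negative, switching to mode 2
without adding swap or raising vm.overcommit_ratio would likely make
PostgreSQL see allocation failures right away, so it is worth checking
first.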

Jeff
