Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet

Frits Hoogland Wed, 06 Aug 2025 01:47:58 -0700

Joe, 

Can you name any technical reason why not having swap for a database is an 
actual bad idea?

Memory always is limited. Swap was invented to overcome a situation where the 
(incidental) memory usage of paged in memory was could (regularly) get higher 
than physical memory would allow, and thus have the (clear) workaround of 
having swap to 'cushion' the memory shortage issue by allowing a "second level" 
memory storage on disk.
Still, this does not making memory unlimited. Swap extends the physical memory 
available with the amount of swap. There still is a situation where you can run 
out of memory when swap is added, simply by paging in more memory than physical 
memory and swap.

Today, most systems are not memory constrained anymore, or: it is possible to 
get a server with enough physical memory to hold your common needed total 
memory need. 
And given the latency sensitive nature of databases in general, which includes 
postgres, for any serious deployment you should get a server with enough memory 
to host your workload, and configure postgres not to overload the memory.

If you do oversubscribe on (physical) memory, you will get pain somewhere, 
there is no way around that.
The article in defense of swap in essence is saying that if you happen to 
oversubscribe on memory, sharing the pain between anonymous and file is better.
I would say you are already in a bad place if that happens, which is especially 
bad for databases, and databases should allow you to make memory usage 
predictable.

However, what I found is that with 4+ kernels (4.18 to be precise; rhel 8), the 
kernel can try to favour file pages in certain situations making anonymous 
memory getting paged out even if swappiness is set to 1 or 0, and if there is a 
wealth of inactive file memory. It seems to have to do with workingset 
protection(?) mechanisms, but given the lack of clear statistics I can't be 
sure about that. What it does lead to in my situations is a constant rate of 
swapping in and out in certain situations, whilst there is no technical reason 
for linux to do so because there is enough available memory.

My point of view has been that vm.overcommit_memory set to 2 was the way to go, 
because that allows linux to limit based on a set limit on allocation time, 
which guarantees way to make the database never run out of memory.
it does guarantees linux to never run out of memory, absolutely.
However, this limit is hard, and is applied for the process at both usermode 
and system mode (kernel level), and thus can enforce not providing memory at 
times where it's not safe to do so, and thus corrupt execution. I have to be 
honest, I have not seen this myself, but trustworthy sources have reported this 
repeatedly, which I am inclined to believe. This means postgres execution can 
corrupt/terminate in unlucky situations, which is impacts availability.

Frits Hoogland

> On 5 Aug 2025, at 20:52, Joe Conway <[email protected]> wrote:
> 
> On 8/5/25 13:01, Priya V wrote:
>> *Environment:*
>>    *PostgreSQL Versions:* Mix of 13.13 and 15.12 (upgrades in progress
>>    to be at 15.12 currently both are actively in use)
> 
> PostgreSQL 13 end of life after November 13, 2025
> 
>>    *OS / Kernel:* RHEL 7 & RHEL 8 variants, kernels in the 4.14–4.18 range
> 
> RHEL 7 has been EOL for quite a while now. Note that you have to watch out 
> for collation issues/corrupted indexes after OS upgrades due to collations 
> changing with newer glibc versions.
> 
>>    *Swap:* Currently none
> 
> bad idea
> 
>>    *Workload:* Highly mixed — OLTP-style internal apps with
>>    unpredictable query patterns and connection counts
>>    *Goal:* Uniform, safe memory settings across the fleet to avoid
>>    kernel or database instability
> 
>> We’re considering:
>>    *|vm.overcommit_memory = 2|* for strict accounting
> 
> yes
> 
>>    Increasing |vm.overcommit_ratio| from 50 → 80 or 90 to better
>>    reflect actual PostgreSQL usage (e.g., |work_mem| reservations that
>>    aren’t fully used)
> 
> work_mem does not reserve memory -- it is a maximum that might be used in 
> memory for a particular operation
> 
>> *Our questions for those running large PostgreSQL fleets:*
>> 1.
>>    What |overcommit_ratio| do you find safe for PostgreSQL without
>>    causing kernel memory crunches?
> 
> Read this:
> https://www.cybertec-postgresql.com/en/what-you-should-know-about-linux-memory-overcommit-in-postgresql/
> 
>> 2.
>>    Do you prefer |overcommit_memory = 1| or |= 2| for production stability?
> 
> Use overcommit_memory = 2 for production stability
> 
>> 3.
>>    How much swap (if any) do you keep in large-memory servers where
>>    PostgreSQL is the primary workload? Is having swap configured a good
>>    idea or not ?
> 
> You don't necessary need a large amount of swap, but you definitely should 
> not disable it.
> 
> Some background on that:
> https://chrisdown.name/2018/01/02/in-defence-of-swap.html
> 
>> 4.
>>    Any real-world cases where kernel accounting was too strict or too
>>    loose for PostgreSQL?
> 
> In my experience the biggest issues are when postgres is running in a memory 
> constrained cgroup. If you want to constrain memory with cgroups, use cgroup 
> v2 (not 1) and use memory.high to constrain it, not memory.max.
> 
>> 5. What settings to go with if we are not planning on using swap ?
> 
> IMHO do not disable swap on Linux, at least not on production, ever.
> 
>> We’d like to avoid both extremes:
>>    Too low a ratio → PostgreSQL backends failing allocations even with
>>    free RAM
> 
> Have you actually seen this or are you theorizing?
> 
>>    Too high a ratio → OOM killer terminating PostgreSQL under load spikes
> 
> If overcommit_memory = 2, overcommit_ratio is reasonable (less than 100, 
> maybe 80 or so as you suggested), and swap is not disabled, and you are not 
> running in a memory constrained cgroup, I would be very surprised if you will 
> ever get hit by the OOM killer. And if you do, things are so bad the database 
> was probably dying anyway.
> 
> HTH,
> 
> -- 
> Joe Conway
> PostgreSQL Contributors Team
> Amazon Web Services: https://aws.amazon.com
> 
>

Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet

Reply via email to