Hi,

On 8/11/23 14:05, Merlin Moncure wrote:
On Thu, Jul 27, 2023 at 8:28 AM David Geier <geidav...@gmail.com> wrote:

    Hi,

    On 6/7/23 23:37, Andres Freund wrote:
    > I think we're starting to hit quite a few limits related to the process model,
    > particularly on bigger machines. The overhead of cross-process context
    > switches is inherently higher than switching between threads in the same
    > process - and my suspicion is that that overhead will continue to
    > increase. Once you have a significant number of connections we end up spending
    > a *lot* of time in TLB misses, and that's inherent to the process model,
    > because you can't share the TLB across processes.

    Another problem I haven't seen mentioned yet is the excessive kernel
    memory usage because every process has its own set of page table
    entries (PTEs). Without huge pages the amount of wasted memory can be
    huge if shared buffers are big.


Hm, I noted this upthread, but asking again: does this help interactions with the operating system and make OOM kill situations less likely? These things are the bane of my existence, and I'm having a hard time finding a solution that prevents them other than running pgbouncer and lowering max_connections, which adds complexity. I suspect I'm not the only one dealing with this. What's really scary about these situations is that they come without warning. Here's a pretty typical example per sar -r.

The conjecture here is that lots of idle connections leave the server with less memory available than it appears to have, and sudden transient demands can destabilize it.
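
One way to check this conjecture on Linux is to add up the VmPTE field that /proc/<pid>/status reports for each process. The following is a minimal sketch. The helper names are hypothetical, and it assumes the backends show up with the process name "postgres".

```python
import glob


def vmpte_kb(status_text):
    """Return the VmPTE value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmPTE:"):
            return int(line.split()[1])
    return None  # kernel threads have no VmPTE line


def total_vmpte_kb(process_name="postgres"):
    """Sum VmPTE over all readable processes with the given name."""
    total = 0
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue  # process exited or is not readable
        lines = text.splitlines()
        # The first line of the status file has the form "Name:\t<comm>".
        name = lines[0].split(":", 1)[-1].strip() if lines else ""
        if name == process_name:
            total += vmpte_kb(text) or 0
    return total


print(f"page tables held by postgres processes: {total_vmpte_kb()} kB")
```

Comparing this total with and without many idle connections would show how much kernel memory the connections themselves pin.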

It does, in the sense that your server will have more memory available if you have many long-lived connections around: every connection carries less kernel memory overhead. Of course, even then a runaway query can still invoke the OOM killer. The unfortunate thing with the OOM killer is that, in my experience, it often kills the checkpointer. That's because the checkpointer touches all of shared buffers over time, so its resident memory looks large, which makes it a likely target for the OOM killer. Have you tried disabling memory overcommit?
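
To see which overcommit mode a Linux box is running in, the vm.overcommit_memory sysctl can be read from /proc. A small sketch follows. The helper names are hypothetical, and the mode descriptions paraphrase the kernel documentation.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory, paraphrased from the kernel documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(raw):
    """Translate the raw sysctl value into a short description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Describe the current mode, or return None if the file is absent."""
    p = Path(path)
    if not p.exists():
        return None  # not Linux, or /proc is not mounted
    return describe_overcommit(p.read_text())


print(current_overcommit())
```

Mode 2 is what "disabling overcommit" refers to: allocations fail up front with an out-of-memory error instead of a process being killed later.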

--
David Geier
(ServiceNow)


