On Thu, Jul 27, 2023 at 8:28 AM David Geier <geidav...@gmail.com> wrote:

> Hi,
>
> On 6/7/23 23:37, Andres Freund wrote:
> > I think we're starting to hit quite a few limits related to the process
> > model, particularly on bigger machines. The overhead of cross-process
> > context switches is inherently higher than switching between threads in
> > the same process - and my suspicion is that that overhead will continue
> > to increase. Once you have a significant number of connections we end up
> > spending a *lot* of time in TLB misses, and that's inherent to the
> > process model, because you can't share the TLB across processes.
>
> Another problem I haven't seen mentioned yet is the excessive kernel
> memory usage because every process has its own set of page table entries
> (PTEs). Without huge pages the amount of wasted memory can be huge if
> shared buffers are big.
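
To put rough numbers on the quoted point (a back-of-the-envelope sketch in
Python, not a measurement; assumes 4 KB pages and 8-byte x86-64 PTEs, and
the 64 GB / 500 backend figures are made up for illustration):

# Per-process page table overhead for mapping shared buffers without
# huge pages; ignores the upper levels of the page table tree, which
# are comparatively tiny.
shared_buffers = 64 * 1024**3   # 64 GB of shared buffers (assumed)
page_size      = 4 * 1024       # 4 KB pages, i.e. no huge pages
pte_size       = 8              # bytes per PTE on x86-64
backends       = 500            # number of backend processes (assumed)

per_backend = (shared_buffers // page_size) * pte_size
print(f"page tables per backend: {per_backend / 1024**2:.0f} MiB")
print(f"across all backends:     {per_backend * backends / 1024**3:.1f} GiB")
# -> about 128 MiB per backend once it has touched all of shared buffers,
#    and ~62.5 GiB of duplicated kernel memory across 500 backends.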


Hm, noted this upthread, but asking again: does this improve interactions
with the operating system and make OOM-kill situations less likely?  These
things are the bane of my existence, and I'm having a hard time finding a
solution that prevents them other than running pgbouncer and lowering
max_connections, which adds complexity.  I suspect I'm not the only one
dealing with this.  What's really scary about these situations is that they
come without warning.  Here's a pretty typical example per sar -r.

           kbmemfree  kbmemused  %memused  kbbuffers   kbcached   kbcommit  %commit   kbactive   kbinact  kbdirty
 14:20:02     461612   15803476     97.16          0   11120280   12346980    60.35   10017820   4806356      220
 14:30:01     378244   15886844     97.67          0   11239012   12296276    60.10   10003540   4909180      240
 14:40:01     308632   15956456     98.10          0   11329516   12295892    60.10   10015044   4981784      200
 14:50:01     458956   15806132     97.18          0   11383484   12101652    59.15    9853612   5019916      112
 15:00:01   10592736    5672352     34.87          0    4446852    8378324    40.95    1602532   3473020      264   <-- reboot!
 15:10:01    9151160    7113928     43.74          0    5298184    8968316    43.83    2714936   3725092      124
 15:20:01    8629464    7635624     46.94          0    6016936    8777028    42.90    2881044   4102888      148
 15:30:01    8467884    7797204     47.94          0    6285856    8653908    42.30    2830572   4323292      436
 15:40:02    8077480    8187608     50.34          0    6828240    8482972    41.46    2885416   4671620      320
 15:50:01    7683504    8581584     52.76          0    7226132    8511932    41.60    2998752   4958880      308
 16:00:01    7239068    9026020     55.49          0    7649948    8496764    41.53    3032140   5358388      232
 16:10:01    7030208    9234880     56.78          0    7899512    8461588    41.36    3108692   5492296      216

The triggering query was heavy (maybe even runaway); server load was
otherwise minimal:

                CPU     %user     %nice   %system   %iowait    %steal     %idle
 14:30:01       all      9.55      0.00      0.63      0.02      0.00     89.81
 14:40:01       all      9.95      0.00      0.69      0.02      0.00     89.33
 14:50:01       all     10.22      0.00      0.83      0.02      0.00     88.93
 15:00:01       all     10.62      0.00      1.63      0.76      0.00     86.99
 15:10:01       all      8.55      0.00      0.72      0.12      0.00     90.61

The conjecture here is that lots of idle connections leave the server with
less memory actually available than it appears to have, so sudden transient
demands can destabilize it.
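
A quick way to sanity-check that on a live box (a rough sketch of my own,
not something Postgres provides; assumes Linux with a kernel recent enough
to report MemAvailable in /proc/meminfo):

# Compare the kernel's commit accounting with the "free looking" numbers.
# All /proc/meminfo values are reported in kB.
def meminfo_kb():
    vals = {}
    with open("/proc/meminfo") as f:
        for line in f:
            name, rest = line.split(":", 1)
            vals[name] = int(rest.split()[0])
    return vals

m = meminfo_kb()
print(f"MemAvailable: {m['MemAvailable'] // 1024} MiB")
print(f"Committed_AS: {m['Committed_AS'] // 1024} MiB")
print(f"CommitLimit:  {m['CommitLimit'] // 1024} MiB")
# If Committed_AS is large relative to CommitLimit (or to physical RAM)
# while MemAvailable still looks comfortable, one heavy query can tip the
# box into OOM-killer territory with no warning -- the same shape as the
# sar output above, where %commit sat around 60% right up to the reboot.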

Just throwing it out there: if it can be shown to help, it may support
moving forward with something like this, either instead of, or along with,
O_DIRECT or other internalized database memory management strategies.
Fewer context switches, faster page access, etc. are of course nice, but
they would not be a game changer for the workloads we see, which are pretty
varied (OLTP, analytics), although we don't see extremely high transaction
rates.

merlin
