Hi Tomas,
I've tried the pgbench test
again, to see if it gets stuck somewhere, and I'm observing this on a
new / idle cluster:
$ pgbench -n -f test.sql -P 1 test -T 60
pgbench (18devel)
progress: 1.0 s, 1647.9 tps, lat 0.604 ms stddev 0.438, 0 failed
progress: 2.0 s, 1374.3 tps, lat 0.727 ms stddev 0.386, 0 failed
progress: 3.0 s, 1514.4 tps, lat 0.661 ms stddev 0.330, 0 failed
progress: 4.0 s, 1563.4 tps, lat 0.639 ms stddev 0.212, 0 failed
progress: 5.0 s, 1665.0 tps, lat 0.600 ms stddev 0.177, 0 failed
progress: 6.0 s, 1538.0 tps, lat 0.650 ms stddev 0.192, 0 failed
progress: 7.0 s, 1491.4 tps, lat 0.670 ms stddev 0.261, 0 failed
progress: 8.0 s, 1539.5 tps, lat 0.649 ms stddev 0.443, 0 failed
progress: 9.0 s, 1517.0 tps, lat 0.659 ms stddev 0.167, 0 failed
progress: 10.0 s, 1594.0 tps, lat 0.627 ms stddev 0.227, 0 failed
progress: 11.0 s, 28.0 tps, lat 0.705 ms stddev 0.277, 0 failed
progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 16.0 s, 1480.6 tps, lat 4.043 ms stddev 130.113, 0 failed
progress: 17.0 s, 1524.9 tps, lat 0.655 ms stddev 0.286, 0 failed
progress: 18.0 s, 1246.0 tps, lat 0.802 ms stddev 0.330, 0 failed
progress: 19.0 s, 1383.1 tps, lat 0.722 ms stddev 0.934, 0 failed
progress: 20.0 s, 1432.7 tps, lat 0.698 ms stddev 0.199, 0 failed
...
There's always a period of 10-15 seconds when everything seems to be
working fine, and then a couple seconds when it gets stuck, with the usual
LOG: Wait for 69454 process to publish stats timed out, trying again
The PIDs I've seen were for checkpointer, autovacuum launcher, ... all
of those are processes that should be handling the signal, so how come it
gets stuck every now and then? The system is entirely idle, there's no
contention for the shmem stuff, etc. Could it be forgetting about the
signal in some cases, or something like that?
Yes. This occurs when a backend receives concurrent signals and processes
them together, so the statistics are published only once. Once the first
client to gain access reads the stats, they are erased, causing the second
client to wait until it times out.
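To make the failure sequence concrete, below is a minimal, self-contained toy
model of that coalescing. All names are hypothetical, and the real code uses
signals and shared memory rather than plain flags, so treat this only as a
sketch of the ordering:

#include <stdbool.h>
#include <stdio.h>

/* Toy model: two requests arrive before the backend reacts, the stats are
 * published once, the first reader erases them, and the second reader is
 * left waiting until it would time out. */
static bool publish_pending = false;    /* "signal handler" sets this */
static bool stats_available = false;    /* backend sets this when publishing */

static void request_stats(void)
{
    publish_pending = true;             /* both clients set the same flag */
}

static void backend_processes_interrupts(void)
{
    if (publish_pending)
    {
        publish_pending = false;
        stats_available = true;         /* one publication for both requests */
        printf("backend: published stats once\n");
    }
}

static void try_read_stats(const char *who)
{
    if (stats_available)
    {
        stats_available = false;        /* reading erases the snapshot */
        printf("%s: got stats\n", who);
    }
    else
        printf("%s: nothing published, waits until timeout\n", who);
}

int main(void)
{
    request_stats();                    /* client A signals the backend */
    request_stats();                    /* client B signals before the backend reacts */
    backend_processes_interrupts();     /* both requests collapse into one publish */

    try_read_stats("client A");         /* succeeds and erases the stats */
    try_read_stats("client B");         /* finds nothing -> timeout in the real code */
    return 0;
}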
If we make clients wait for the latest stats, timeouts may occur during
concurrent operations. To avoid such timeouts, we can retain the previously
published memory statistics for every backend and avoid waiting for the
latest statistics when the previous statistics are newer than
STALE_STATS_LIMIT. This limit can be determined based on the server load and
how fast the memory statistics requests are being handled by the server.
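As a rough illustration of that staleness check, here is a small standalone
sketch. The identifiers (STALE_STATS_LIMIT_MS, stats_are_fresh_enough, etc.)
are made up for illustration and are not the ones used in the patch:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define STALE_STATS_LIMIT_MS 500        /* 0.5 s, matching the current patch */

static double
elapsed_ms(struct timespec from, struct timespec to)
{
    return (to.tv_sec - from.tv_sec) * 1000.0 +
           (to.tv_nsec - from.tv_nsec) / 1.0e6;
}

/* Reuse the previously published stats if they are newer than the limit. */
static bool
stats_are_fresh_enough(struct timespec stats_published_at)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return elapsed_ms(stats_published_at, now) <= STALE_STATS_LIMIT_MS;
}

int main(void)
{
    struct timespec published;

    clock_gettime(CLOCK_MONOTONIC, &published);   /* pretend stats were just published */

    if (stats_are_fresh_enough(published))
        printf("reuse previously published statistics\n");
    else
        printf("signal the backend and wait for fresh statistics\n");
    return 0;
}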
For example, on a server running make -j 4 installcheck-world while
concurrently probing client backends for memory statistics using pgbench,
accepting statistics that were approximately 1 second old helped eliminate
timeouts. Conversely, on an idle system, waiting for new statistics only when
the previous ones were older than 0.1 seconds was sufficient to avoid any
timeouts caused by concurrent requests.
PFA an updated and rebased patch that includes the capability to associate
timestamps with statistics. Additionally, I have made some minor fixes and
improved the indentation.
Currently, I have set STALE_STATS_LIMIT to 0.5 seconds in the code, which
means we do not wait for newer statistics if the previous statistics were
published within 0.5 seconds of the current request.
In short, there are the following options for designing the wait for
statistics, depending on whether we expect concurrent requests to a backend
for memory statistics to be common.
1. Always get the latest statistics and time out if unable to.
This works fine for sequential probing, which is going to be the most common
use case. However, it can lead to backend timeouts of up to
MAX_TRIES * MEMSTATS_WAIT_TIMEOUT.
2. Determine an appropriate STALE_STATS_LIMIT and do not wait for the latest
stats if the previous statistics are within that limit.
This helps avoid timeouts in case of concurrent requests.
3. Do what the v10 patch on this thread does:
Wait for the latest statistics for up to MEMSTATS_WAIT_TIMEOUT; otherwise,
display the previous statistics, regardless of when they were published.
Since timeouts are likely to occur only during concurrent requests, the
displayed statistics are unlikely to be very outdated.
However, in this scenario, we observe the behavior you mentioned, i.e.,
concurrent backends can get stuck for the duration of MEMSTATS_WAIT_TIMEOUT
(currently 5 seconds); a sketch of this flow follows below.
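For illustration, here is a rough standalone sketch of that option-3 flow.
fresh_stats_available() is a stand-in helper, and the real patch waits on
shared-memory state rather than polling, so this only shows the
timeout-then-fall-back shape:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define MEMSTATS_WAIT_TIMEOUT_SEC 5     /* matches the current 5 s setting */

/* Stand-in: in the real code this checks whether the backend has published
 * statistics newer than our request. */
static bool fresh_stats_available(void)
{
    return false;
}

int main(void)
{
    struct timespec start, now;
    bool got_fresh = false;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;)
    {
        if (fresh_stats_available())
        {
            got_fresh = true;
            break;
        }
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (now.tv_sec - start.tv_sec >= MEMSTATS_WAIT_TIMEOUT_SEC)
            break;                      /* give up waiting for fresh stats */
        usleep(10 * 1000);              /* retry every 10 ms */
    }

    if (got_fresh)
        printf("display fresh statistics\n");
    else
        printf("timed out: display previously published statistics,\n"
               "regardless of how old they are\n");
    return 0;
}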
I am inclined toward the third approach, as concurrent requests are not expected
to be a common use case for this feature. Moreover, with the second approach,
determining an appropriate value for STALE_STATS_LIMIT is challenging, as it
depends on the server's load.