On 2025/01/21 20:27, Rahila Syed wrote:
Hi Tomas,

      I've tried the pgbench test
    again, to see if it gets stuck somewhere, and I'm observing this on a
    new / idle cluster:

    $ pgbench -n -f test.sql -P 1 test -T 60
    pgbench (18devel)
    progress: 1.0 s, 1647.9 tps, lat 0.604 ms stddev 0.438, 0 failed
    progress: 2.0 s, 1374.3 tps, lat 0.727 ms stddev 0.386, 0 failed
    progress: 3.0 s, 1514.4 tps, lat 0.661 ms stddev 0.330, 0 failed
    progress: 4.0 s, 1563.4 tps, lat 0.639 ms stddev 0.212, 0 failed
    progress: 5.0 s, 1665.0 tps, lat 0.600 ms stddev 0.177, 0 failed
    progress: 6.0 s, 1538.0 tps, lat 0.650 ms stddev 0.192, 0 failed
    progress: 7.0 s, 1491.4 tps, lat 0.670 ms stddev 0.261, 0 failed
    progress: 8.0 s, 1539.5 tps, lat 0.649 ms stddev 0.443, 0 failed
    progress: 9.0 s, 1517.0 tps, lat 0.659 ms stddev 0.167, 0 failed
    progress: 10.0 s, 1594.0 tps, lat 0.627 ms stddev 0.227, 0 failed
    progress: 11.0 s, 28.0 tps, lat 0.705 ms stddev 0.277, 0 failed
    progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 16.0 s, 1480.6 tps, lat 4.043 ms stddev 130.113, 0 failed
    progress: 17.0 s, 1524.9 tps, lat 0.655 ms stddev 0.286, 0 failed
    progress: 18.0 s, 1246.0 tps, lat 0.802 ms stddev 0.330, 0 failed
    progress: 19.0 s, 1383.1 tps, lat 0.722 ms stddev 0.934, 0 failed
    progress: 20.0 s, 1432.7 tps, lat 0.698 ms stddev 0.199, 0 failed
    ...

    There's always a period of 10-15 seconds when everything seems to be
    working fine, and then a couple seconds when it gets stuck, with the usual

       LOG:  Wait for 69454 process to publish stats timed out, trying again

    The PIDs I've seen were for checkpointer, autovacuum launcher, ... all
    of that are processes that should be handling the signal, so how come it
    gets stuck every now and then? The system is entirely idle, there's no
    contention for the shmem stuff, etc. Could it be forgetting about the
    signal in some cases, or something like that?

Yes. This occurs when a backend receives concurrent signals and processes
them together, so the statistics are published only once. Once the first
client that gains access has read the stats, they are erased, leaving the
second client waiting until it times out.
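
For illustration, a minimal sketch of how two overlapping requests collapse
into a single publication (plain-C illustration; the flag, handler, and
publish function names are assumptions, not the actual patch code):

#include <signal.h>

static volatile sig_atomic_t publish_stats_pending = 0;

/* Signal handler: two overlapping requests both just set the same flag. */
static void
handle_get_memory_stats_signal(int signo)
{
    publish_stats_pending = 1;
}

/* Main loop: one publication then serves however many signals arrived. */
static void
process_get_memory_stats_request(void)
{
    if (!publish_stats_pending)
        return;

    publish_stats_pending = 0;
    publish_memory_context_stats();     /* hypothetical: write stats to shmem */
}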

If we make clients wait for the latest stats, timeouts may occur during
concurrent operations. To avoid such timeouts, we can retain the previously
published memory statistics for every backend and avoid waiting for the
latest statistics when the previous statistics are newer than
STALE_STATS_LIMIT. This limit can be determined based on the server load and
how fast the memory statistics requests are being handled by the server.
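
A rough sketch of what that check could look like (TimestampDifferenceExceeds()
and GetCurrentTimestamp() are existing backend functions; the limit name and
the published_at argument are assumptions for the example):

#include "utils/timestamp.h"

#define STALE_STATS_LIMIT_MS    500     /* 0.5 s, see below */

/*
 * Return true if the previously published statistics are recent enough to
 * be reused without waiting for the target backend to publish new ones.
 */
static bool
published_stats_are_fresh(TimestampTz published_at)
{
    return !TimestampDifferenceExceeds(published_at,
                                       GetCurrentTimestamp(),
                                       STALE_STATS_LIMIT_MS);
}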

For example, on a server running make -j 4 installcheck-world while
concurrently probing client backends for memory statistics using pgbench,
accepting statistics that were approximately 1 second old helped eliminate
timeouts. Conversely, on an idle system, waiting for new statistics when the
previous ones were older than 0.1 seconds was sufficient to avoid any
timeouts caused by concurrent requests.

PFA an updated and rebased patch that includes the capability to associate
timestamps with statistics. Additionally, I have made some minor fixes and
improved the indentation.

Currently, I have set STALE_STATS_LIMIT to 0.5 seconds in the code, which
means we do not wait for newer statistics if the previous statistics were
published within 0.5 seconds of the current request.

In short, there are the following options for designing the wait for
statistics, depending on whether we expect concurrent requests to a backend
for memory statistics to be common.

1. Always get the latest statistics and time out if unable to.

This works fine for sequential probing, which is going to be the most common
use case. However, it can lead to backend timeouts of up to
MAX_TRIES * MEMSTATS_WAIT_TIMEOUT.

2. Determine an appropriate STALE_STATS_LIMIT and do not wait for the latest
stats if the previous statistics are within that limit.

This will help avoid timeouts in the case of concurrent requests.

3. Do what the v10 patch on this thread does:

Wait for the latest statistics for up to MEMSTATS_WAIT_TIMEOUT; otherwise,
display the previous statistics, regardless of when they were published (a
rough sketch follows below).

Since timeouts are likely to occur only during concurrent requests, the
displayed statistics are unlikely to be very outdated. However, in this
scenario we observe the behavior you mentioned, i.e., concurrent backends
can get stuck for the duration of MEMSTATS_WAIT_TIMEOUT (currently
5 seconds).
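
A rough code shape for option 3 (the slot layout and the display function are
assumptions for the example; only the condition-variable calls are the
existing backend API; locking is omitted for brevity):

#include "storage/condition_variable.h"
#include "utils/timestamp.h"

#define MEMSTATS_WAIT_TIMEOUT   5000    /* ms */

/* hypothetical per-backend slot in shared memory */
typedef struct MemStatsSlot
{
    ConditionVariable cv;
    TimestampTz stats_timestamp;    /* when stats were last published */
    /* ... published statistics ... */
} MemStatsSlot;

static void
read_memory_stats(MemStatsSlot *slot, TimestampTz request_ts)
{
    ConditionVariablePrepareToSleep(&slot->cv);

    while (slot->stats_timestamp < request_ts)
    {
        /* returns true if the timeout expired without a wakeup */
        if (ConditionVariableTimedSleep(&slot->cv,
                                        MEMSTATS_WAIT_TIMEOUT,
                                        PG_WAIT_EXTENSION))
            break;              /* timed out: accept the stale copy */
    }
    ConditionVariableCancelSleep();

    display_memory_stats(slot);     /* hypothetical: show whatever is there */
}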

I am inclined toward the third approach, as concurrent requests are not expected
to be a common use case for this feature. Moreover, with the second approach,
determining an appropriate value for STALE_STATS_LIMIT is challenging, as it
depends on the server's load.

Just an idea: as another option, how about blocking new requests to the
target process (e.g., causing them to fail with an error or return NULL with
a warning) if a previous request is still pending? Users can simply retry
the request if it fails. IMO, failing quickly seems preferable to getting
stuck for a while in cases with concurrent requests.
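
For what it's worth, something like the following could implement that
fail-fast behavior (the "request pending" flag and function name are just
assumptions for illustration; the target backend would reset the flag to 0
after publishing):

#include "postgres.h"
#include "port/atomics.h"

/*
 * Atomically claim the target's request-pending flag. If another request is
 * already in flight, warn and let the caller return NULL immediately
 * instead of waiting.
 */
static bool
try_claim_memstats_request(pg_atomic_uint32 *request_pending)
{
    uint32      expected = 0;

    if (!pg_atomic_compare_exchange_u32(request_pending, &expected, 1))
    {
        ereport(WARNING,
                (errmsg("a memory statistics request is already pending for this process")));
        return false;
    }
    return true;
}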

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


