05.01.2024 02:48, Michael Paquier wrote:
On Thu, Jan 04, 2024 at 04:00:01PM +0300, Alexander Lakhin wrote:
Reproduced here.
Did you just make the run slow enough to show the failure with
valgrind?

Yes, I just run several test instances (with no extra modifications) under
Valgrind with parallel as follows:
for i in `seq 20`; do cp -r src/test/recovery/ src/test/recovery_$i/; sed "s|src/test/recovery|src/test/recovery_$i|" -i src/test/recovery_$i/Makefile; done

for i in `seq 20`; do echo "iteration $i"; parallel --halt now,fail=1 -j20 --linebuffer --tag PROVE_TESTS="t/035*" NO_TEMP_INSTALL=1 make check -s -C src/test/recovery_{} PROVE_FLAGS="--timer" ::: `seq 20` || break; done

The test run fails for me on iterations 9, 8, 14 like so:
...
iteration 9
...
5
5       #   Failed test 'inactiveslot slot invalidation is logged with vacuum 
on pg_authid'
5       #   at t/035_standby_logical_decoding.pl line 224.
5
5       #   Failed test 'activeslot slot invalidation is logged with vacuum on 
pg_authid'
5       #   at t/035_standby_logical_decoding.pl line 229.
13      [07:35:53] t/035_standby_logical_decoding.pl .. ok   432930 ms ( 0.02 usr  0.00 sys + 292.08 cusr 13.05 csys = 305.15 CPU)
13      [07:35:53]
13      All tests successful.
...

I've also reproduced it without Valgrind in a VM with CPU slowed down to
5% (on iterations 2, 5, 4), where average test duration is about 800 sec:

4
4       #   Failed test 'inactiveslot slot invalidation is logged with vacuum 
on pg_authid'
4       #   at t/035_standby_logical_decoding.pl line 222.
4
4       #   Failed test 'activeslot slot invalidation is logged with vacuum on 
pg_authid'
4       #   at t/035_standby_logical_decoding.pl line 227.
6       [15:16:53] t/035_standby_logical_decoding.pl .. ok   763168 ms ( 0.68 usr  0.10 sys + 19.55 cusr 102.59 csys = 122.92 CPU)


As I can see in the failure logs you referenced, the first problem is:
#   Failed test 'inactiveslot slot invalidation is logged with vacuum on 
pg_authid'
#   at t/035_standby_logical_decoding.pl line 224.
It reminded me of:
https://www.postgresql.org/message-id/b2a1f7d0-7629-72c0-2da7-e9c4e336b010%40gmail.com

It seems that it's not something new, and maybe that my analysis is still
valid. If so, VACUUM FREEZE/autovacuum = off should fix the issue.
Not sure about that yet.


Your suspicion was proved right. After
git show c161ab74f src/test/recovery/t/035_standby_logical_decoding.pl  | git 
apply -R
20 iterations with 20 tests in parallel performed successfully for me
(twice).

So it looks like c161ab74f really made the things worse.

Best regards,
Alexander


Reply via email to