On 12/9/24 13:30, Heikki Linnakangas wrote: > On 09/12/2024 01:12, Tomas Vondra wrote: >> On 11/14/24 15:13, Heikki Linnakangas wrote: >>> On 09/10/2024 23:40, Heikki Linnakangas wrote: >>>> I pushed the first three patches, with the new test and one of the >>>> small >>>> refactoring patches. Thanks for all the comments so far! Here is a new >>>> version of the remaining patches. >>>> >> Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84 >> seems to have problems with valgrind :-( I reliably get this failure: > > How exactly do you run the test with valgrind? What platform? >
It failed for me on both amd64 (Fedora 41) and rpi5 32/64-bit (Debian). > It works for me, with this: > > (cd build && ninja && rm -rf tmp_install && meson test --suite setup && > valgrind --leak-check=no --gen-suppressions=all --suppressions=/home/ > heikki/git-sandbox/postgresql/src/tools/valgrind.supp --time-stamp=yes > --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END --log-file=$HOME/ > pg-valgrind/%p.log --trace-children=yes meson test --suite postmaster ) > I have a patch that tweaks pg_ctl/pg_regress to execute valgrind, so I just do ./configure --enable-debug --prefix=/home/user/builds/master --enable-depend --enable-cassert --enable-tap-tests CPPFLAGS="-O0 -ggdb3 -DUSE_VALGRIND" and then the usual "make check" or whatever. The patch has a hardcoded path to the .supp file, and places the valgrind log into /tmp. It has worked for me fine up until that commit, and it still seems to be working in every other test directory. >> t/001_connection_limits.pl .. 3/? # Tests were run but no plan was >> declared and done_testing() was not seen. >> # Looks like your test exited with 29 just after 4. >> t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424, >> 0x1d00) >> All 4 subtests passed >> >> >> and tmp_check/log/regress_log_001_connection_limits says: >> >> >> [23:48:44.444](1.129s) ok 3 - reserved_connections limit >> [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches >> process ended prematurely at >> /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/ >> PostgreSQL/Test/BackgroundPsql.pm >> line 154. >> # Postmaster PID for node "primary" is 198592 >> >> >> That BackgroundPsql.pm line is this in wait_connect() >> >> $self->{run}->pump() >> until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired; >> >> By trial and error I found that it fails on this line 70: >> >> push(@sessions, background_psql_as_user('regress_superuser')); >> >> but I have no idea idea why. There are multiple similar calls a couple >> lines earlier, and those work fine. And various other TAP tests with >> background_sql() work fine too. >> >> So what's so special about this particular line? > > Weird. Valgrind makes everything slow; is it a timeout? Any other clues > in the logs? > Yeah, weird. Timeouts were the first thing I thought about, but it fails even if I set PGCTLTIMEOUT/PG_TEST_TIMEOUT_DEFAULT to 3600. And it doesn't seem to be waiting for anything for that long :-( regards -- Tomas Vondra
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index d6bb2c33119..83544aeb5e2 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -487,12 +487,12 @@ start_postmaster(void) * has the same PID as the current child process. */ if (log_file != NULL) - cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1", - exec_path, pgdata_opt, post_opts, + cmd = psprintf("exec valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1", + getpid(), exec_path, pgdata_opt, post_opts, DEVNULL, log_file); else - cmd = psprintf("exec \"%s\" %s%s < \"%s\" 2>&1", - exec_path, pgdata_opt, post_opts, DEVNULL); + cmd = psprintf("exec valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s\" %s%s < \"%s\" 2>&1", + getpid(), exec_path, pgdata_opt, post_opts, DEVNULL); (void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL); diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 0e40ed32a21..97d880c3deb 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -2478,10 +2478,10 @@ regression_main(int argc, char *argv[], * Start the temp postmaster */ snprintf(buf, sizeof(buf), - "\"%s%spostgres\" -D \"%s/data\" -F%s " + "valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s%spostgres\" -D \"%s/data\" -F%s " "-c \"listen_addresses=%s\" -k \"%s\" " "> \"%s/log/postmaster.log\" 2>&1", - bindir ? bindir : "", + getpid(), bindir ? bindir : "", bindir ? "/" : "", temp_instance, debug ? " -d 5" : "", hostname ? hostname : "", sockdir ? sockdir : "",