On 12/9/24 13:30, Heikki Linnakangas wrote:
> On 09/12/2024 01:12, Tomas Vondra wrote:
>> On 11/14/24 15:13, Heikki Linnakangas wrote:
>>> On 09/10/2024 23:40, Heikki Linnakangas wrote:
>>>> I pushed the first three patches, with the new test and one of the
>>>> small
>>>> refactoring patches. Thanks for all the comments so far! Here is a new
>>>> version of the remaining patches.
>>>>
>> Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
>> seems to have problems with valgrind :-( I reliably get this failure:
> 
> How exactly do you run the test with valgrind? What platform?
> 

It failed for me on both amd64 (Fedora 41) and rpi5 32/64-bit (Debian).

> It works for me, with this:
> 
> (cd build && ninja && rm -rf tmp_install && meson test --suite setup &&
> valgrind --leak-check=no --gen-suppressions=all --suppressions=/home/
> heikki/git-sandbox/postgresql/src/tools/valgrind.supp --time-stamp=yes
> --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END --log-file=$HOME/
> pg-valgrind/%p.log --trace-children=yes meson test --suite postmaster )
> 

I have a patch that tweaks pg_ctl/pg_regress to execute valgrind, so I
just do

./configure --enable-debug --prefix=/home/user/builds/master
--enable-depend --enable-cassert --enable-tap-tests CPPFLAGS="-O0 -ggdb3
-DUSE_VALGRIND"

and then the usual "make check" or whatever.

The patch has a hardcoded path to the .supp file, and places the
valgrind log into /tmp. It has worked for me fine up until that commit,
and it still seems to be working in every other test directory.

>> t/001_connection_limits.pl .. 3/? # Tests were run but no plan was
>> declared and done_testing() was not seen.
>> # Looks like your test exited with 29 just after 4.
>> t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424,
>> 0x1d00)
>> All 4 subtests passed
>>
>>
>> and tmp_check/log/regress_log_001_connection_limits says:
>>
>>
>> [23:48:44.444](1.129s) ok 3 - reserved_connections limit
>> [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
>> process ended prematurely at
>> /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/
>> PostgreSQL/Test/BackgroundPsql.pm
>> line 154.
>> # Postmaster PID for node "primary" is 198592
>>
>>
>> That BackgroundPsql.pm line is this in wait_connect()
>>
>>    $self->{run}->pump()
>>      until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
>>
>> By trial and error I found that it fails on this line 70:
>>
>>    push(@sessions, background_psql_as_user('regress_superuser'));
>>
>> but I have no idea idea why. There are multiple similar calls a couple
>> lines earlier, and those work fine. And various other TAP tests with
>> background_sql() work fine too.
>>
>> So what's so special about this particular line?
> 
> Weird. Valgrind makes everything slow; is it a timeout? Any other clues
> in the logs?
> 

Yeah, weird.

Timeouts were the first thing I thought about, but it fails even if I
set PGCTLTIMEOUT/PG_TEST_TIMEOUT_DEFAULT to 3600. And it doesn't seem to
be waiting for anything for that long :-(


regards

-- 
Tomas Vondra
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index d6bb2c33119..83544aeb5e2 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -487,12 +487,12 @@ start_postmaster(void)
 	 * has the same PID as the current child process.
 	 */
 	if (log_file != NULL)
-		cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
-					   exec_path, pgdata_opt, post_opts,
+		cmd = psprintf("exec valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
+					   getpid(), exec_path, pgdata_opt, post_opts,
 					   DEVNULL, log_file);
 	else
-		cmd = psprintf("exec \"%s\" %s%s < \"%s\" 2>&1",
-					   exec_path, pgdata_opt, post_opts, DEVNULL);
+		cmd = psprintf("exec valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s\" %s%s < \"%s\" 2>&1",
+					   getpid(), exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	(void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 0e40ed32a21..97d880c3deb 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2478,10 +2478,10 @@ regression_main(int argc, char *argv[],
 		 * Start the temp postmaster
 		 */
 		snprintf(buf, sizeof(buf),
-				 "\"%s%spostgres\" -D \"%s/data\" -F%s "
+				 "valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s%spostgres\" -D \"%s/data\" -F%s "
 				 "-c \"listen_addresses=%s\" -k \"%s\" "
 				 "> \"%s/log/postmaster.log\" 2>&1",
-				 bindir ? bindir : "",
+				 getpid(), bindir ? bindir : "",
 				 bindir ? "/" : "",
 				 temp_instance, debug ? " -d 5" : "",
 				 hostname ? hostname : "", sockdir ? sockdir : "",

Reply via email to