On 2022-Sep-30, Michael Paquier wrote:

> On Thu, Sep 29, 2022 at 09:07:34PM -0700, Andres Freund wrote:
> > ISTM we should at least install a SIGINT/TERM handler in Cluster.pm that
> > does the stuff we already do in END.
>
> Hmm, indeed.  And here I thought that END was actually taking care of
> that on an interrupt..

Me too.  But the perlmod manpage says

    An "END" code block is executed as late as possible, that is, after
    perl has finished running the program and just before the interpreter
    is being exited, even if it is exiting as a result of a die() function.
    (But not if it's morphing into another program via "exec", or being
    blown out of the water by a signal--you have to trap that yourself (if
    you can).)

So clearly we need to fix it.  I thought it should be as simple as the
attached, since exit() calls END.  (Would it be better to die() instead
of exit()?)

But on testing, some nodes linger after being sent a shutdown signal.
I'm not clear why this is -- I think it's due to the fact that we send
the signal just as the node is starting up, which means the signal
doesn't reach the process.  (I added the 0002 patch --not for commit--
to see which Clusters were being shut down and in the trace file I can
clearly see that the nodes that linger were definitely subject to
->teardown_node).

Another funny thing: C-C'ing one run, I got this lingering process:

alvherre  800868 98.2  0.0  12144  5052 pts/9    R    11:03   0:26 /pgsql/install/master/bin/psql -X -c BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32); -c SELECT pg_backup_stop() -d port=54380 host=/tmp/O_2PPNj9Fg dbname='postgres' replication=database

This is probably a bug in psql.
Backtrace is:

#0  PQclear (res=<optimized out>) at /pgsql/source/master/src/interfaces/libpq/fe-exec.c:748
#1  PQclear (res=res@entry=0x55ad308c6190) at /pgsql/source/master/src/interfaces/libpq/fe-exec.c:718
#2  0x000055ad2f303323 in ClearOrSaveResult (result=0x55ad308c6190) at /pgsql/source/master/src/bin/psql/common.c:472
#3  ClearOrSaveAllResults () at /pgsql/source/master/src/bin/psql/common.c:488
#4  ExecQueryAndProcessResults (query=query@entry=0x55ad308bc7a0 "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);",
    elapsed_msec=elapsed_msec@entry=0x7fff9c9941d8, svpt_gone_p=svpt_gone_p@entry=0x7fff9c9941d7,
    is_watch=is_watch@entry=false, opt=opt@entry=0x0, printQueryFout=printQueryFout@entry=0x0)
    at /pgsql/source/master/src/bin/psql/common.c:1608
#5  0x000055ad2f301b9d in SendQuery (query=0x55ad308bc7a0 "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);")
    at /pgsql/source/master/src/bin/psql/common.c:1172
#6  0x000055ad2f2f7bd9 in main (argc=<optimized out>, argv=<optimized out>) at /pgsql/source/master/src/bin/psql/startup.c:384

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."                (Robert Davidson)
               http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php

From 5b01504da83e0a593a4adc24f77b22e7bfda9a0b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Fri, 30 Sep 2022 11:00:57 +0200
Subject: [PATCH 1/2] tear down nodes on signal

---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 4fef9c12e6..4a64cb749b 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1539,7 +1539,6 @@ sub can_bind
 # were created in) when the test script exits.
 END
 {
-
 	# take care not to change the script's exit value
 	my $exit_code = $?;
 
@@ -2922,6 +2921,14 @@ sub corrupt_page_checksum
 	return;
 }
 
+#
+# Signal handlers
+#
+$SIG{TERM} = $SIG{INT} = sub
+{
+	exit 1;
+};
+
 =pod
 
 =back
-- 
2.30.2

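As a standalone illustration of what the handler in the 0001 patch relies on (this is not part of either patch): exit() called from a TERM/INT handler still runs END blocks, whereas an untrapped signal kills the process before END gets a chance. The sketch below forks a child with and without a handler and checks whether the child's END block left a marker file behind. The helper name end_ran_after_term and the marker-file scheme are invented for this demonstration; nothing in it is taken from Cluster.pm, and it assumes a Unix-like system with fork().

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use POSIX ();

my $dir       = tempdir(CLEANUP => 1);
my $child_pid = 0;    # set only inside a forked child
my $marker;           # file the child's END block creates
my $seq       = 0;    # makes each marker name unique

END
{
	# END runs in every process that exits normally; act only in the
	# child, and take care not to change the exit value.
	if (defined $marker && $$ == $child_pid)
	{
		my $exit_code = $?;
		if (open(my $fh, '>', $marker)) { close $fh; }
		$? = $exit_code;
	}
}

# Hypothetical helper: fork a child, optionally trap TERM in it, send it
# TERM, and return (whether its END block ran, raw wait status).
sub end_ran_after_term
{
	my ($trap) = @_;
	$marker = "$dir/marker." . ++$seq;
	pipe(my $rd, my $wr) or die "pipe: $!";
	my $pid = fork() // die "fork: $!";
	if ($pid == 0)
	{
		$child_pid = $$;
		close $rd;
		# Same shape as the 0001 handler, or the default action.
		$SIG{TERM} = $trap ? sub { exit 1; } : 'DEFAULT';
		syswrite($wr, "r");    # signal setup is done; tell the parent
		close $wr;
		sleep 5;               # wait to be signalled
		POSIX::_exit(0);       # timed out: leave without running END
	}
	close $wr;
	sysread($rd, my $buf, 1);    # block until the child is ready
	close $rd;
	kill 'TERM', $pid;
	waitpid($pid, 0);
	my $status = $?;
	return ((-e $marker ? 1 : 0), $status);
}

my ($trapped_ran,   $trapped_status)   = end_ran_after_term(1);
my ($untrapped_ran, $untrapped_status) = end_ran_after_term(0);
printf "TERM trapped:   END ran = %d, exit code = %d\n",
  $trapped_ran, $trapped_status >> 8;
printf "TERM untrapped: END ran = %d, killed by signal %d\n",
  $untrapped_ran, $untrapped_status & 127;
```

The parent waits on a pipe before sending TERM so that the signal cannot arrive before the child has set up its handler. That sidesteps, rather than explains, the case described above where a node is signalled while it is still starting up.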
From 38d4ef4e81a5def4307edca2e2dc207a7559f53d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Fri, 30 Sep 2022 11:06:45 +0200
Subject: [PATCH 2/2] trace cluster termination

---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 4a64cb749b..ae80d06b7d 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1251,6 +1251,7 @@ sub new
 	my $testname = basename($0);
 	$testname =~ s/\.[^.]+$//;
 	my $node = {
+		_testname => $testname,
 		_port => $port,
 		_host => $host,
 		_basedir =>
@@ -1542,6 +1543,15 @@ END
 	# take care not to change the script's exit value
 	my $exit_code = $?;
 
+	{
+		open my $TRACE, ">>", "/tmp/perl.trace";
+		foreach my $node (@all_nodes)
+		{
+			print $TRACE "shutting down $node->{_testname} / $node->{_name}\n";
+		}
+		close $TRACE;
+	}
+
 	foreach my $node (@all_nodes)
 	{
 		$node->teardown_node;
-- 
2.30.2