On 2022-Sep-30, Michael Paquier wrote:

> On Thu, Sep 29, 2022 at 09:07:34PM -0700, Andres Freund wrote:
> > ISTM we should at least install a SIGINT/TERM handler in Cluster.pm that does
> > the stuff we already do in END.
> 
> Hmm, indeed.  And here I thought that END was actually taking care of
> that on an interrupt..

Me too.  But the perlmod manpage says

       An "END" code block is executed as late as possible, that is, after perl has
       finished running the program and just before the interpreter is being exited,
       even if it is exiting as a result of a die() function.  (But not if it's
       morphing into another program via "exec", or being blown out of the water by a
       signal--you have to trap that yourself (if you can).)

So clearly we need to fix it.  I thought it should be as simple as the
attached, since exit() calls END.  (Would it be better to die() instead
of exit()?)
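
To illustrate the behaviour the manpage describes, here is a minimal
standalone sketch; it is not part of the attached patches.  It assumes a
POSIX system with fork(), and the helper name end_runs_on_sigterm and the
marker file are hypothetical, made up for this example.  The child's END
block writes a marker file, and the parent checks whether the marker was
written after sending SIGTERM, with and without a handler that calls exit().

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# Path of the marker file.  It is set only in the forked child, so the
# END block below does nothing in the parent.
our $marker;

END
{
	if (defined $marker)
	{
		# take care not to change the exit value, as Cluster.pm's END does
		my $exit_code = $?;
		if (open my $fh, '>', $marker)
		{
			print $fh "END ran\n";
			close $fh;
		}
		$? = $exit_code;
	}
}

# Hypothetical helper: fork a child, send it SIGTERM, and return 1 if the
# child's END block ran, 0 otherwise.
sub end_runs_on_sigterm
{
	my ($trap_signal) = @_;

	my ($tmp_fh, $file) = tempfile();
	close $tmp_fh;

	pipe(my $rd, my $wr) or die "pipe failed: $!";
	my $pid = fork();
	die "fork failed: $!" unless defined $pid;

	if ($pid == 0)
	{
		# Child: optionally install the same handler as in patch 0001.
		close $rd;
		$marker = $file;
		$SIG{TERM} = $SIG{INT} = sub { exit 1; } if $trap_signal;
		print $wr "ready\n";
		close $wr;
		sleep 30;
		POSIX::_exit(0);    # not reached if the signal arrives in time
	}

	# Parent: wait until the child is ready, then signal it.
	close $wr;
	my $ready = <$rd>;
	close $rd;
	kill 'TERM', $pid;
	waitpid($pid, 0);

	my $ran = (-s $file) ? 1 : 0;
	unlink $file;
	return $ran;
}

# Run both cases before printing, so no buffered output is inherited
# by a forked child.
my $with_handler    = end_runs_on_sigterm(1);
my $without_handler = end_runs_on_sigterm(0);
print "handler installed: END ", ($with_handler ? "ran" : "did not run"), "\n";
print "no handler:        END ", ($without_handler ? "ran" : "did not run"), "\n";
```

On the exit() versus die() question, one difference that may matter: a
die() raised from the handler can be caught by whatever eval block happens
to be running when the signal arrives, in which case the script keeps
going, whereas exit() always proceeds to the END blocks.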

But on testing, some nodes linger after being sent a shutdown signal.
I'm not sure why -- I suspect it's because we send the signal just as
the node is starting up, so the signal doesn't reach the process.
(I added the 0002 patch --not for commit-- to see which Clusters were
being shut down, and in the trace file I can clearly see that the nodes
that linger were definitely subject to ->teardown_node.)


Another funny thing: after hitting Ctrl-C during one run, I got this lingering process:

alvherre  800868 98.2  0.0  12144  5052 pts/9    R    11:03   0:26 /pgsql/install/master/bin/psql -X -c BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32); -c SELECT pg_backup_stop() -d port=54380 host=/tmp/O_2PPNj9Fg dbname='postgres' replication=database

This is probably a bug in psql.  Backtrace is:

#0  PQclear (res=<optimized out>) at /pgsql/source/master/src/interfaces/libpq/fe-exec.c:748
#1  PQclear (res=res@entry=0x55ad308c6190) at /pgsql/source/master/src/interfaces/libpq/fe-exec.c:718
#2  0x000055ad2f303323 in ClearOrSaveResult (result=0x55ad308c6190) at /pgsql/source/master/src/bin/psql/common.c:472
#3  ClearOrSaveAllResults () at /pgsql/source/master/src/bin/psql/common.c:488
#4  ExecQueryAndProcessResults (query=query@entry=0x55ad308bc7a0 "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);",
    elapsed_msec=elapsed_msec@entry=0x7fff9c9941d8, svpt_gone_p=svpt_gone_p@entry=0x7fff9c9941d7, is_watch=is_watch@entry=false,
    opt=opt@entry=0x0, printQueryFout=printQueryFout@entry=0x0) at /pgsql/source/master/src/bin/psql/common.c:1608
#5  0x000055ad2f301b9d in SendQuery (query=0x55ad308bc7a0 "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);")
    at /pgsql/source/master/src/bin/psql/common.c:1172
#6  0x000055ad2f2f7bd9 in main (argc=<optimized out>, argv=<optimized out>) at /pgsql/source/master/src/bin/psql/startup.c:384


-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."                (Robert Davidson)
               http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php
From 5b01504da83e0a593a4adc24f77b22e7bfda9a0b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Fri, 30 Sep 2022 11:00:57 +0200
Subject: [PATCH 1/2] tear down nodes on signal

---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 4fef9c12e6..4a64cb749b 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1539,7 +1539,6 @@ sub can_bind
 # were created in) when the test script exits.
 END
 {
-
 	# take care not to change the script's exit value
 	my $exit_code = $?;
 
@@ -2922,6 +2921,14 @@ sub corrupt_page_checksum
 	return;
 }
 
+#
+# Signal handlers
+#
+$SIG{TERM} = $SIG{INT} = sub
+{
+	exit 1;
+};
+
 =pod
 
 =back
-- 
2.30.2

From 38d4ef4e81a5def4307edca2e2dc207a7559f53d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Fri, 30 Sep 2022 11:06:45 +0200
Subject: [PATCH 2/2] trace cluster termination

---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 4a64cb749b..ae80d06b7d 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1251,6 +1251,7 @@ sub new
 	my $testname = basename($0);
 	$testname =~ s/\.[^.]+$//;
 	my $node = {
+		_testname => $testname,
 		_port => $port,
 		_host => $host,
 		_basedir =>
@@ -1542,6 +1543,15 @@ END
 	# take care not to change the script's exit value
 	my $exit_code = $?;
 
+	{
+		open my $TRACE, ">>", "/tmp/perl.trace";
+		foreach my $node (@all_nodes)
+		{
+			print $TRACE "shutting down $node->{_testname} / $node->{_name}\n";
+		}
+		close $TRACE;
+	}
+
 	foreach my $node (@all_nodes)
 	{
 		$node->teardown_node;
-- 
2.30.2
