Hi,

On Mon, 30 Jun 2025 at 18:01, Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On 2025-06-30 16:01:04 +0300, Nazir Bilal Yavuz wrote:
> > This patch aims to improve 027_stream_regress test's regression test
> > error reporting per Andres' suggestion [1].
>
> Thanks for working on that!
>
>
> One thing I don't yet like is that I think we should report if the primary is
> alive *before* reporting the diff and skipping reporting it if the primary
> crashed. It's not interesting to report a long diff if the server crashed IMO.

I agree with you. So, the current logic is:

If primary is not alive: Do not report anything.
If only primary is alive: Report the entire diff file.
If both primary and standby are alive: Report entire diff file and add
head+tail of diff to the failure message.

Done like above in v2.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From b1039c23dc78934882c69329b8865fd5902cdcda Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavu...@gmail.com>
Date: Tue, 17 Jun 2025 16:08:20 +0300
Subject: [PATCH v2] Improve error reporting in 027_stream_regress test

Previously, the 027_stream_regress test only reported that the regression
test had failed, without showing the actual error. The detailed failure
information was hidden in the regression.diffs file.

This commit improves the situation by including the head and tail of
regression.diffs directly in the failure message if both the primary and
standby are alive. This helps quickly identify the root cause without
needing to open extra files.

Also, regression diffs are only meaningful if the primary is still alive
after the regression test failure, otherwise something else is broken.
So, do not report anything if the primary is not alive although
regression tests are failed.

Suggested-by: Andres Freund <and...@anarazel.de>
---
 src/test/perl/PostgreSQL/Test/Cluster.pm  | 19 +++++++
 src/test/recovery/t/027_stream_regress.pl | 64 ++++++++++++++++++++++-
 2 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 301766d2ed9..f71a331208e 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -290,6 +290,25 @@ sub connstr
 
 =pod
 
+=item $node->is_alive()
+
+Check if the node is alive.
+
+=cut
+
+sub is_alive {
+    my ($self) = @_;
+
+    my $host = $self->host;
+    my $port = $self->port;
+	my $null = File::Spec->devnull;
+
+    my $cmd = "pg_isready -h $host -p $port";
+	return !system("$cmd >$null 2>&1");
+}
+
+=pod
+
 =item $node->raw_connect()
 
 Open a raw TCP or Unix domain socket connection to the server. This is
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index 83def062d11..152926f725c 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -81,7 +81,13 @@ my $rc =
 	  . "--max-concurrent-tests=20 "
 	  . "--inputdir=../regress "
 	  . "--outputdir=\"$outputdir\"");
-if ($rc != 0)
+my $primary_alive = $node_primary->is_alive;
+my $standby_alive = $node_standby_1->is_alive;
+# Regression diffs are only meaningful if the primary is still alive after the
+# regression test failure, otherwise something else is broken. So, do not
+# report anything if the primary is not alive although regression tests are
+# failed.
+if ($rc != 0 && $primary_alive)
 {
 	# Dump out the regression diffs file, if there is one
 	my $diffs = "$outputdir/regression.diffs";
@@ -90,9 +96,18 @@ if ($rc != 0)
 		print "=== dumping $diffs ===\n";
 		print slurp_file($diffs);
 		print "=== EOF ===\n";
+
+		# Dump out 50 lines from head and tail of regression diffs to the
+		# failure message if the standby is alive.
+		if ($standby_alive)
+		{
+			regression_log_helper($diffs, 50);
+		}
 	}
 }
 is($rc, 0, 'regression tests pass');
+is($primary_alive, 1, 'primary is alive after the regression tests');
+is($standby_alive, 1, 'standby is alive after the regression tests');
 
 # Clobber all sequences with their next value, so that we don't have
 # differences between nodes due to caching.
@@ -181,4 +196,51 @@ UPDATE|t), 'check contents of pg_stat_statements on regression database');
 $node_standby_1->stop;
 $node_primary->stop;
 
+sub regression_log_helper
+{
+	my ($diff_file, $lines_to_dump) = @_;
+	my @lines;
+
+	open my $fh, '<', $diff_file or die "couldn't open file: $diff_file\n";
+
+	# Read all lines to process them below
+	while (my $line = <$fh>)
+	{
+		push @lines, $line;
+	}
+	close $fh;
+
+	my $line_count = scalar @lines;
+
+
+	# If the diff_file has fewer lines than (2 * $lines_to_dump), dump out the
+	# entire file.
+	if ($line_count <= (2 * $lines_to_dump))
+	{
+		diag("\n=== dumping $diff_file ===\n");
+		foreach my $line (@lines)
+		{
+			diag($line);
+		}
+	}
+	else
+	{
+		diag(
+			"\n=== dumping $lines_to_dump lines from head of $diff_file ===\n"
+		);
+		for my $i (0 .. $lines_to_dump - 1)
+		{
+			diag($lines[$i]);
+		}
+		diag(
+			"\n=== dumping $lines_to_dump lines from tail of $diff_file ===\n"
+		);
+		for my $i ($line_count - $lines_to_dump .. $line_count - 1)
+		{
+			diag($lines[$i]);
+		}
+	}
+	diag("=== EOF ===\n\n");
+}
+
 done_testing();
-- 
2.49.0

Reply via email to