Re: Improve error reporting in 027_stream_regress test

Nazir Bilal Yavuz Fri, 18 Jul 2025 01:57:35 -0700

Hi,

On Thu, 17 Jul 2025 at 03:08, Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Wed, Jul 16, 2025 at 02:32:53PM +0300, Nazir Bilal Yavuz wrote:
> > On Wed, 16 Jul 2025 at 04:39, Michael Paquier <mich...@paquier.xyz> wrote:
> >> Hmm.  Is that actually useful as we know that the standby has been
> >> stalen down when running the test?  Even if we report something, we
> >> could always trim the output, like the standby case.
>
> My fingers and brain have flipped here.  I meant likely
> s/stalen/taken/.
>
> > I guess you are right. Also, I tried killing standby while the
> > 027_stream_regress test was running and the test did not fail; instead
> > it continued until timeout. If that is always the case, then it makes
> > sense to report if and only if both primary and standby are alive.
>
> Okay.  Understood.  I'm pretty sure that somebody around you has also
> run the test and crashed the standby on replay, to note that the
> pattern on HEAD was bad.


Updated, now error reporting happens only if regression tests are
failed and both primary and standby are alive.

> > system_log() logs which command is run. I actually do not want to log
> > the command because it needs to be run after the regression tests and
> > before the 'regression tests pass'. If I log the command it looks like
> > this:
> >
> > # All 228 tests passed.
> > # Running: pg_isready -h /tmp/EXCcSldGZE -p 10493 >/dev/null 2>&1
> > # Running: pg_isready -h /tmp/EXCcSldGZE -p 10494 >/dev/null 2>&1
> > (21.939s) ok 2 - regression tests pass
> > (0.000s) ok 3 - primary is alive after the regression tests
> > (0.000s) ok 4 - standby is alive after the regression tests
> >
> > I think it looks like these pg_isready commands run randomly.
>
> FWIW, I'm usually in favor of a bit more logging, to understand what
> happens in the test script under-the-hood.  That's less guessing when
> calling these routines.  If I'm outvoted, that's fine.

I don't have a strong opinion on this. My point was 'pg_isready' being
on top of 'regression test  pass' and not being on top of
'primary|standby is alive after the regression tests' gives a false
impression to me. However, if you say this is okay; then I can update
it.

> >> We could use that as a routine of its own in Utils.pm?  It's not
> >> really something only for diff files, more something that we can use
> >> to trim the contents of a file, which could be the tail of a server
> >> log file as well..  It's always difficult to measure what a "good"
> >> number of lines is, but perhaps we could use an environment variable
> >> to make that configurable rather than enforce a policy of 50 lines
> >> because we think it's good enough?
> >
> > I think this is a good idea. How about a function called file_trim()
> > with three arguments: the file name, a mode ('head' or 'tail'), and
> > the number of lines to trim from that end. This approach may require
> > reading the file more than once, but it is easy to use.
>
> Sounds fine to me.  Thanks!

I added that as 0001. I used a shifting method for the 'tail'
direction to not use too much memory. I found that there is
'File::ReadBackwards' in Perl but you need to install it, so I didn't
use it.

Now patch is divided into three:

0001: 'Add trim_file() helper to Utils.pm' -> Which effectively does
nothing, just adds a function to be used for a subsequent patch. This
function accepts 'line_count' as an argument but
'PG_TEST_FILE_TRIM_LINES' environment variable overrides it. Should I
document 'PG_TEST_FILE_TRIM_LINES' somewhere?

0002: 'Improve error reporting in 027_stream_regress test' -> Uses
trim_file() function to improve error reporting by including the head
and tail of regression.diffs directly in the failure message.

0003: 'Check primary and standby are alive after regression tests in
027_stream_regress' -> Add test for checking status of primary and
standby after the regression tests in 027_stream_regress. Also, it
makes error reporting happen only if regression tests are failed and
both primary and standby are alive.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

From 9af435039575fa76dcb710f40d87d3e792e2f6d2 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavu...@gmail.com>
Date: Fri, 18 Jul 2025 10:34:28 +0300
Subject: [PATCH v3 1/3] Add trim_file() helper to Utils.pm

The trim_file() function trims lines from a file, either from the head
or the tail, based on the provided direction. It takes how many lines to
trim as a 'line_count' argument but if 'PG_TEST_FILE_TRIM_LINES'
environment variable is set, it overrides the 'line_count'.

Suggested-by: Michael Paquier <mich...@paquier.xyz>
Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE%3DbA%40mail.gmail.com
---
 src/test/perl/PostgreSQL/Test/Utils.pm | 53 ++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 85d36a3171e..42ada64d5b7 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -68,6 +68,7 @@ our @EXPORT = qw(
   slurp_file
   append_to_file
   string_replace_file
+  trim_file
   check_mode_recursive
   chmod_recursive
   check_pg_config
@@ -590,6 +591,58 @@ sub string_replace_file
 
 =pod
 
+=item trim_file(filename, direction, line_count)
+
+Return lines from the head or tail of given file.
+
+=cut
+
+sub trim_file
+{
+	my ($filename, $direction, $line_count) = @_;
+	my @lines;
+
+	if ($direction ne 'tail' && $direction ne 'head')
+	{
+		die
+		  "trim_file(_, direction, _) direction should be either 'tail' or 'head'.";
+	}
+
+	my $temp_env_var = $ENV{PG_TEST_FILE_TRIM_LINES};
+	if (defined $temp_env_var)
+	{
+		$line_count = $temp_env_var;
+	}
+	if ($line_count <= 0)
+	{
+		return;
+	}
+
+	open my $fh, '<', $filename or die "couldn't open file: $filename\n";
+
+	if ($direction eq 'head')
+	{
+		while (my $line = <$fh>)
+		{
+			push @lines, $line;
+			last if $. == $line_count;
+		}
+	}
+	elsif ($direction eq 'tail')
+	{
+		while (my $line = <$fh>)
+		{
+			push @lines, $line;
+			shift @lines if @lines > $line_count;
+		}
+	}
+
+	close $fh;
+	return @lines;
+}
+
+=pod
+
 =item check_mode_recursive(dir, expected_dir_mode, expected_file_mode, ignore_list)
 
 Check that all file/dir modes in a directory match the expected values,
-- 
2.50.0

From be178882824329870ef7f97e1f3cf0abda500831 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavu...@gmail.com>
Date: Fri, 18 Jul 2025 10:34:41 +0300
Subject: [PATCH v3 2/3] Improve error reporting in 027_stream_regress test

Previously, the 027_stream_regress test only reported that the regression
test had failed, without showing the actual error. The detailed failure
information was hidden in the regression.diffs file.

This commit improves the situation by including the head and tail of
regression.diffs directly in the failure message. This helps quickly
identify the root cause without needing to open extra files.

Suggested-by: Andres Freund <and...@anarazel.de>
Reviewed-by: Michael Paquier <mich...@paquier.xyz>
Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE%3DbA%40mail.gmail.com
---
 src/test/recovery/t/027_stream_regress.pl | 53 +++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index 83def062d11..981e3aa7e77 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -90,6 +90,10 @@ if ($rc != 0)
 		print "=== dumping $diffs ===\n";
 		print slurp_file($diffs);
 		print "=== EOF ===\n";
+
+		# Dump out 50 lines from head and tail of regression diffs to the
+		# failure message.
+		regression_log_helper($diffs, 50);
 	}
 }
 is($rc, 0, 'regression tests pass');
@@ -181,4 +185,53 @@ UPDATE|t), 'check contents of pg_stat_statements on regression database');
 $node_standby_1->stop;
 $node_primary->stop;
 
+sub regression_log_helper
+{
+	my ($diff_file, $lines_to_dump) = @_;
+	my @lines;
+
+	my $temp_env_var = $ENV{PG_TEST_FILE_TRIM_LINES};
+	if (defined $temp_env_var)
+	{
+		$lines_to_dump = $temp_env_var;
+	}
+	if ($lines_to_dump <= 0)
+	{
+		return;
+	}
+
+	open my $fh, '<', $diff_file or die "couldn't open file: $diff_file\n";
+
+	while (my $line = <$fh>)
+	{
+		push @lines, $line;
+	}
+	close $fh;
+
+	my $line_count = scalar @lines;
+
+	# If the diff_file has fewer lines than (2 * $lines_to_dump), dump out the
+	# entire file.
+	if ($line_count <= (2 * $lines_to_dump))
+	{
+		diag("\n=== dumping $diff_file ===\n");
+		foreach my $line (@lines)
+		{
+			diag($line);
+		}
+	}
+	else
+	{
+		diag(
+			"\n=== dumping $lines_to_dump lines from head of $diff_file ===\n"
+		);
+		diag(trim_file($diff_file, 'head', $lines_to_dump));
+		diag(
+			"\n=== dumping $lines_to_dump lines from tail of $diff_file ===\n"
+		);
+		diag(trim_file($diff_file, 'tail', $lines_to_dump));
+	}
+	diag("=== EOF ===\n\n");
+}
+
 done_testing();
-- 
2.50.0

From b33abe1e315b68fc1a5d880f6fdecd20714af8dc Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavu...@gmail.com>
Date: Fri, 18 Jul 2025 10:21:39 +0300
Subject: [PATCH v3 3/3] Check primary and standby are alive after regression
 tests in 027_stream_regress

Add a test for checking if the primary and the standby are alive after
the regression tests in 027_stream_regress.

Also, regression diffs are only meaningful if both the primary and the
standby are still alive after the regression test failure, otherwise
something else is broken. So, do not report anything if the primary or
standby are not alive although regression tests are failed.

Suggested-by: Andres Freund <and...@anarazel.de>
Reviewed-by: Michael Paquier <mich...@paquier.xyz>
Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE%3DbA%40mail.gmail.com
---
 src/test/perl/PostgreSQL/Test/Cluster.pm  | 20 ++++++++++++++++++++
 src/test/recovery/t/027_stream_regress.pl | 10 +++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 301766d2ed9..3b9ce34db63 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -290,6 +290,26 @@ sub connstr
 
 =pod
 
+=item $node->is_alive()
+
+Check if the node is alive.
+
+=cut
+
+sub is_alive {
+    my ($self) = @_;
+	local %ENV = $self->_get_env();
+
+    my $host = $self->host;
+    my $port = $self->port;
+	my $null = File::Spec->devnull;
+
+    my $cmd = "pg_isready -h $host -p $port";
+	return !system("$cmd >$null 2>&1");
+}
+
+=pod
+
 =item $node->raw_connect()
 
 Open a raw TCP or Unix domain socket connection to the server. This is
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index 981e3aa7e77..3de570ead5d 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -81,7 +81,13 @@ my $rc =
 	  . "--max-concurrent-tests=20 "
 	  . "--inputdir=../regress "
 	  . "--outputdir=\"$outputdir\"");
-if ($rc != 0)
+my $primary_alive = $node_primary->is_alive;
+my $standby_alive = $node_standby_1->is_alive;
+# Regression diffs are only meaningful if both the primary and the standby are
+# still alive after the regression test failure, otherwise something else is
+# broken. So, do not report anything if the primary or standby are not alive
+# although regression tests are failed.
+if ($rc != 0 && $primary_alive && $standby_alive)
 {
 	# Dump out the regression diffs file, if there is one
 	my $diffs = "$outputdir/regression.diffs";
@@ -97,6 +103,8 @@ if ($rc != 0)
 	}
 }
 is($rc, 0, 'regression tests pass');
+is($primary_alive, 1, 'primary is alive after the regression tests');
+is($standby_alive, 1, 'standby is alive after the regression tests');
 
 # Clobber all sequences with their next value, so that we don't have
 # differences between nodes due to caching.
-- 
2.50.0

Re: Improve error reporting in 027_stream_regress test

Reply via email to