Re: perl: unsafe empty pattern behavior

2024-03-12 Thread Tom Lane
Andrew Dunstan  writes:
> On 2024-03-12 Tu 18:59, Tom Lane wrote:
>> I wonder whether perlcritic has sufficiently deep understanding of
>> Perl code that it could find these hazards.

> Yeah, that was my thought too. I'd start with ProhibitComplexRegexes.pm 
> as a template.

Oooh.  Taking a quick look at the source code:

https://metacpan.org/dist/Perl-Critic/source/lib/Perl/Critic/Policy/RegularExpressions/ProhibitComplexRegexes.pm

it seems like it'd be pretty trivial to convert that from "complain if
regex contains more than N characters" to "complain if regex contains
zero characters".

regards, tom lane




Re: perl: unsafe empty pattern behavior

2024-03-12 Thread Andrew Dunstan



On 2024-03-12 Tu 18:59, Tom Lane wrote:

Jeff Davis  writes:

On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote:

I also tried grepping (for things
like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you
have ... but I only looked for the "qr" literal, not other ways to
get regexes.

I think that's fine. qr// seems the most dangerous, because it seems to
behave differently in different versions of perl.

I wonder whether perlcritic has sufficiently deep understanding of
Perl code that it could find these hazards.  I already checked,
and found that there's no built-in filter for this (at least not
in the perlcritic version I have), but maybe we could write one?
The rules seem to be plug-in modules, so you can make your own
in principle.




Yeah, that was my thought too. I'd start with ProhibitComplexRegexes.pm 
as a template.


If nobody else does it I'll have a go, but it might take a while.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: perl: unsafe empty pattern behavior

2024-03-12 Thread Tom Lane
Jeff Davis  writes:
> On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote:
>> I also tried grepping (for things
>> like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you
>> have ... but I only looked for the "qr" literal, not other ways to
>> get regexes.

> I think that's fine. qr// seems the most dangerous, because it seems to
> behave differently in different versions of perl.

I wonder whether perlcritic has sufficiently deep understanding of
Perl code that it could find these hazards.  I already checked,
and found that there's no built-in filter for this (at least not
in the perlcritic version I have), but maybe we could write one?
The rules seem to be plug-in modules, so you can make your own
in principle.

regards, tom lane




Re: perl: unsafe empty pattern behavior

2024-03-12 Thread Jeff Davis
On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote:
> I suggest that pg_dump/t/002_pg_dump.pl could use a verification that
> the ->{regexp} thing is not empty.

I'm not sure how exactly to test for an empty pattern. The problem is,
we don't really want to test for an empty pattern, because /(?^:)/ is
fine. The problem is //, which gets turned into an actual pattern
(perhaps empty or perhaps not), and by the time it's in the %tests
hash, I think it's too late to distinguish.

Again, I'm not a perl expert, so suggestions welcome.

>   I also tried grepping (for things
> like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you
> have ... but I only looked for the "qr" literal, not other ways to
> get
> regexes.

I think that's fine. qr// seems the most dangerous, because it seems to
behave differently in different versions of perl.

Grepping for regexes in perl code is an "interesting" exercise.

Regards,
Jeff Davis





Re: perl: unsafe empty pattern behavior

2024-03-12 Thread Alvaro Herrera
On 2024-Mar-12, Jeff Davis wrote:

> Do not use perl empty patterns like // or qr// or s//.../, the behavior
> is too surprising for perl non-experts.

Yeah, nasty.

> There are a few such uses in
> our tests; patch attached. Unfortunately, there is no obvious way to
> automatically detect them so I am just relying on grep. I'm sure there
> are others here who know more about perl than I do, so
> suggestions/corrections are welcome.

I suggest that pg_dump/t/002_pg_dump.pl could use a verification that
the ->{regexp} thing is not empty.  I also tried grepping (for things
like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you
have ... but I only looked for the "qr" literal, not other ways to get
regexes.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree.  (Don Knuth)




perl: unsafe empty pattern behavior

2024-03-12 Thread Jeff Davis
Moved from discussion on -committers:

https://postgr.es/m/0ef325fa06e7a1605c4e119c4ecb637c67e5fb4e.ca...@j-davis.com

Summary:

Do not use perl empty patterns like // or qr// or s//.../, the behavior
is too surprising for perl non-experts. There are a few such uses in
our tests; patch attached. Unfortunately, there is no obvious way to
automatically detect them so I am just relying on grep. I'm sure there
are others here who know more about perl than I do, so
suggestions/corrections are welcome.

Long version:

Some may know this already, but we just discovered the dangers of using
empty patterns in perl:

"If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead... If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match)."

https://perldoc.perl.org/perlop#The-empty-pattern-//

In other words, if you have code like:

   if ('xyz' =~ //)
   {
   print "'xyz' matches //\n";
   }

The match will succeed and print, because there's no previous pattern,
so // is a "genuine" empty pattern, which is treated like /.*/ (I
think?). Then, if you add some other code before it:

   if ('abc' =~ /abc/)
   {
   print "'abc' matches /abc/\n";
   }

   if ('xyz' =~ //)
   {
   print "'xyz' matches //\n";
   }

The first match will succeed, but the second match will fail, because
// is treated like /abc/.

On reflection, that does seem very perl-like. But it can cause
surprising action-at-a-distance if not used carefully, especially for
those who aren't experts in perl. It's much safer to just not use the
empty pattern.

If you use qr// instead:

https://perldoc.perl.org/perlop#qr/STRING/msixpodualn

like:

   if ('abc' =~ qr/abc/)
   {
   print "'abc' matches /abc/\n";
   }

   if ('xyz' =~ qr//)
   {
   print "'xyz' matches //\n";
   }

Then the second match may succeed or may fail, and it's not clear from
the documentation what precise circumstances matter. It seems to fail
on older versions of perl (like 5.16.3) and succeed on newer versions
(5.38.2). However, it may also depend on when the qr// is [re]compiled,
or regex flags, or locale, or may just be undefined.

Regards,
Jeff Davis


From be5aa677e37180a8c1b0faebcceab5506b1c8130 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Mon, 11 Mar 2024 16:44:56 -0700
Subject: [PATCH v1] perl: avoid empty regex patterns

Empty patterns have special behavior that uses the last successful
pattern match. This behavior can be surprising, so remove empty
patterns and instead match against exactly what is intended
(e.g. /^$/ or /.*/).

Unfortunately there's not an easy way to check for this in an
automated way, so it's likely that some cases have been missed and
will be missed in the future. This commit just cleans up known
instances.

Discussion: https://postgr.es/m/1548559.1710188...@sss.pgh.pa.us
---
 src/bin/pg_upgrade/t/003_logical_slots.pl   | 4 ++--
 src/bin/pg_upgrade/t/004_subscription.pl| 2 +-
 src/bin/psql/t/001_basic.pl | 8 
 src/bin/psql/t/010_tab_completion.pl| 2 +-
 src/test/recovery/t/037_invalid_database.pl | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 83d71c3084..256dfd53b1 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -78,7 +78,7 @@ command_checks_all(
 	[
 		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
 	],
-	[qr//],
+	[qr/^$/],
 	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
 );
 ok( -d $newpub->data_dir . "/pg_upgrade_output.d",
@@ -118,7 +118,7 @@ command_checks_all(
 	[
 		qr/Your installation contains logical replication slots that can't be upgraded./
 	],
-	[qr//],
+	[qr/^$/],
 	'run of pg_upgrade of old cluster with slots having unconsumed WAL records'
 );
 
diff --git a/src/bin/pg_upgrade/t/004_subscription.pl b/src/bin/pg_upgrade/t/004_subscription.pl
index df5d6dffbc..c8ee2390d1 100644
--- a/src/bin/pg_upgrade/t/004_subscription.pl
+++ b/src/bin/pg_upgrade/t/004_subscription.pl
@@ -68,7 +68,7 @@ command_checks_all(
 	[
 		qr/max_replication_slots \(0\) must be greater than or equal to the number of subscriptions \(1\) on the old cluster/
 	],
-	[qr//],
+	[qr/^$/],
 	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
 );
 
diff --git a/src/bin/psql/t/001_basic.pl b/src/bin/psql/t/001_basic.pl
index 9f0b6cf8ca..ce875ce316 100644
--- a/src/bin/psql/t/001_basic.pl
+++ b/src/bin/psql/t/001_basic.pl
@@ -412,23 +412,23 @@ my $perlbin = $^X;
 $perlbin =~ s!\\!/!g if $PostgreSQL::Test::Utils::windows_os;
 my $pipe_cmd = "$perlbin -pe '' >$g_file";
 
-psql_like($node, "SELECT 'one' \\g | $pipe_cmd", qr//, "one command \\g");
+psql_like($node, "SELECT 'one' \\g