> On Aug 9, 2021, at 4:31 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> There is a potentially interesting definitional question:
> what exactly ought this regexp do?
>
> ((.)){0}\2
>
> Because the capturing paren sets are zero-quantified, they will
> never be matched to any characters, so the backref can never
> have any defined referent.
Perl regular expressions are not POSIX, but if there is a principled reason
POSIX should differ from Perl on this, we should be clear about what that is.
For reference, here is what Perl does:
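#!/usr/bin/perl
use strict;
use warnings;
our $match;
# Groups 1 and 2 are quantified {0}. If group 1 were ever entered, the
# embedded (??{ die; }) code block would abort the script.
if ('foo' =~ m/((.)(??{ die; })){0}(..)/)
{
	# Report which capture groups were actually set by the match.
	print "captured 1 $1\n" if defined $1;
	print "captured 2 $2\n" if defined $2;
	print "captured 3 $3\n" if defined $3;
	print "captured 4 $4\n" if defined $4;
	print "match = $match\n" if defined $match;
}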
This will print "captured 3 fo", proving that although the regular expression
is parsed with the (..) bound to the third capture group, the first two capture
groups never run. If you don't believe that, change the {0} to {1} and observe
that the script dies.
> So I think throwing an
> error is an appropriate response. The existing code will
> throw such an error for
>
> ((.)){0}\1
>
> so I guess Spencer did think about this to some extent -- he
> just forgot about the possibility of nested parens.
Ugh. That means our code throws an error where Perl does not, which pretty well
negates my point above. If we're already throwing an error for this type of
thing, I agree we should be consistent about it. My personal preference would
have been to do the same thing as Perl, but it seems that ship has already
sailed.
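For comparison, a small sketch (not part of the original test) that checks how Perl treats both patterns discussed above. It assumes Perl's documented rule that a backreference to a group that did not participate in the match simply fails to match, so both patterns are expected to compile and report "no match" rather than raise an error. The helper name try_match is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compile $pattern and test it against $subject.
# Returns 1 on a match, 0 on no match; dies if the pattern fails to compile.
sub try_match {
    my ($pattern, $subject) = @_;
    my $re = qr/$pattern/;    # an invalid pattern would die here
    return $subject =~ $re ? 1 : 0;
}

# Single quotes keep \1 and \2 as literal regex backreferences.
for my $pattern ('((.)){0}\1', '((.)){0}\2') {
    my $ok = eval { try_match($pattern, 'foo') };
    if (defined $ok) {
        print "$pattern: compiled, ", ($ok ? "matched" : "no match"), "\n";
    } else {
        print "$pattern: error: $@";
    }
}
```

If the assumption holds, neither pattern is rejected; the backreference just never matches, which is the behavior contrasted above with the error our code raises.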
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company