On Tue, 3 Oct 2000 01:08:31 -0400, James Mastros wrote:
>It'd be somewhat useful, I think, if you could return something like \matched
>to let paren-catching of the ?{} thingy have something other than "". (Remember,
>a ref is always true.)
>
>For example, that would let you parse something inside a regex that
>requires a complicated decision only once, with only common
>regexish stuff outside the regex itself.
>
>Of course, auto-deref is evil, but you can't just use the return value, or
>you couldn't return things that aren't true, but match.

I have a few problems with that.

First, I want assertions to be something that is easily explained to
people experienced with regexes. I think that what I propose in this RFC
can be explained in, ooh, under 20 seconds?

Second, I don't think there's anything you can do with your version
that you can't do with an assertion followed by a plain regex. Well,
actually, I can think of one case: if your code would say "match $n
characters", whatever those characters may be. You can emulate it, though.
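
One possible way to emulate "match $n characters" in Perl 5 is a
postponed subexpression, (??{ }), which builds a plain pattern at match
time. This is only a sketch and not part of the RFC; the length-prefixed
input string is a made-up example.

```perl
use strict;
use warnings;

# Hypothetical input: a decimal count followed by that many characters.
my $input = "3abcdef";

my ($n, $payload);
# (??{ ... }) evaluates the code and matches its result as a regex,
# so ".{$1}" becomes ".{3}" once the count has been captured.
if ($input =~ /^(\d+)((??{ ".{$1}" }))/s) {
    ($n, $payload) = ($1, $2);
    print "n=$n payload=$payload\n";
}
```

Unlike the proposal quoted above, the characters consumed here really are
in the string; the code only decides how many of them to take.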

Your version would allow matching things that aren't even there. For
example, in the regex

    a(?{ ... assert and match "xyz" ... })b

you can even let this match in "azb", even though there isn't an "xyz"
in the string.

And I think your version would be *less* efficient, not more. See the
example I gave in the RFC:

    $_ = "SKIP buzzer";
    if(/(?{print "Testing\n";})([a-z])\1(?{print "Got a match: $1\n"})/) {
        print "YES\n";
    } else {
        print "NO\n";
    }

This prints:

    Testing
    Testing
    Testing
    Got a match: z
    YES

Because the assertion is guaranteed not to match any characters, the
regex engine can be (and is) optimized to search for the thing that
*must follow* the assertion, in this case a lower case letter. The
assertion isn't even tested at locations that don't start with a lower
case letter!

In your case, where an assertion can match anything, it would have to
be invoked at every single character position.

As for "expensive calculations": you could put a value in a variable
and use that further on, provided you're sure that this code *will* be
invoked, which isn't always clear-cut.
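
A minimal sketch of that idea, assuming a hypothetical expensive()
function and a made-up input string; neither comes from the RFC:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive computation.
sub expensive { my ($s) = @_; return length($s) * 2 }

our $cached;            # package variable, reachable from the code block
undef $cached;          # so we can tell afterwards whether the block ran

my $text = "key=hello;";
if ($text =~ /key=(\w+)(?{ $cached = expensive($1) });/) {
    # Only trust $cached if the code block was actually invoked.
    print "cached=$cached\n" if defined $cached;
}
```

The defined() check is there precisely because you can't always be sure
the block was run on the path that led to the final match.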

>It feels like "transactional variables" may be a
>Good Thing here.

You've got a point there. Yes, transactional variables sound just like
what is needed here. If these were generalized to all of Perl, this
special "local" for regexes wouldn't be as much of a special case any
more.

Oh, here's another reason why I'm not too fond of this special kind of
local, as per the example in perlre:

    $_ = 'a' x 10000;
    m<
        (?{ $cnt = 0 })     # Initialize $cnt.
        (
            a
            (?{
                local $cnt = $cnt + 1;
                # Update $cnt, backtracking-safe.
            })
        )*
        aaaa
        (?{ $res = $cnt })
        # On success copy to non-localized location.
    >x;

Guess what? This will create a stack, or linked list, containing 10000
localized copies of $cnt. That's a lot of memory for a source string
that isn't even huge.
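
For this particular counting example (and only as a sketch, not as a
general replacement for the localized counter), the same number can be
had without any code block, by capturing the run and measuring it
afterwards:

```perl
use strict;
use warnings;

$_ = 'a' x 10000;

my $res;
# Capture the run of a's that precedes the final "aaaa"; its length is
# the count the localized $cnt would have ended up with.
if (/^(a*)aaaa/) {
    $res = length $1;
}
print "$res\n";
```

That obviously only works because every iteration adds exactly one
character; anything less regular still needs a backtracking-safe
variable of some kind.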
--
Bart.