Re: RFC 348 (v2) Regex assertions in plain Perl code

Bart Lateur Sun, 08 Oct 2000 03:23:28 -0700
On Tue, 3 Oct 2000 01:08:31 -0400, James Mastros wrote:

>It'd be somewhat useful, I think, if you could return something like
>\matched to let paren-catching of the ?{} thingy have something other
>than "". (Remember, a ref is always true.)
>
>For example, that would let you parse something inside a regex that
>requires a complicated decision only once, with only common
>regexish stuff outside the regex itself.
>
>Of course, auto-deref is evil, but you can't just use the return value, or
>you couldn't return things that aren't true, but match.

I have a few problems with that.

First, I want assertions to be something that is easily explained to
people experienced with regexes. I think that what I propose in this RFC
can be explained in, ooh, under 20 seconds?

Second, I don't think there's anything you can do with your version
that you can't do with an assertion followed by a plain regex. Well,
actually, I can think of one case: your code could say "match $n
characters", whatever those characters may be. You can emulate that,
though.
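
One way to emulate "match $n characters" in current Perl is the
postponed-regex construct (??{ ... }), which builds a plain sub-pattern
at match time. This is only a sketch: the input string and its
one-digit length prefix are made up for illustration, and it is not
the syntax proposed in the RFC.

```perl
use strict;
use warnings;

# Hypothetical input: a one-digit length prefix followed by a payload.
my $input = "3abcXYZ";
my $payload;

# (\d) captures the count; (??{ ... }) returns the string ".{3}", which
# is then matched as an ordinary regex at the current position.
if ($input =~ /^(\d)((??{ ".{$1}" }))/) {
    $payload = $2;
    print "payload: $payload\n";
}
```

The code block only decides *how many* characters to take; the
characters themselves are still matched by ordinary regex syntax.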

Your version would allow matching things that aren't even there. For
example, in the regex

        a(?{ ... assert and match "xyz" ... })b

you can even let this match in "azb", even though there isn't an "xyz"
in the string.


And I think your version would be *less* efficient, not more. See the
example I gave in the RFC:
        
  $_ = "SKIP buzzer";
  if(/(?{print "Testing\n";})([a-z])\1(?{print "Got a match: $1\n"})/) {
      print "YES\n";
  } else {
      print "NO\n";
  }

This prints:

  Testing
  Testing
  Testing
  Got a match: z
  YES

Because the assertion is guaranteed not to match any characters, the
regex engine can be (and is) optimized to search for the thing that
*must follow* the assertion, in this case a lowercase letter. The
assertion isn't even tested at locations that don't start with a
lowercase letter!

In your case, where an assertion can match anything, it would have to
be invoked at every single character position.

As for "expensive calculations": you could put a value in a variable
and use it further on, provided you're sure that this code *will* be
invoked, which isn't always clear-cut.
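
A minimal sketch of stashing a computed value in a variable from
inside the pattern and reading it after the match. The names
expensive(), $cached and $calls are hypothetical, and the "expensive"
work is just a stand-in.

```perl
use strict;
use warnings;

our $calls = 0;   # how often the stand-in routine ran
our $cached;      # value computed inside the pattern, read afterwards

# Hypothetical stand-in for an expensive calculation.
sub expensive { $calls++; return length($_[0]) * 2 }

if ("foo=bar" =~ /(\w+)=(?{ $cached = expensive($1) })(\w+)/) {
    print "key=$1 value=$2 cached=$cached\n";
}
```

Note that $cached is a plain global here: if the engine backtracks
past the code block, or the match fails later on, the variable keeps
whatever the last invocation stored in it.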


>It feels like "transactional variables" may be a
>Good Thing here.

You've got a point there. Yes, transactional variables sound just like
what is needed here. If these were generalized to all of Perl, this
special "local" for regexes wouldn't be as much of a special case any
more.


Oh, here's another reason why I'm not too fond of this special kind of
local, as per the example in perlre:

        $_ = 'a' x 10000;
        m< 
           (?{ $cnt = 0 })                    # Initialize $cnt.
           (
             a 
             (?{
                 local $cnt = $cnt + 1;
                 # Update $cnt, backtracking-safe.
             })
           )*  
           aaaa
           (?{ $res = $cnt })
           # On success copy to non-localized location.
        >x;

Guess what? This will create a stack, or linked list, containing 10000
localized copies of $cnt. That's a lot of memory for a source string
that isn't even huge.
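
For this particular counting example, one way around the localized
copies is to let the engine capture the repeated part and count
afterwards. This is only a sketch for the case where the count can be
derived from the matched text; it is not a general replacement for
backtracking-safe variables.

```perl
use strict;
use warnings;

my $str = 'a' x 10000;
my $res;

# Capture the run of a's that precedes the final "aaaa" and derive the
# count from its length, so no per-iteration state is kept.
if ($str =~ /^(a*)aaaa/) {
    $res = length($1);
    print "count: $res\n";
}
```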

-- 
        Bart.