On Tue, 26 Sep 2000 13:32:37 -0400, Michael Maraist wrote:

>

>I can't believe that there currently isn't a means of killing a back-track
>based on perl-code.  Looking through perlre it seems like you're right.

There is, but as MJD wrote: "it ain't pretty". Now, semantic checks or
assertions would be the only reason why I'd expect to be able to execute
Perl code every time part of a regex successfully matches. Just look at
RFC 197: a syntactic extension to regexes just to check whether a number
is within a range! That is absurd, isn't it? Wouldn't a simple way to
include localized tests, *any* test, make more sense?
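For comparison, here is roughly what the "not pretty" way looks like in Perl 5 today. This is a sketch that assumes a reasonably modern perl. It accepts only whole numbers up to 255, which is the kind of range check RFC 197 wants new syntax for. The test is ordinary Perl inside (?{ ... }), used as the condition of a (?(condition)yes) group. When the test is true, the (?!) branch can never match, so the engine backtracks. The sample string and the 255 bound are illustrative only.

```perl
use strict;
use warnings;

my $text = "999 300 42 255 256";

# \b(\d+)\b grabs a whole number; the conditional then runs Perl code.
# If $1 is out of range, the "yes" branch (?!) can never match, so the
# engine backtracks and keeps looking; otherwise the group matches empty.
my @octets = $text =~ /\b(\d+)\b(?(?{ $1 > 255 })(?!))/g;

print "@octets\n";    # prints "42 255"
```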

>I'm
>not really crazy about breaking backward compatibilty like this though.  It
>shouldn't be too hard to find another character sequence to perform your
>above job.

Me neither. But many prominent people in the Perl world have expressed
their amazement when they found out that embedding Perl in a regex
wasn't meant for just this kind of test. Many people haven't even tried
(?{...}) yet, let alone used it in production code, and (?{...}) is
notorious for dumping core. I can't see why it couldn't be recycled.
After all, it would still execute Perl code.

>Beyond that, there's a growing rift between reg-ex extenders and purifiers.
>I assume the functionality you're trying to produce above is to find the
>first bare number that is less than 256 (your above would match the 25 in
>256).. 

You're forgetting about greediness. This test simply answers the
question: "will this do?" If the answer is always yes, the regex will
*always* match the same thing as it would without this assertion.
Compare it to other assertions, such as /\b/, anchors (/^/ and /$/),
lookahead and lookbehind. These, too, don't really control what the
regex matches. They can only express a veto.
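Here is a quick way to see the veto-only behaviour. This is a sketch that assumes a modern perl and uses illustrative data. An embedded test that always answers "yes" leaves the match unchanged. A test that can answer "no" only rejects candidates, just as \b or a lookahead does.

```perl
use strict;
use warnings;

my $s = "abc 1234 xyz 56";

# Plain pattern: the greedy \d+ takes the first run of digits.
my ($plain) = $s =~ /(\d+)/;

# Same pattern plus a test whose answer to "will this do?" is always yes:
# length($1) is never 0 here, so the failing (?!) branch is never taken.
my ($always_yes) = $s =~ /(\d+)(?(?{ length($1) == 0 })(?!))/;

# A test that can say no: 1234 is rejected and the engine moves on to 56.
# Like \b or a lookahead, it only vetoes; it never consumes characters.
my ($vetoed) = $s =~ /\b(\d+)\b(?(?{ $1 > 100 })(?!))/;

print "$plain $always_yes $vetoed\n";    # prints "1234 1234 56"
```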

>In any case, the above is not very intuitive to the casual observers as
>might be
>
>while ( /(\d+)/g ) {
>  if ( $1 < 256 ) {
>    $answer = $1;
>    last;
>  }
>}

Maybe for this simple example. But the same can be said of lookahead and
lookbehind. It takes a *bit* of getting used to, but it's very simple,
and very powerful. IMO.
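The following sketch checks that an embedded test can express the same thing as the quoted while loop by running both side by side. The function names are hypothetical, and the // operator assumes perl 5.10 or later. The digit lookarounds (?<!\d) and (?!\d) are added here to make the regex consider only complete runs of digits. Without them, a veto on 256 makes the engine backtrack into \d+ and accept the 25, which is the case raised in the quoted message.

```perl
use strict;
use warnings;

# The quoted loop: scan each run of digits, return the first below 256.
sub first_small_loop {
    my ($str) = @_;
    while ($str =~ /(\d+)/g) {
        return $1 if $1 < 256;
    }
    return;    # undef: no qualifying number
}

# One regex with the test embedded as a veto. The lookarounds restrict
# candidates to complete digit runs, so a vetoed 256 cannot shrink to 25.
sub first_small_regex {
    my ($str) = @_;
    return $str =~ /(?<!\d)(\d+)(?!\d)(?(?{ $1 >= 256 })(?!))/ ? $1 : undef;
}

for my $s ("1000 300 42", "256", "a300b 12", "no digits") {
    my $l = first_small_loop($s)  // 'none';
    my $r = first_small_regex($s) // 'none';
    print "$s: loop=$l regex=$r\n";
}
```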

>Likewise, complex matching tokens are the realm of a parser (I'm almost
>getting tired of saying that).  Please be kind to your local maintainer,
>don't proliferate n'th order code complexities such as recursive or
>conditional reg-ex's.

I said nothing of recursive regexes. Again, just look at RFC 197, and
see what complex rules people would like to cram into a regex. Or look
at the examples in Friedl's book, to see what contortions people put
themselves through, just to make sure that they only match numbers
between 0 and 23:

        /[01]?[0-9]|2[0-3]/
        /[01]?[4-9]|[012]?[0-3]/

So you think these are easy on the maintainer? I think not. A simple
boolean expression, "match a number and it must be 23 or less", is far
simpler, at least to me.
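The sketch below places the "23 or less" rule next to two Friedl-style character-class versions and checks all three against every one- and two-digit string. The ^...$ anchors are added so that whole strings are compared. The variable names are illustrative.

```perl
use strict;
use warnings;

# Character-class encodings of "0 to 23", anchored to the whole string.
my $friedl_a = qr/^(?:[01]?[0-9]|2[0-3])$/;
my $friedl_b = qr/^(?:[01]?[4-9]|[012]?[0-3])$/;

# The same rule as "match a number, and it must be 23 or less".
my $hour = qr/^(\d{1,2})(?(?{ $1 > 23 })(?!))$/;

# Every one-digit string plus every two-digit string "00".."99".
my @cases = ((0 .. 9), map { sprintf("%02d", $_) } 0 .. 99);

# Collect any string on which the three patterns do not all agree.
my @disagree = grep {
    my $want = ($_ =~ $friedl_a) ? 1 : 0;
    (($_ =~ $friedl_b) ? 1 : 0) != $want
        || (($_ =~ $hour) ? 1 : 0) != $want
} @cases;

my $accepted = grep { $_ =~ $hour } @cases;
print "disagreements: ", scalar(@disagree), ", accepted: $accepted\n";
# prints "disagreements: 0, accepted: 34"
```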

-- 
        Bart.
