In perl.git, the branch davem/re_eval has been created

<http://perl5.git.perl.org/perl.git/commitdiff/a4cadab9ebe750fd6381cebe7fd8fa51a44ee04a?hp=0000000000000000000000000000000000000000>

        at  a4cadab9ebe750fd6381cebe7fd8fa51a44ee04a (commit)

- Log -----------------------------------------------------------------
commit a4cadab9ebe750fd6381cebe7fd8fa51a44ee04a
Author: David Mitchell <da...@iabyn.com>
Date:   Sat Jul 23 21:29:02 2011 +0100

    make re_evals be seen by the toker/parser
    
    This commit is a first step to making the handling of /(?{...})/ more sane.
    But see the big proviso at the end.
    
    Currently a pattern like /a(?{...})b/ is uninterpreted by the lexer and
    parser, and is instead passed as-is to the regex compiler, which is
    responsible for ensuring that the embedded perl code is extracted and
    compiled. The only thing the quoted string code in the lexer currently
    does is to skip nested matched {}'s, in order to get to the end of the code
    block and restart looking for interpolated variables, \Q etc.
    
    This commit makes the lexer smarter.
    
    Consider the following pattern:
    
        /FOO(?{BLOCK})BAR$var/
    
    This is currently tokenised as
    
        op_match
        (
        op_const["FOO(?{BLOCK})BAR"]
        ,
        $
        "var"
        )
    
    Instead, tokenise it as:
    
        op_match
        (
        op_const["FOO"]
        ,
        DO
        {
        BLOCK
        ;
        }
        ,
        op_const["(?{BLOCK})"]
        ,
        op_const["BAR"]
        ,
        $
        "var"
        )
    
    This means that BLOCK is itself tokenised and parsed. We also insert
    a const into the stream to include the original source text of BLOCK so
    that it's available for stringifying qr's etc.
    
    Note that by allowing the lexer/parser direct access to BLOCK, we can now
    handle things like
        /(?{"{"})/
    
    This mechanism is similar to the way something like
    
        "abc $a[foo(q(]))] def"
    
    is currently parsed: the double-quoted string handler in the lexer stops
    at $a[, the 'foo(q(]))' is treated as perl code, then at the end control is
    passed back to the string handler to handle the ' def'.
    
    This commit includes a new error message:
    
        Sequence (?{...}) not terminated with ')'
    
    since when control is passed back to the quoted-string handler, it expects
    to find the ')' as the next char. This new error mostly replaces the old
    
        Sequence (?{...}) not terminated or not {}-balanced in regex
    
    Big proviso:
    
    This commit updates toke.c to recognise the embedded code, but doesn't
    then do anything with it. The parser will pass both a compiled do block
    and a const for each embedded (?{...}), and Perl_pmruntime just throws
    away the do block and keeps the constant text, which is then passed to
    the regex compiler. So currently each code block gets compiled twice (!)
    with two sets of warnings etc. The next stage will be to pass these do
    blocks to the regex compiler.
    
    This commit is based on a patch that I originally worked up about 6 years
    ago and that has been sitting bit-rotting ever since.

M       op.c
M       perl.h
M       pod/perldiag.pod
M       t/lib/strict/vars
M       t/op/blocks.t
M       t/re/re_tests
M       t/re/reg_mesg.t
M       t/run/fresh_perl.t
M       toke.c

commit 9eb0cdcaf54df2c25b96a6e1d93d7bdcb3d2a44f
Author: David Mitchell <da...@iabyn.com>
Date:   Fri Jul 22 14:56:17 2011 +0100

    correct comment about how strings are tokenised
    
    The stuff about "foo\lbar" being tokenised as a list to which you need
    to apply join() was wrong; the tokeniser outputs the necessary concats
    rather than commas.

M       toke.c
-----------------------------------------------------------------------
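
The first commit message above contrasts two ways of finding the end of an embedded (?{...}) block: skipping nested matched {}'s, and letting the lexer actually tokenise BLOCK. The following is a minimal Python sketch of that difference, not the code in toke.c. All function names are hypothetical, the quote-aware scanner is only a toy stand-in for real tokenisation of Perl code, and interpolation of variables such as $var is ignored. The error strings are borrowed from the commit message purely for illustration.

```python
def find_block_end_naive(pattern, open_idx):
    """Find the '}' matching pattern[open_idx] by counting braces only.

    Hypothetical stand-in for "skip nested matched {}'s".
    """
    depth = 0
    for i in range(open_idx, len(pattern)):
        ch = pattern[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
    return None  # braces never balanced


def find_block_end_quote_aware(pattern, open_idx):
    """Like the naive version, but ignores braces inside '...' or "..." strings.

    Toy approximation of what a real tokeniser knows about BLOCK.
    """
    depth = 0
    quote = None
    i = open_idx
    while i < len(pattern):
        ch = pattern[i]
        if quote is not None:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return None


def split_pattern(pattern, find_end):
    """Split a pattern into ('const', text) and ('code', block) pieces.

    Each code block is followed by a const holding its original source
    text, mirroring the token stream shown in the commit message.
    """
    pieces = []
    pos = 0
    while True:
        start = pattern.find("(?{", pos)
        if start == -1:
            break
        end = find_end(pattern, start + 2)  # start + 2 is the '{'
        if end is None:
            raise ValueError(
                "Sequence (?{...}) not terminated or not {}-balanced in regex")
        if end + 1 >= len(pattern) or pattern[end + 1] != ")":
            raise ValueError("Sequence (?{...}) not terminated with ')'")
        if start > pos:
            pieces.append(("const", pattern[pos:start]))
        pieces.append(("code", pattern[start + 3:end]))
        pieces.append(("const", pattern[start:end + 2]))
        pos = end + 2
    if pos < len(pattern):
        pieces.append(("const", pattern[pos:]))
    return pieces
```

With either scanner, split_pattern("FOO(?{BLOCK})BAR", ...) yields the const "FOO", the code BLOCK, the const "(?{BLOCK})" and the const "BAR", in that order. For the pattern (?{"{"}), the brace-counting scanner never sees the braces balance and the split fails, while the quote-aware scanner finds the block "{" as expected.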

--
Perl5 Master Repository
