In perl.git, the branch smoke-me/davem/re_overload has been created

<http://perl5.git.perl.org/perl.git/commitdiff/41ff9613f5863fc0a8a79b3b8de25901e19cd0d8?hp=0000000000000000000000000000000000000000>

        at  41ff9613f5863fc0a8a79b3b8de25901e19cd0d8 (commit)

- Log -----------------------------------------------------------------
commit 41ff9613f5863fc0a8a79b3b8de25901e19cd0d8
Author: David Mitchell <[email protected]>
Date:   Wed Apr 10 16:10:28 2013 +0100

    fix runtime /(?{})/ with overload::constant qr
    
    There are two issues fixed here.
    
    First, when a pattern has a run-time code-block included, such as
    
        $code = '(?{...})'
        /foo$code/
    
    the mechanism used to parse those run-time blocks (feeding the
    resultant pattern into a call to eval_sv() with the string
    
        qr'foo(?{...})'
    
    and then extracting any resulting opcode trees from the returned
    qr object) suffered from the re-parsed qr'..' also being subject to
    overload::constant qr processing, which could result in Bad Things
    happening.
    
    Since we now have the PL_parser->lex_re_reparsing flag in scope throughout
    the parsing of the pattern, this is easy to detect and avoid.
    
    The second issue concerns the mechanism used to avoid infinite
    recursion when S_has_runtime_code() gives false positives for code
    like '[(?{})]'. For patterns like this, we would suspect that the
    pattern may have code (even though it doesn't), so we would feed it
    into qr'...' and reparse it; again it would look like run-time code,
    so we would feed it in again: rinse and repeat. What stopped the
    recursion was that, when we saw a qr with a single OP_CONST string, we
    assumed it couldn't have any run-time component, and thus no run-time
    code blocks.
    
    However, this broke qr/foo/ in the presence of overload::constant qr
    overloading, which could convert foo into a string containing code blocks.
    
    The fix for this is to change the recursion-avoidance mechanism (in a way
    which also turns out to be simpler). Basically, when we fake up a
    qr'...' and eval it, we turn off any 'use re eval' in scope: it's not
    needed, since we know the ... will be a constant string without any
    overloading. Then we use the lack of 'use re eval' in scope to
    skip calling S_has_runtime_code() and just assume that the pattern has no
    run-time code blocks (if it has, then eventually the regex parser will
    rightly complain about 'Eval-group not allowed at runtime').
    
    This commit also adds some fairly comprehensive tests for this.
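
The recursion-avoidance change described above can be modelled in a few lines. This is a hypothetical sketch, not perl's code: maybe_has_runtime_code() stands in for S_has_runtime_code(), compile_pattern() and reparse_count are invented names, and the use_re_eval argument models whether 'use re eval' is in scope. It shows why turning off 'use re eval' for the faked-up qr'...' bounds the reparse to a single level, even for a false positive like '[(?{})]'.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for S_has_runtime_code(): a cheap, conservative
 * scan that says "maybe" whenever it sees "(?{", even inside a
 * character class such as "[(?{})]" (a false positive). */
static bool maybe_has_runtime_code(const char *pat)
{
    return strstr(pat, "(?{") != NULL;
}

static int reparse_count; /* counts simulated qr'...' re-evals */

/* Hypothetical compile step; use_re_eval models whether
 * 'use re eval' is in scope for this compilation. */
static void compile_pattern(const char *pat, bool use_re_eval)
{
    if (use_re_eval && maybe_has_runtime_code(pat)) {
        /* Fake up qr'...' and eval it with 'use re eval' turned off,
         * so the inner compile skips the scan instead of recursing. */
        reparse_count++;
        compile_pattern(pat, false);
        return;
    }
    /* No 'use re eval': assume no run-time code blocks. A real one
     * would be rejected later by the regex parser. */
}
```

Calling compile_pattern("[(?{})]", true) increments reparse_count once and then stops, because the inner call runs with use_re_eval false.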

M       pp_ctl.c
M       regcomp.c
M       t/re/overload.t
M       toke.c

commit 9adf723d69086552a6cfa370781c3d3996a73232
Author: David Mitchell <[email protected]>
Date:   Tue Apr 9 17:17:16 2013 +0100

    add lex_re_reparsing boolean to yy_parser struct
    
    When re-parsing a pattern for run-time (?{}) code blocks,
    we end up with the EVAL_RE_REPARSING flag set in PL_in_eval.
    Currently we clear this flag as soon as scan_str() returns, to ensure that
    it's not set if we happen to parse further patterns (e.g. within the
    (?{ ... }) code itself).
    
    However, a soon-to-be-applied bugfix requires us to know the reparsing
    state beyond this point. To solve this, we add a new boolean flag to the
    parser struct, which is set from PL_in_eval in S_sublex_push() (with the
    old value being saved). This allows us to have the flag around for the
    entire pattern string parsing phase, without it affecting nested pattern
    compilation.
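
A minimal model of the save/restore scheme, assuming a simplified parser struct. The bit value, in_eval (standing in for PL_in_eval), parser_state, and the sublex_push()/sublex_pop() pair are hypothetical, and clearing the PL_in_eval bit inside sublex_push() is an assumption of this sketch rather than something the commit message states. The point is that the outer pattern keeps lex_re_reparsing set for its whole lexing phase while a nested pattern sees it cleared.

```c
#include <assert.h>
#include <stdbool.h>

#define EVAL_RE_REPARSING_BIT 0x1u /* hypothetical value, not perl's */

/* Hypothetical miniature of the parser struct. */
struct parser {
    bool lex_re_reparsing; /* true while lexing a reparsed pattern */
};

static unsigned in_eval;            /* stands in for PL_in_eval */
static struct parser parser_state;  /* stands in for *PL_parser */

/* Like S_sublex_push(): save the old flag, then set it from the
 * PL_in_eval bit for the pattern about to be lexed. The bit is cleared
 * here (an assumption) so nested patterns do not inherit it. */
static bool sublex_push(void)
{
    bool saved = parser_state.lex_re_reparsing;
    parser_state.lex_re_reparsing = (in_eval & EVAL_RE_REPARSING_BIT) != 0;
    in_eval &= ~EVAL_RE_REPARSING_BIT;
    return saved;
}

/* Restore the flag saved by the matching sublex_push(). */
static void sublex_pop(bool saved)
{
    parser_state.lex_re_reparsing = saved;
}
```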

M       parser.h
M       regcomp.c
M       regexec.c
M       toke.c

commit dd34663ab8077b8a693310459f91f7e618af5a1c
Author: David Mitchell <[email protected]>
Date:   Thu Apr 4 17:50:22 2013 +0100

    Eliminate PL_reg_state.re_reparsing, part 2
    
    The previous commit added an alternative flag mechanism to
    PL_reg_state.re_reparsing, but kept the old one around for consistency
    checking. Remove the old one now.

M       perl.c
M       regcomp.c
M       regexec.c
M       regexp.h
M       toke.c

commit 09cdc3bdd354a8cb38584d5270a04b33005f5034
Author: David Mitchell <[email protected]>
Date:   Thu Apr 4 17:29:53 2013 +0100

    Eliminate PL_reg_state.re_reparsing, part 1
    
    PL_reg_state.re_reparsing is a hacky flag used to allow runtime
    code blocks to be included in patterns. Basically, since code blocks
    are now handled by the perl parser within literal patterns, runtime
    patterns are handled by taking the (assembled at runtime) pattern,
    and feeding it back through the parser via the equivalent of
        eval q{qr'the_pattern'},
    so that run-time (?{..})'s appear to be literal code blocks.
    When this happens, the global flag PL_reg_state.re_reparsing is set,
    which modifies lexing and parsing in minor ways (such as whether \\ is
    stripped).
    
    Now, I'm in the slow process of trying to eliminate global regex state
    (i.e. gradually removing the fields of PL_reg_state), and also a change
    which will be coming a few commits ahead requires the info which this flag
    indicates to linger for longer (currently it is cleared immediately after
    the call to scan_str()).
    
    For those two reasons, this commit adds a new mechanism to indicate this:
    a new flag to eval_sv(), G_RE_REPARSING, which sets OPpEVAL_RE_REPARSING
    in the entereval op, which in turn sets the EVAL_RE_REPARSING bit in
    PL_in_eval.
    
    It's still a yukky global flag hack, but it's a *different* global flag
    hack now.
    
    For this commit, we add the new flag(s) but keep the old
    PL_reg_state.re_reparsing flag and assert that the two mechanisms always
    match. The next commit will remove re_reparsing.
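
The flag chain can be sketched as two translation steps plus the transitional consistency check described for this commit. Every bit value here is a placeholder rather than perl's real definition, and old_re_reparsing, in_eval, build_entereval_private() and run_entereval() are hypothetical stand-ins for PL_reg_state.re_reparsing, PL_in_eval and the eval_sv()/entereval code.

```c
#include <assert.h>
#include <stdbool.h>

/* All bit values below are hypothetical placeholders, not perl's. */
#define G_RE_REPARSING       0x01u /* flag passed to eval_sv() */
#define OPpEVAL_RE_REPARSING 0x02u /* private flag on the entereval op */
#define EVAL_RE_REPARSING    0x04u /* bit in PL_in_eval */

static bool old_re_reparsing; /* stands in for PL_reg_state.re_reparsing */
static unsigned in_eval;      /* stands in for PL_in_eval */

/* eval_sv() step: translate the call flag into the op's private flags. */
static unsigned build_entereval_private(unsigned eval_flags)
{
    return (eval_flags & G_RE_REPARSING) ? OPpEVAL_RE_REPARSING : 0u;
}

/* entereval step: translate the op's private flag into PL_in_eval,
 * then check that the old and new mechanisms agree. */
static void run_entereval(unsigned op_private)
{
    if (op_private & OPpEVAL_RE_REPARSING)
        in_eval |= EVAL_RE_REPARSING;
    assert(old_re_reparsing == ((in_eval & EVAL_RE_REPARSING) != 0));
}
```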

M       cop.h
M       op.h
M       perl.c
M       pp_ctl.c
M       regcomp.c
M       regexec.c
M       toke.c

commit 0321ee9bb45621136b63057b7a1a4a11987e3af8
Author: David Mitchell <[email protected]>
Date:   Thu Mar 28 15:29:14 2013 +0000

    re_op_compile(): reapply debugging statements
    
    These were temporarily removed a few commits ago to make rebasing easier.
    
    (And since the code has been simplified in the conflicting branch, fewer
    debug statements had to be added back than were in the original.)

M       regcomp.c

commit 9e4bb10cc3e53e598ed0e47e45e63fc45cda14d3
Author: David Mitchell <[email protected]>
Date:   Thu Mar 28 14:11:16 2013 +0000

    Handle overloading properly in compile-time regex
    
    [perl #116823]
    
    In re_op_compile(), there were two different code paths for compile-time
    patterns (/foo(?{1})bar/) and runtime (/$foo(?{1})bar/).
    
    The code in question is where the various components of the pattern
    are concatenated into a single string, for example, 'foo', '(?{1})' and
    'bar' in the first pattern.
    
    In the run-time branch, the code assumes that each component (e.g. the
    value of $foo) can be absolutely anything, and full magic and overload
    handling is applied as each component is retrieved and appended to the
    pattern string.
    
    The compile-time branch, on the other hand, was a lot simpler because it
    "knew" that each component is just a simple constant SV attached to an
    OP_CONST op. This turned out to be an incorrect assumption, due to
    overload::constant qr overloading; here, a simple constant part of a
    compile-time pattern, such as 'foo', can be converted into whatever the
    overload function returns; in particular, an object blessed into an
    overloaded class. So the "simple" SVs that get attached to OP_CONST ops
    can in fact be complex and need full magic, overloading, etc. applied to
    them.
    
    The quickest solution turned out to be, for the compile-time case, to
    extract the SV from each OP_CONST and assemble them into a temporary
    SV** array, and from then onwards to treat it the same as the run-time
    case (which expects an array of SVs).
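
A hypothetical sketch of the restructuring: struct sv, struct const_op, concat_svs(), concat_const_ops() and overloaded_foo() are invented for illustration and model only "" overloading, not magic. The compile-time path merely copies each constant's SV pointer into a temporary array and then calls the same concatenation routine the run-time path uses, so an overloaded constant still gets stringified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature SV: a plain string, or an object whose
 * string value comes from an overloaded "" method. */
struct sv {
    const char *pv;                              /* plain string value */
    const char *(*stringify)(const struct sv *); /* non-NULL if overloaded */
};

/* Hypothetical OP_CONST: just carries its constant SV. */
struct const_op {
    struct sv *sv;
};

/* Example of what an overload::constant qr handler might yield: an
 * object whose string form differs from the literal. */
static const char *overloaded_foo(const struct sv *self)
{
    (void)self;
    return "FOO";
}

/* Shared (run-time style) path: stringify each SV, honouring
 * overloading, and append it. Requires outlen >= 1. */
static void concat_svs(struct sv **svs, size_t n, char *out, size_t outlen)
{
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const char *s = svs[i]->stringify ? svs[i]->stringify(svs[i])
                                          : svs[i]->pv;
        strncat(out, s, outlen - strlen(out) - 1);
    }
}

/* Compile-time path: collect each OP_CONST's SV into a temporary SV*
 * array, then reuse the run-time path unchanged. */
static void concat_const_ops(struct const_op *ops, size_t n,
                             char *out, size_t outlen)
{
    struct sv *tmp[16]; /* fixed size for the sketch */
    assert(n <= 16);
    for (size_t i = 0; i < n; i++)
        tmp[i] = ops[i].sv;
    concat_svs(tmp, n, out, outlen);
}
```

With one overloaded constant ("foo" mapped to "FOO") followed by a plain "bar", concat_const_ops() yields "FOObar".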

M       regcomp.c
M       t/re/overload.t

commit 12846d915b020eefbf762be935a95e8645f9e9ff
Author: David Mitchell <[email protected]>
Date:   Thu Mar 28 13:08:42 2013 +0000

    re-indent after last change
    
    (only whitespace changes)

M       regcomp.c

commit 69adccb3258add8a7ab371130587f8ceaf67ed05
Author: David Mitchell <[email protected]>
Date:   Thu Mar 28 12:07:18 2013 +0000

    re_op_compile(): unify 1-op and N-op branches
    
    When assembling a compile-time pattern from a list of OP_CONSTs (and
    possibly embedded code-blocks), there were separate code paths for a
    single arg (a lone OP_CONST) and a list of OP_CONSTs / DOs.
    Unify the branches into a single loop.
    
    This will make a subsequent commit easier, where we will need to do more
    processing of each "constant".
    
    Re-indenting has been left to the commit that follows this.

M       regcomp.c

commit 359fffd394015bdeb3cd096bbc7ba8f0dc5a0826
Author: David Mitchell <[email protected]>
Date:   Mon Mar 25 17:23:12 2013 +0000

    re_op_compile(): simplify a code snippet
    
    and eliminate one local var.

M       regcomp.c

commit 797979c280287f0906bf34d50dbf96e77ba6fc15
Author: David Mitchell <[email protected]>
Date:   Mon Mar 25 17:19:23 2013 +0000

    re-indent code after previous commit
    
    (whitespace changes only)

M       regcomp.c

commit 031b9d3311188a63d57ad4d8343a02c6575b62be
Author: David Mitchell <[email protected]>
Date:   Mon Mar 25 17:06:47 2013 +0000

    regex and overload: unify 1 and N arg branches
    
    When compiling a regex, something like /a$b/ that parses into two args
    was treated in a different code path than, say, /$a/, which is only one arg.
    
    In particular, the 1-arg code path, where it handled "" overloading, didn't
    check for a loop (where the ""-sub returns the overloaded object itself);
    the N-arg branch did handle that. By unifying the branches, we get that
    fix for free, and ensure that any future fixes don't have to be applied to
    two separate branches.
    
    Re-indenting has been left to the commit that follows this.
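
The loop check that the unified branch now applies can be illustrated with a hypothetical model: struct obj, resolve_string(), return_self(), return_plain() and plain_target are invented names, a returned plain string is modelled as an object with no "" method, and falling back to the object's plain (non-overloaded) string value when the "" method returns the object itself is an assumption of the sketch. It only detects the direct self-return case the commit describes, not longer cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct obj;

/* Hypothetical overloaded "" method: returns an object, possibly itself. */
typedef const struct obj *(*stringify_fn)(const struct obj *);

/* Hypothetical object with an optional overloaded "" method. */
struct obj {
    const char *pv;         /* plain (non-overloaded) string value */
    stringify_fn stringify; /* NULL when there is no "" overloading */
};

/* Follow "" overloading until a plain value is reached, stopping if
 * the method hands back the very object it was called on. */
static const char *resolve_string(const struct obj *o)
{
    while (o->stringify) {
        const struct obj *next = o->stringify(o);
        if (next == o) /* ""-sub returned the object itself: stop */
            break;
        o = next;      /* otherwise follow the returned object */
    }
    return o->pv;
}

/* A "" method that loops by returning its own object. */
static const struct obj *return_self(const struct obj *self)
{
    return self;
}

static const struct obj plain_target = { "target", NULL };

/* A "" method that returns a different, non-overloaded object. */
static const struct obj *return_plain(const struct obj *self)
{
    (void)self;
    return &plain_target;
}
```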

M       regcomp.c
M       t/re/overload.t

commit 2a33b001925d4764e6c3ec091544602ecc79894a
Author: David Mitchell <[email protected]>
Date:   Thu Mar 28 15:08:27 2013 +0000

    re_op_compile(): temp remove some debugging code
    
    These four DEBUG_PARSE_r()'s were recently added to a block of code
    which I have just been extensively reworking in a separate branch.
    Temporarily remove these statements to allow my branch to be rebased;
    I'll re-add them (or similar) afterwards.

M       regcomp.c
-----------------------------------------------------------------------

--
Perl5 Master Repository