In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/e501306eca0fea1cc9fc53e2eb024ad37e85ce59?hp=335e2ee52f38eaea7888c33d9c4f0d703130625e>

- Log -----------------------------------------------------------------
commit e501306eca0fea1cc9fc53e2eb024ad37e85ce59
Merge: 335e2ee 4f3e251
Author: David Mitchell <[email protected]>
Date:   Fri Apr 12 11:30:25 2013 +0100

    [MERGE] handle /(?{})/ with overload::constant qr
    
    The reworking of the re_eval implementation for 5.17.1 made the assumption
    that constant strings within literal patterns were, um, constant.
    
    It turns out this this is an invalid assumption, because
    
        overload::constant qr => { sub return bless [], 'Foo' }
    
    can cause the constant bits of a pattern, like foo, bar in
    
        /foo(?{...})bar/
    
    to get replaced with (for example) blessed objects: so the 'constant' SV
    attached to an OP_CONST is actually a blessed object, that could itself be
    overloaded with string or concat methods say, or could be a qr// object
    etc.
    
    The commits in this merge (hopefully) fix the various problems this
    assumption caused: chiefly with qr// objects containing compiled (?{})
    code that were getting re-stringified and thus failing unless in the
    presence of use re 'eval' (and sometimes failing even in its presence).
    Also, runtime patterns could trigger a recursive call to the overload
    method, and eventually stack overflow and SEGV.
    
    See [perl #116823].

commit 4f3e2518850e12605980071a25c189c30710bcfd
Author: David Mitchell <[email protected]>
Date:   Wed Apr 10 16:10:28 2013 +0100

    fix runtime /(?{})/ with overload::constant qr
    
    There are two issues fixed here.
    
    First, when a pattern has a run-time code-block included, such as
    
        $code = '(?{...})'
        /foo$code/
    
    the mechanism used to parse those run-time blocks: of feeding the
    resultant pattern into a call to eval_sv() with the string
    
        qr'foo(?{...})'
    
    and then extracting out any resulting opcode trees from the returned
    qr object -- suffered from the re-parsed qr'..' also being subject to
    overload:constant qr processing, which could result in Bad Things
    happening.
    
    Since we now have the PL_parser->lex_re_reparsing flag in scope throughout
    the parsing of the pattern, this is easy to detect and avoid.
    
    The second issue is a mechanism to avoid recursion when getting false
    positives in S_has_runtime_code() for code like '[(?{})]'.
    For patterns like this, we would suspect that the pattern may have code
    (even though it doesn't), so feed it into qr'...' and reparse, and
    again it looks like runtime code, so feed it in, rinse and repeat.
    The thing to stop recursion was when we saw a qr with a single OP_CONST
    string, we assumed it couldn't have any run-time component, and thus no
    run-time code blocks.
    
    However, this broke qr/foo/ in the presence of overload::constant qr
    overloading, which could convert foo into a string containing code blocks.
    
    The fix for this is to change the recursion-avoidance mechanism (in a way
    which also turns out to be simpler too). Basically, when we fake up a
    qr'...' and eval it, we turn off any 'use re eval' in scope: its not
    needed, since we know the .... will be a constant string without any
    overloading. Then we use the lack of 'use re eval' in scope to
    skip calling S_has_runtime_code() and just assume that the code has no
    run-time patterns (if it has, then eventually the regex parser will
    rightly complain about 'Eval-group not allowed at runtime').
    
    This commit also adds some fairly comprehensive tests for this.

M       pp_ctl.c
M       regcomp.c
M       t/re/overload.t
M       toke.c

commit 3a54fd60e3d777bf86f4eec331b79a61c23d8393
Author: David Mitchell <[email protected]>
Date:   Tue Apr 9 17:17:16 2013 +0100

    add lex_re_reparsing boolean to yy_parser struct
    
    When re-parsing a pattern for run-time (?{}) code blocks,
    we end up with the EVAL_RE_REPARSING flag set in PL_in_eval.
    Currently we clear this flag as soon as scan_str() returns, to ensure that
    it's not set if we happen to parse further patterns (e.g. within the
    (?{ ... }) code itself.
    
    However, a soon-to-be-applied bugfix requires us to know the reparsing
    state beyond this point. To solve this, we add a new boolean flag to the
    parser struct, which is set from PL_in_eval in S_sublex_push() (with the
    old value being saved). This allows us to have the flag around for the
    entire pattern string parsing phase, without it affecting nested pattern
    compilation.

M       parser.h
M       regcomp.c
M       regexec.c
M       toke.c

commit c1789b9f89e17b99d728910cb490561f334c2033
Author: David Mitchell <[email protected]>
Date:   Thu Apr 4 17:50:22 2013 +0100

    Eliminate PL_reg_state.re_reparsing, part 2
    
    The previous commit added an alternative flag mechanism to
    PL_reg_state.re_reparsing, but kept the old one around for consistency
    checking. Remove the old one now.

M       perl.c
M       regcomp.c
M       regexec.c
M       regexp.h
M       toke.c

commit a1941760704242726e754cde1820f738676ca838
Author: David Mitchell <[email protected]>
Date:   Thu Apr 4 17:29:53 2013 +0100

    Eliminate PL_reg_state.re_reparsing, part 1
    
    PL_reg_state.re_reparsing is a hacky flag used to allow runtime
    code blocks to be included in patterns. Basically, since code blocks
    are now handled by the perl parser within literal patterns, runtime
    patterns are handled by taking the (assembled at runtime) pattern,
    and feeding it back through the parser via the equivalent of
        eval q{qr'the_pattern'},
    so that run-time (?{..})'s appear to be literal code blocks.
    When this happens, the global flag PL_reg_state.re_reparsing is set,
    which modifies lexing and parsing in minor ways (such as whether \\ is
    stripped).
    
    Now, I'm in the slow process of trying to eliminate global regex state
    (i.e. gradually removing the fields of PL_reg_state), and also a change
    which will be coming a few commits ahead requires the info which this flag
    indicates to linger for longer (currently it is cleared immediately after
    the call to scan_str().
    
    For those two reasons, this commit adds a new mechanism to indicate this:
    a new flag to eval_sv(), G_RE_REPARSING (which sets OPpEVAL_RE_REPARSING
    in the entereval op), which sets the EVAL_RE_REPARSING bit in PL_in_eval.
    
    Its still a yukky global flag hack, but its a *different* global flag hack
    now.
    
    For this commit, we add the new flag(s) but keep the old
    PL_reg_state.re_reparsing flag and assert that the two mechanisms always
    match. The next commit will remove re_reparsing.

M       cop.h
M       op.h
M       perl.c
M       pp_ctl.c
M       regcomp.c
M       regexec.c
M       toke.c

commit efd541675261f15c09a9fee926b7d145a59daaa0
Author: David Mitchell <[email protected]>
Date:   Thu Mar 28 15:29:14 2013 +0000

    re_op_compile(): reapply debugging statements
    
    These were temporarily removed a few commits ago to make rebasing easier.
    
    (And since the code's been simplified in the conflicting branch, not so
    many debug statements had to be added back as were in the original).

M       regcomp.c

commit 55269f4f7ec374aedc4b04fe74db5d9f3a2886d6
Author: David Mitchell <[email protected]>
Date:   Thu Mar 28 14:11:16 2013 +0000

    Handle overloading properly in compile-time regex
    
    [perl #116823]
    
    In re_op_compile(), there were two different code paths for compile-time
    patterns (/foo(?{1})bar/) and runtime (/$foo(?{1})bar/).
    
    The code in question is where the various components of the pattern
    are concatenated into a single string, for example, 'foo', '(?{1})' and
    'bar' in the first pattern.
    
    In the run-time branch, the code assumes that each component (e.g. the
    value of $foo) can be absolutely anything, and full magic and overload
    handling is applied as each component is retrieved and appended to the
    pattern string.
    
    The compile-time branch on the other hand, was a lot simpler because it
    "knew" that each component is just a simple constant SV attached to an
    OP_CONST op. This turned out to be an incorrect assumption, due to
    overload::constant qr overloading; here, a simple constant part of a
    compile-time pattern, such as 'foo', can be converted into whatever the
    overload function returns; in particular, an object blessed into an
    overloaded class. So the "simple" SVs that get attached to OP_CONST ops
    can in fact be complex and need full magic, overloading etc applied to
    them.
    
    The quickest solution to this turned out to be, for the compile-time case,
    extract out the SV from each OP_CONST and assemble them into a temporary
    SV** array; then from then onwards, treat it the same as the run-time case
    (which expects an array of SVs).

M       regcomp.c
M       t/re/overload.t

commit faa0d3c9abc8a0214e2478797030c3300a58989e
Author: David Mitchell <[email protected]>
Date:   Thu Mar 28 13:08:42 2013 +0000

    re-indent after last change
    
    (only whitespace changes)

M       regcomp.c

commit 92126d70299363f23cbdcf5332cc2d9f8f939aa4
Author: David Mitchell <[email protected]>
Date:   Thu Mar 28 12:07:18 2013 +0000

    re_op_compile(): unify 1-op and N-op branches
    
    When assembling a compile-time pattern from a list of OP_CONSTs (and
    possibly embedded code-blocks), there were separate code paths for a
    single arg (a lone OP_CONST) and a list of OP_CONST / DO's.
    Unify the branches into single loop.
    
    This will make a subsequent commit easier, where we will need to do more
    processing of each "constant".
    
    Re-indenting has been left to the commit that follows this.

M       regcomp.c

commit 38e86c9ce7750d3d1e0e979504cfe16c0ad95fb3
Author: David Mitchell <[email protected]>
Date:   Mon Mar 25 17:23:12 2013 +0000

    re_op_compile(): simplify a code snippet
    
    and eliminate one local var.

M       regcomp.c

commit 35aef8bff8da032406e5b434c0b877c506564023
Author: David Mitchell <[email protected]>
Date:   Mon Mar 25 17:19:23 2013 +0000

    re-indent code after previous commit
    
    (whitespace changes only)

M       regcomp.c

commit 16cc92aeb437b42b3f141f69d8d60dae5309ec0f
Author: David Mitchell <[email protected]>
Date:   Mon Mar 25 17:06:47 2013 +0000

    regex and overload: unifiy 1 and N arg branches
    
    When compiling a regex, something like /a$b/ that parses two two args,
    was treated in a different code path than /$a/ say, which is only one arg.
    
    In particular the 1-arg code path, where it handled "" overloading, didn't
    check for a loop (where the ""-sub returns the overloaded object itself) -
    the N-arg branch did handle that. By unififying the branches, we get that
    fix for free, and ensure that any future fixes don't have to be applied to
    two separate branches.
    
    Re-indented has been left to the commit that follows this.

M       regcomp.c
M       t/re/overload.t

commit 5dd442fcfdfbfbdd8bc0fdb52cd98474826ec4fe
Author: David Mitchell <[email protected]>
Date:   Thu Mar 28 15:08:27 2013 +0000

    re_op_compile(): temp remove some debugging code
    
    These four DEBUG_PARSE_r()'s were recently added to a block I code
    which I have just been extensively reworking in a separate branch.
    Temporarily remove these statements to allow my branch to be rebased;
    I'll re-add them (or similar) afterwards.

M       regcomp.c
-----------------------------------------------------------------------

Summary of changes:
 cop.h           |    2 +
 op.h            |    1 +
 parser.h        |    2 +-
 perl.c          |    5 +-
 pp_ctl.c        |   13 ++-
 regcomp.c       |  398 +++++++++++++++++++++++++------------------------------
 regexec.c       |    2 -
 regexp.h        |    1 -
 t/re/overload.t |  145 ++++++++++++++++++++
 toke.c          |   21 ++-
 10 files changed, 359 insertions(+), 231 deletions(-)

diff --git a/cop.h b/cop.h
index 086cd22..b20eddb 100644
--- a/cop.h
+++ b/cop.h
@@ -1048,6 +1048,7 @@ L<perlcall>.
                                   Perl_magic_methcall().  */
 #define G_WRITING_TO_STDERR 1024 /* Perl_write_to_stderr() is calling
                                    Perl_magic_methcall().  */
+#define G_RE_REPARSING 0x800     /* compiling a run-time /(?{..})/ */
 
 /* flag bits for PL_in_eval */
 #define EVAL_NULL      0       /* not in an eval */
@@ -1055,6 +1056,7 @@ L<perlcall>.
 #define EVAL_WARNONLY  2       /* used by yywarn() when calling yyerror() */
 #define EVAL_KEEPERR   4       /* set by Perl_call_sv if G_KEEPERR */
 #define EVAL_INREQUIRE 8       /* The code is being required. */
+#define EVAL_RE_REPARSING 0x10 /* eval_sv() called with G_RE_REPARSING */
 
 /* Support for switching (stack and block) contexts.
  * This ensures magic doesn't invalidate local stack and cx pointers.
diff --git a/op.h b/op.h
index 8b87a9c..7c5030d 100644
--- a/op.h
+++ b/op.h
@@ -308,6 +308,7 @@ Deprecated.  Use C<GIMME_V> instead.
 #define OPpEVAL_UNICODE                4
 #define OPpEVAL_BYTES          8
 #define OPpEVAL_COPHH          16      /* Construct %^H from cop hints */
+#define OPpEVAL_RE_REPARSING   32      /* eval_sv(..., G_RE_REPARSING) */
     
 /* Private for OP_CALLER, OP_WANTARRAY and OP_RUNCV */
 #define OPpOFFBYONE            128     /* Treat caller(1) as caller(2) */
diff --git a/parser.h b/parser.h
index 05735bf..e7b887e 100644
--- a/parser.h
+++ b/parser.h
@@ -71,7 +71,7 @@ typedef struct yy_parser {
     char       multi_open;     /* delimiter of said string */
     char       multi_close;    /* delimiter of said string */
     bool       preambled;
-    /*** 8-bit hole ***/
+    bool        lex_re_reparsing; /* we're doing G_RE_REPARSING */
     I32                lex_allbrackets;/* (), [], {}, ?: bracket count */
     SUBLEXINFO sublex_info;
     LEXSHARED  *lex_shared;
diff --git a/perl.c b/perl.c
index 87d98dc..a39d66f 100644
--- a/perl.c
+++ b/perl.c
@@ -2808,8 +2808,9 @@ Perl_eval_sv(pTHX_ SV *sv, I32 flags)
     myop.op_flags |= OP_GIMME_REVERSE(flags);
     if (flags & G_KEEPERR)
        myop.op_flags |= OPf_SPECIAL;
-    if (PL_reg_state.re_reparsing)
-       myop.op_private = OPpEVAL_COPHH;
+
+    if (flags & G_RE_REPARSING)
+       myop.op_private = (OPpEVAL_COPHH | OPpEVAL_RE_REPARSING);
 
     /* fail now; otherwise we could fail after the JMPENV_PUSH but
      * before a PUSHEVAL, which corrupts the stack after a croak */
diff --git a/pp_ctl.c b/pp_ctl.c
index f518bc2..bdbd75a 100644
--- a/pp_ctl.c
+++ b/pp_ctl.c
@@ -3358,7 +3358,9 @@ S_doeval(pTHX_ int gimme, CV* outside, U32 seq, HV *hh)
 
     PL_in_eval = (in_require
                  ? (EVAL_INREQUIRE | (PL_in_eval & EVAL_INEVAL))
-                 : EVAL_INEVAL);
+                 : (EVAL_INEVAL |
+                        ((PL_op->op_private & OPpEVAL_RE_REPARSING)
+                            ? EVAL_RE_REPARSING : 0)));
 
     PUSHMARK(SP);
 
@@ -3420,6 +3422,15 @@ S_doeval(pTHX_ int gimme, CV* outside, U32 seq, HV *hh)
     else {
        PL_hints = saveop->op_private & OPpEVAL_COPHH
                     ? oldcurcop->cop_hints : saveop->op_targ;
+
+        /* making 'use re eval' not be in scope when compiling the
+         * qr/mabye_has_runtime_code_block/ ensures that we don't get
+         * infinite recursion when S_has_runtime_code() gives a false
+         * positive: the second time round, HINT_RE_EVAL isn't set so we
+         * don't bother calling S_has_runtime_code() */
+        if (PL_in_eval & EVAL_RE_REPARSING)
+            PL_hints &= ~HINT_RE_EVAL;
+
        if (hh) {
            /* SAVEHINTS created a new HV in PL_hintgv, which we need to GC */
            SvREFCNT_dec(GvHV(PL_hintgv));
diff --git a/regcomp.c b/regcomp.c
index 34a4e9f..ee843e3 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -4877,19 +4877,12 @@ Perl_re_compile(pTHX_ SV * const pattern, U32 rx_flags)
  * False positives are allowed */
 
 static bool
-S_has_runtime_code(pTHX_ RExC_state_t * const pRExC_state, OP *expr,
-                   U32 pm_flags, char *pat, STRLEN plen)
+S_has_runtime_code(pTHX_ RExC_state_t * const pRExC_state,
+                   char *pat, STRLEN plen)
 {
     int n = 0;
     STRLEN s;
 
-    /* avoid infinitely recursing when we recompile the pattern parcelled up
-     * as qr'...'. A single constant qr// string can't have have any
-     * run-time component in it, and thus, no runtime code. (A non-qr
-     * string, however, can, e.g. $x =~ '(?{})') */
-    if  ((pm_flags & PMf_IS_QR) && expr && expr->op_type == OP_CONST)
-       return 0;
-
     for (s = 0; s < plen; s++) {
        if (n < pRExC_state->num_code_blocks
            && s == pRExC_state->code_blocks[n].start)
@@ -5003,11 +4996,10 @@ S_compile_runtime_code(pTHX_ RExC_state_t * const 
pRExC_state,
        SAVETMPS;
        save_re_context();
        PUSHSTACKi(PERLSI_REQUIRE);
-       /* this causes the toker to collapse \\ into \ when parsing
-        * qr''; normally only q'' does this. It also alters hints
-        * handling */
-       PL_reg_state.re_reparsing = TRUE;
-       eval_sv(sv, G_SCALAR);
+        /* G_RE_REPARSING causes the toker to collapse \\ into \ when
+         * parsing qr''; normally only q'' does this. It also alters
+         * hints handling */
+       eval_sv(sv, G_SCALAR|G_RE_REPARSING);
        SvREFCNT_dec_NN(sv);
        SPAGAIN;
        qr_ref = POPs;
@@ -5212,8 +5204,9 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
     I32 flags;
     I32 minlen = 0;
     U32 rx_flags;
-    SV *pat;
+    SV *pat = NULL;
     SV *code_blocksv = NULL;
+    SV** new_patternp = patternp;
 
     /* these are all flags - maybe they should be turned
      * into a single int with different bit masks */
@@ -5221,7 +5214,6 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
     I32 sawplus = 0;
     I32 sawopen = 0;
     regex_charset initial_charset = get_regex_charset(orig_rx_flags);
-    bool code_is_utf8 = 0;
     bool recompile = 0;
     bool runtime_code = 0;
     scan_data_t data;
@@ -5308,40 +5300,68 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
 
     if (expr && (expr->op_type == OP_LIST ||
                (expr->op_type == OP_NULL && expr->op_targ == OP_LIST))) {
-
-       /* is the source UTF8, and how many code blocks are there? */
+       /* allocate code_blocks if needed */
        OP *o;
        int ncode = 0;
 
-       for (o = cLISTOPx(expr)->op_first; o; o = o->op_sibling) {
-           if (o->op_type == OP_CONST) {
-                /* skip if we have SVs as well as OPs. In this case,
-                 * a) we decide utf8 based on SVs not OPs;
-                 * b) the current pad may not match that which the ops
-                 *    were compiled in, so, so on threaded builds,
-                 *    cSVOPo_sv would look in the wrong pad */
-                if (!pat_count && SvUTF8(cSVOPo_sv))
-                    code_is_utf8 = 1;
-            }
-           else if (o->op_type == OP_NULL && (o->op_flags & OPf_SPECIAL))
-               /* count of DO blocks */
-               ncode++;
-       }
+       for (o = cLISTOPx(expr)->op_first; o; o = o->op_sibling)
+           if (o->op_type == OP_NULL && (o->op_flags & OPf_SPECIAL))
+               ncode++; /* count of DO blocks */
        if (ncode) {
            pRExC_state->num_code_blocks = ncode;
            Newx(pRExC_state->code_blocks, ncode, struct reg_code_block);
        }
     }
 
-    if (pat_count) {
-       /* handle a list of SVs */
+    if (!pat_count) {
+        /* compile-time pattern with just OP_CONSTs and DO blocks */
+
+        int n;
+        OP *o;
+
+        /* find how many CONSTs there are */
+        assert(expr);
+        n = 0;
+        if (expr->op_type == OP_CONST)
+            n = 1;
+        else
+            for (o = cLISTOPx(expr)->op_first; o; o = o->op_sibling) {
+                if (o->op_type == OP_CONST)
+                    n++;
+            }
+
+        /* fake up an SV array */
+
+        assert(!new_patternp);
+        Newx(new_patternp, n, SV*);
+        SAVEFREEPV(new_patternp);
+        pat_count = n;
+
+        n = 0;
+        if (expr->op_type == OP_CONST)
+            new_patternp[n] = cSVOPx_sv(expr);
+        else
+            for (o = cLISTOPx(expr)->op_first; o; o = o->op_sibling) {
+                if (o->op_type == OP_CONST)
+                    new_patternp[n++] = cSVOPo_sv;
+            }
+
+    }
+
+    {
+       /* concat args, handling magic, overloading etc */
 
        SV **svp;
+        OP *o = NULL;
+        int n = 0;
+        STRLEN orig_patlen = 0;
 
         DEBUG_PARSE_r(PerlIO_printf(Perl_debug_log,
-            "Compiling List of SVs %d elements%s\n",pat_count, orig_rx_flags & 
RXf_SPLIT ? " for split" : ""));
+            "Assembling pattern from %d elements%s\n", pat_count,
+                orig_rx_flags & RXf_SPLIT ? " for split" : ""));
+
        /* apply magic and RE overloading to each arg */
-       for (svp = patternp; svp < patternp + pat_count; svp++) {
+       for (svp = new_patternp; svp < new_patternp + pat_count; svp++) {
            SV *rx = *svp;
            SvGETMAGIC(rx);
            if (SvROK(rx) && SvAMAGIC(rx)) {
@@ -5356,21 +5376,19 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
            }
        }
 
-       if (pat_count > 1) {
-           /* concat multiple args and find any code block indexes */
-
-           OP *o = NULL;
-           int n = 0;
-           bool utf8 = 0;
-            STRLEN orig_patlen = 0;
-
-           if (pRExC_state->num_code_blocks) {
-               o = cLISTOPx(expr)->op_first;
-               assert(   o->op_type == OP_PUSHMARK
+        if (pRExC_state->num_code_blocks) {
+            if (expr->op_type == OP_CONST)
+                o = expr;
+            else {
+                o = cLISTOPx(expr)->op_first;
+                assert(   o->op_type == OP_PUSHMARK
                        || (o->op_type == OP_NULL && o->op_targ == OP_PUSHMARK)
                        || o->op_type == OP_PADRANGE);
-               o = o->op_sibling;
-           }
+                o = o->op_sibling;
+            }
+        }
+
+        if (pat_count > 1) {
 
            pat = newSVpvn("", 0);
            SAVEFREESV(pat);
@@ -5381,124 +5399,120 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
             * overloading but not concat overloading; but the main effect
             * in this obscure case is to need a 'use re eval' for a
             * literal code block */
-           for (svp = patternp; svp < patternp + pat_count; svp++) {
+           for (svp = new_patternp; svp < new_patternp + pat_count; svp++) {
                if (SvUTF8(*svp))
-                   utf8 = 1;
+                    SvUTF8_on(pat);
            }
-           if (utf8)
-               SvUTF8_on(pat);
-
-           for (svp = patternp; svp < patternp + pat_count; svp++) {
-               SV *sv, *msv = *svp;
-               SV *rx;
-               bool code = 0;
-                /* we make the assumption here that each op in the list of
-                 * op_siblings maps to one SV pushed onto the stack,
-                 * except for code blocks, with have both an OP_NULL and
-                 * and OP_CONST.
-                 * This allows us to match up the list of SVs against the
-                 * list of OPs to find the next code block.
-                 *
-                 * Note that       PUSHMARK PADSV PADSV ..
-                 * is optimised to
-                 *                 PADRANGE NULL  NULL  ..
-                 * so the alignment still works. */
-               if (o) {
-                   if (o->op_type == OP_NULL && (o->op_flags & OPf_SPECIAL)) {
-                       assert(n < pRExC_state->num_code_blocks);
-                       pRExC_state->code_blocks[n].start = SvCUR(pat);
-                       pRExC_state->code_blocks[n].block = o;
-                       pRExC_state->code_blocks[n].src_regex = NULL;
-                       n++;
-                       code = 1;
-                       o = o->op_sibling; /* skip CONST */
-                       assert(o);
-                   }
-                   o = o->op_sibling;;
-               }
+        }
 
-               if ((SvAMAGIC(pat) || SvAMAGIC(msv)) &&
-                       (sv = amagic_call(pat, msv, concat_amg, AMGf_assign)))
-               {
-                   sv_setsv(pat, sv);
-                   /* overloading involved: all bets are off over literal
-                    * code. Pretend we haven't seen it */
-                   pRExC_state->num_code_blocks -= n;
-                   n = 0;
-                    rx = NULL;
+        /* process args, concat them if there are multiple ones,
+         * and find any code block indexes */
+
+
+        for (svp = new_patternp; svp < new_patternp + pat_count; svp++) {
+            SV *sv, *msv = *svp;
+            SV *rx  = NULL;
+            bool code = 0;
+            /* we make the assumption here that each op in the list of
+             * op_siblings maps to one SV pushed onto the stack,
+             * except for code blocks, with have both an OP_NULL and
+             * and OP_CONST.
+             * This allows us to match up the list of SVs against the
+             * list of OPs to find the next code block.
+             *
+             * Note that       PUSHMARK PADSV PADSV ..
+             * is optimised to
+             *                 PADRANGE NULL  NULL  ..
+             * so the alignment still works. */
+            if (o) {
+                if (o->op_type == OP_NULL && (o->op_flags & OPf_SPECIAL)) {
+                    assert(n < pRExC_state->num_code_blocks);
+                    pRExC_state->code_blocks[n].start = pat ? SvCUR(pat) : 0;
+                    pRExC_state->code_blocks[n].block = o;
+                    pRExC_state->code_blocks[n].src_regex = NULL;
+                    n++;
+                    code = 1;
+                    o = o->op_sibling; /* skip CONST */
+                    assert(o);
+                }
+                o = o->op_sibling;;
+            }
 
-               }
-               else  {
-                    while (SvAMAGIC(msv)
-                            && (sv = AMG_CALLunary(msv, string_amg))
-                            && sv != msv
-                            &&  !(   SvROK(msv)
-                                  && SvROK(sv)
-                                  && SvRV(msv) == SvRV(sv))
-                    ) {
-                        msv = sv;
-                        SvGETMAGIC(msv);
-                    }
-                    if (SvROK(msv) && SvTYPE(SvRV(msv)) == SVt_REGEXP)
-                        msv = SvRV(msv);
+            /* try concatenation overload ... */
+            if (pat && (SvAMAGIC(pat) || SvAMAGIC(msv)) &&
+                    (sv = amagic_call(pat, msv, concat_amg, AMGf_assign)))
+            {
+                sv_setsv(pat, sv);
+                /* overloading involved: all bets are off over literal
+                 * code. Pretend we haven't seen it */
+                pRExC_state->num_code_blocks -= n;
+                n = 0;
+            }
+            else  {
+                /* ... or failing that, try "" overload */
+                while (SvAMAGIC(msv)
+                        && (sv = AMG_CALLunary(msv, string_amg))
+                        && sv != msv
+                        &&  !(   SvROK(msv)
+                              && SvROK(sv)
+                              && SvRV(msv) == SvRV(sv))
+                ) {
+                    msv = sv;
+                    SvGETMAGIC(msv);
+                }
+                if (SvROK(msv) && SvTYPE(SvRV(msv)) == SVt_REGEXP)
+                    msv = SvRV(msv);
+                if (pat) {
                     orig_patlen = SvCUR(pat);
                     sv_catsv_nomg(pat, msv);
                     rx = msv;
-                    if (code)
-                        pRExC_state->code_blocks[n-1].end = SvCUR(pat)-1;
                 }
+                else
+                    pat = msv;
+                if (code)
+                    pRExC_state->code_blocks[n-1].end = SvCUR(pat)-1;
+            }
 
-               /* extract any code blocks within any embedded qr//'s */
-               if (rx && SvTYPE(rx) == SVt_REGEXP
-                   && RX_ENGINE((REGEXP*)rx)->op_comp)
-               {
-
-                   RXi_GET_DECL(ReANY((REGEXP *)rx), ri);
-                   if (ri->num_code_blocks) {
-                       int i;
-                       /* the presence of an embedded qr// with code means
-                        * we should always recompile: the text of the
-                        * qr// may not have changed, but it may be a
-                        * different closure than last time */
-                       recompile = 1;
-                       Renew(pRExC_state->code_blocks,
-                           pRExC_state->num_code_blocks + ri->num_code_blocks,
-                           struct reg_code_block);
-                       pRExC_state->num_code_blocks += ri->num_code_blocks;
-                       for (i=0; i < ri->num_code_blocks; i++) {
-                           struct reg_code_block *src, *dst;
-                           STRLEN offset =  orig_patlen
-                               + ReANY((REGEXP *)rx)->pre_prefix;
-                           assert(n < pRExC_state->num_code_blocks);
-                           src = &ri->code_blocks[i];
-                           dst = &pRExC_state->code_blocks[n];
-                           dst->start      = src->start + offset;
-                           dst->end        = src->end   + offset;
-                           dst->block      = src->block;
-                           dst->src_regex  = (REGEXP*) SvREFCNT_inc( (SV*)
-                                                   src->src_regex
-                                                       ? src->src_regex
-                                                       : (REGEXP*)rx);
-                           n++;
-                       }
-                   }
-               }
-           }
-           SvSETMAGIC(pat);
-       }
-       else {
-            SV *sv;
-           pat = *patternp;
-            while (SvAMAGIC(pat)
-                    && (sv = AMG_CALLunary(pat, string_amg))
-                    && sv != pat)
+            /* extract any code blocks within any embedded qr//'s */
+            if (rx && SvTYPE(rx) == SVt_REGEXP
+                && RX_ENGINE((REGEXP*)rx)->op_comp)
             {
-                pat = sv;
-                SvGETMAGIC(pat);
+
+                RXi_GET_DECL(ReANY((REGEXP *)rx), ri);
+                if (ri->num_code_blocks) {
+                    int i;
+                    /* the presence of an embedded qr// with code means
+                     * we should always recompile: the text of the
+                     * qr// may not have changed, but it may be a
+                     * different closure than last time */
+                    recompile = 1;
+                    Renew(pRExC_state->code_blocks,
+                        pRExC_state->num_code_blocks + ri->num_code_blocks,
+                        struct reg_code_block);
+                    pRExC_state->num_code_blocks += ri->num_code_blocks;
+                    for (i=0; i < ri->num_code_blocks; i++) {
+                        struct reg_code_block *src, *dst;
+                        STRLEN offset =  orig_patlen
+                            + ReANY((REGEXP *)rx)->pre_prefix;
+                        assert(n < pRExC_state->num_code_blocks);
+                        src = &ri->code_blocks[i];
+                        dst = &pRExC_state->code_blocks[n];
+                        dst->start         = src->start + offset;
+                        dst->end           = src->end   + offset;
+                        dst->block         = src->block;
+                        dst->src_regex  = (REGEXP*) SvREFCNT_inc( (SV*)
+                                                src->src_regex
+                                                    ? src->src_regex
+                                                    : (REGEXP*)rx);
+                        n++;
+                    }
+                }
             }
         }
+        if (pat_count > 1)
+            SvSETMAGIC(pat);
 
-       /* handle bare regex: foo =~ $re */
+       /* handle bare (possibly after overloading) regex: foo =~ $re */
        {
            SV *re = pat;
            if (SvROK(re))
@@ -5509,58 +5523,13 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
                SvREFCNT_inc(re);
                Safefree(pRExC_state->code_blocks);
                 DEBUG_PARSE_r(PerlIO_printf(Perl_debug_log,
-                    "Precompiled pattern%s\n", orig_rx_flags & RXf_SPLIT ? " 
for split" : ""));
+                    "Precompiled pattern%s\n",
+                        orig_rx_flags & RXf_SPLIT ? " for split" : ""));
 
                return (REGEXP*)re;
            }
        }
     }
-    else {
-       /* not a list of SVs, so must be a list of OPs */
-       assert(expr);
-       if (expr->op_type == OP_LIST) {
-           int i = -1;
-           bool is_code = 0;
-           OP *o;
-
-            DEBUG_PARSE_r(PerlIO_printf(Perl_debug_log,
-                "Compiling OP_LIST%s\n", orig_rx_flags & RXf_SPLIT ? " for 
split" : ""));
-
-           pat = newSVpvn("", 0);
-           SAVEFREESV(pat);
-           if (code_is_utf8)
-               SvUTF8_on(pat);
-
-           /* given a list of CONSTs and DO blocks in expr, append all
-            * the CONSTs to pat, and record the start and end of each
-            * code block in code_blocks[] (each DO{} op is followed by an
-            * OP_CONST containing the corresponding literal '(?{...})
-            * text)
-            */
-           for (o = cLISTOPx(expr)->op_first; o; o = o->op_sibling) {
-               if (o->op_type == OP_CONST) {
-                   sv_catsv(pat, cSVOPo_sv);
-                   if (is_code) {
-                       pRExC_state->code_blocks[i].end = SvCUR(pat)-1;
-                       is_code = 0;
-                   }
-               }
-               else if (o->op_type == OP_NULL && (o->op_flags & OPf_SPECIAL)) {
-                   assert(i+1 < pRExC_state->num_code_blocks);
-                   pRExC_state->code_blocks[++i].start = SvCUR(pat);
-                   pRExC_state->code_blocks[i].block = o;
-                   pRExC_state->code_blocks[i].src_regex = NULL;
-                   is_code = 1;
-               }
-           }
-       }
-       else {
-           assert(expr->op_type == OP_CONST);
-            DEBUG_PARSE_r(PerlIO_printf(Perl_debug_log,
-                "Compiling OP_CONST%s\n", orig_rx_flags & RXf_SPLIT ? " for 
split" : ""));
-           pat = cSVOPx_sv(expr);
-       }
-    }
 
     exp = SvPV_nomg(pat, plen);
     xend = exp + plen;
@@ -5650,6 +5619,13 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
         }
     }
 
+    if ((pm_flags & PMf_USE_RE_EVAL)
+               /* this second condition covers the non-regex literal case,
+                * i.e.  $foo =~ '(?{})'. */
+               || (IN_PERL_COMPILETIME && (PL_hints & HINT_RE_EVAL))
+    )
+       runtime_code = S_has_runtime_code(aTHX_ pRExC_state, exp, plen);
+
     /* return old regex if pattern hasn't changed */
     /* XXX: note in the below we have to check the flags as well as the 
pattern.
      *
@@ -5663,24 +5639,12 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
         && ( RX_COMPFLAGS(old_re) == ( orig_rx_flags & RXf_PMf_FLAGCOPYMASK ) )
        && RX_PRECOMP(old_re)
        && RX_PRELEN(old_re) == plen
-        && memEQ(RX_PRECOMP(old_re), exp, plen))
+        && memEQ(RX_PRECOMP(old_re), exp, plen)
+       && !runtime_code /* with runtime code, always recompile */ )
     {
-       /* with runtime code, always recompile */
-       runtime_code = S_has_runtime_code(aTHX_ pRExC_state, expr, pm_flags,
-                                           exp, plen);
-       if (!runtime_code) {
-           Safefree(pRExC_state->code_blocks);
-           return old_re;
-       }
+        Safefree(pRExC_state->code_blocks);
+        return old_re;
     }
-    else if ((pm_flags & PMf_USE_RE_EVAL)
-               /* this second condition covers the non-regex literal case,
-                * i.e.  $foo =~ '(?{})'. */
-               || ( !PL_reg_state.re_reparsing && IN_PERL_COMPILETIME
-                   && (PL_hints & HINT_RE_EVAL))
-    )
-       runtime_code = S_has_runtime_code(aTHX_ pRExC_state, expr, pm_flags,
-                           exp, plen);
 
     rx_flags = orig_rx_flags;
 
diff --git a/regexec.c b/regexec.c
index d376e26..45bd09e 100644
--- a/regexec.c
+++ b/regexec.c
@@ -4878,8 +4878,6 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
                 */
                Copy(&PL_reg_state, &saved_state, 1, struct re_save_state);
 
-               PL_reg_state.re_reparsing = FALSE;
-
                if (!caller_cv)
                    caller_cv = find_runcv(NULL);
 
diff --git a/regexp.h b/regexp.h
index 6b16d14..31fb879 100644
--- a/regexp.h
+++ b/regexp.h
@@ -776,7 +776,6 @@ typedef struct regmatch_slab {
 struct re_save_state {
     bool re_state_eval_setup_done;     /* from regexec.c */
     bool re_state_reg_match_utf8;      /* from regexec.c */
-    bool re_reparsing;                 /* runtime (?{}) fed back into parser */
     /* Space for U8 */
     I32 re_state_reg_oldpos;           /* from regexec.c */
     I32 re_state_reg_maxiter;          /* max wait until caching pos */
diff --git a/t/re/overload.t b/t/re/overload.t
index 4e99bd3..38d5140 100644
--- a/t/re/overload.t
+++ b/t/re/overload.t
@@ -33,4 +33,149 @@ no  warnings 'syntax';
     is $1, $TAG, "void context //g against overloaded object";
 }
 
+{
+    # an overloaded stringify returning itself shouldn't loop indefinitely
+
+
+    {
+       package Self;
+       use overload q{""} => sub {
+                   return shift;
+               },
+           fallback => 1;
+    }
+
+    my $obj = bless [], 'Self';
+    my $r = qr/$obj/;
+    pass("self object, 1 arg");
+    $r = qr/foo$obj/;
+    pass("self object, 2 args");
+}
+
+{
+    # [perl #116823]
+    # when overloading regex string constants, a different code path
+    # was taken if the regex was compile-time, leading to overloaded
+    # regex constant string segments not being handled correctly.
+    # They were just treated as OP_CONST strings to be concatted together.
+    # In particular, if the overload returned a regex object, it would
+    # just be stringified rather than having any code blocks processed.
+
+    BEGIN {
+       overload::constant qr => sub {
+           my ($raw, $cooked, $type) = @_;
+           return $cooked unless defined $::CONST_QR_CLASS;
+           if ($type =~ /qq?/) {
+               return bless \$cooked, $::CONST_QR_CLASS;
+           } else {
+               return $cooked;
+           }
+       };
+    }
+
+    {
+       # returns a qr// object
+
+       package OL_QR;
+       use overload q{""} => sub {
+               my $re = shift;
+               return qr/(?{ $OL_QR::count++ })$$re/;
+           },
+       fallback => 1;
+
+    }
+
+    {
+       # returns a string
+
+       package OL_STR;
+       use overload q{""} => sub {
+               my $re = shift;
+               return qq/(?{ \$OL_STR::count++ })$$re/;
+           },
+       fallback => 1;
+
+    }
+
+
+    my $qr;
+
+    $::CONST_QR_CLASS = 'OL_QR';
+
+    $OL_QR::count = 0;
+    $qr = eval q{ qr/^foo$/; };
+    ok("foo" =~ $qr, "compile-time, OL_QR, single constant segment");
+    is($OL_QR::count, 1, "flag");
+
+    $OL_QR::count = 0;
+    $qr = eval q{ qr/^foo$(?{ $OL_QR::count++ })/; };
+    ok("foo" =~ $qr, "compile-time, OL_QR, multiple constant segments");
+    is($OL_QR::count, 2, "qr2 flag");
+
+
+    # test /foo.../ when foo is given string overloading,
+    # for various permutations of '...'
+
+    $::CONST_QR_CLASS = 'OL_STR';
+
+    for my $has_re_eval (0, 1) {
+       for my $has_qr (0, 1) {
+           for my $has_code (0, 1) {
+               for my $has_runtime (0, 1) {
+                   for my $has_runtime_code (0, 1) {
+                       if ($has_runtime_code) {
+                           next unless $has_runtime;
+                       }
+                       note( "re_eval=$has_re_eval "
+                           . "qr=$has_qr "
+                           . "code=$has_code "
+                           . "runtime=$has_runtime "
+                           . "runtime_code=$has_runtime_code");
+                       my $eval = '';
+                       $eval .= q{use re 'eval'; } if $has_re_eval;
+                       $eval .= q{$match = $str =~ };
+                       $eval .= q{qr} if $has_qr;
+                       $eval .= q{/^abc};
+                       $eval .= q{(?{$blocks++})} if $has_code;
+                       $eval .= q{$runtime} if $has_runtime;
+                       $eval .= q{/; 1;};
+
+                       my $runtime = q{def};
+                       $runtime .= q{(?{$run_blocks++})} if $has_runtime_code;
+
+                       my $blocks = 0;
+                       my $run_blocks = 0;
+                       my $match;
+                       my $str = "abc";
+                       $str .= "def" if $runtime;
+
+                       my $result = eval $eval;
+                       my $err = $@;
+                       $result = $result ? 1 : 0;
+
+                       if (!$has_re_eval) {
+                           is($result, 0, "EVAL: $eval");
+                           like($err, qr/Eval-group not allowed at runtime/,
+                               "\$\@:   $eval");
+                           next;
+                       }
+
+                       is($result, 1, "EVAL: $eval");
+                       diag("\$@=[$err]") unless $result;
+
+                       is($match, 1, "MATCH: $eval");
+                       is($blocks, $has_code, "blocks");
+                       is($run_blocks, $has_runtime_code, "run_blocks");
+
+                   }
+               }
+           }
+       }
+    }
+
+
+    undef $::CONST_QR_CLASS;
+}
+
+
 done_testing();
diff --git a/toke.c b/toke.c
index 275c957..43adb3e 100644
--- a/toke.c
+++ b/toke.c
@@ -2525,6 +2525,7 @@ S_sublex_push(pTHX)
     SAVEGENERICPV(PL_lex_brackstack);
     SAVEGENERICPV(PL_lex_casestack);
     SAVEGENERICPV(PL_parser->lex_shared);
+    SAVEBOOL(PL_parser->lex_re_reparsing);
 
     /* The here-doc parser needs to be able to peek into outer lexing
        scopes to find the body of the here-doc.  So we put PL_linestr and
@@ -2568,6 +2569,9 @@ S_sublex_push(pTHX)
     else
        PL_lex_inpat = NULL;
 
+    PL_parser->lex_re_reparsing = cBOOL(PL_in_eval & EVAL_RE_REPARSING);
+    PL_in_eval &= ~EVAL_RE_REPARSING;
+
     return '(';
 }
 
@@ -3751,7 +3755,9 @@ S_scan_const(pTHX_ char *start)
     /* return the substring (via pl_yylval) only if we parsed anything */
     if (s > PL_bufptr) {
        SvREFCNT_inc_simple_void_NN(sv);
-       if ( PL_hints & ( PL_lex_inpat ? HINT_NEW_RE : HINT_NEW_STRING ) ) {
+       if (   (PL_hints & ( PL_lex_inpat ? HINT_NEW_RE : HINT_NEW_STRING ))
+            && ! PL_parser->lex_re_reparsing)
+        {
            const char *const key = PL_lex_inpat ? "qr" : "q";
            const STRLEN keylen = PL_lex_inpat ? 2 : 1;
            const char *type;
@@ -9047,7 +9053,9 @@ S_checkcomma(pTHX_ const char *s, const char *name, const 
char *what)
     }
 }
 
-/* Either returns sv, or mortalizes/frees sv and returns a new SV*.
+/* S_new_constant(): do any overload::constant lookup.
+
+   Either returns sv, or mortalizes/frees sv and returns a new SV*.
    Best used as sv=new_constant(..., sv, ...).
    If s, pv are NULL, calls subroutine with one argument,
    and <type> is used with error messages only.
@@ -9502,8 +9510,7 @@ S_scan_pat(pTHX_ char *start, I32 type)
 {
     dVAR;
     PMOP *pm;
-    char *s = scan_str(start,!!PL_madskills,FALSE, PL_reg_state.re_reparsing,
-                       TRUE /* look for escaped bracketed metas */ );
+    char *s;
     const char * const valid_flags =
        (const char *)((type == OP_QR) ? QR_PAT_MODS : M_PAT_MODS);
     char charset = '\0';    /* character set modifier */
@@ -9513,9 +9520,9 @@ S_scan_pat(pTHX_ char *start, I32 type)
 
     PERL_ARGS_ASSERT_SCAN_PAT;
 
-    /* this was only needed for the initial scan_str; set it to false
-     * so that any (?{}) code blocks etc are parsed normally */
-    PL_reg_state.re_reparsing = FALSE;
+    s = scan_str(start,!!PL_madskills,FALSE, (PL_in_eval & EVAL_RE_REPARSING),
+                       TRUE /* look for escaped bracketed metas */ );
+
     if (!s) {
        const char * const delimiter = skipspace(start);
        Perl_croak(aTHX_

--
Perl5 Master Repository

Reply via email to