[perl.git] branch blead, updated. v5.25.9-146-gca81c32aad

Dave Mitchell Tue, 14 Feb 2017 13:04:50 -0800

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/ca81c32aad9c602582b8f87a832594c8712bf6a6?hp=57c819f845c985ed9979bfa76b1b8ca1708370f0>


- Log -----------------------------------------------------------------
commit ca81c32aad9c602582b8f87a832594c8712bf6a6
Merge: 57c819f845 77584140f7
Author: David Mitchell <[email protected]>
Date:   Tue Feb 14 17:50:11 2017 +0000

    [MERGE] regex (?{...}) and WHILEM scope fixups

commit 77584140f7cbfe714083cacfa671085466e98a7b
Author: David Mitchell <[email protected]>
Date:   Tue Feb 14 17:10:34 2017 +0000

    S_regmatch: eliminate WHILEM_A_min paren saving
    
    In something like
    
        "a1b2c3d4..." =~ /(?:(\w)(\d))*..../
    
    A WHILEM state is pushed for each iteration of the '*'. Part of this
    state saving includes the previous indices for each of the captures within
    the body of the thing being iterated over. So we save the following sets of
    values for $1,$2:
    
        ()()
        (a)(1)
        (b)(2)
        (c)(3)
        (d)(4)
    
    Then if at any point we backtrack, we can undo one or more iterations and
    restore the older values of $1,$2.
    
    However, when the match is non-greedy, as in A*?B, then on failure of B
    and backtracking we attempt *more* A's rather than removing some already
    matched A's. So there's never any need to save all the current paren state
    for each iteration.
    
    This eliminates a lot of per-iteration overhead for minimal WHILEMs and
    makes the following run about 25% faster:
    
    $s = ("a" x 1000);
    $s =~ /^(?:(.)(.))*?[XY]/ for 1..10_000;

M       regexec.c
M       t/perf/benchmarks

commit bb414e1295cbc3c4c2a55aaf82d832d6c8bf76ec
Author: David Mitchell <[email protected]>
Date:   Tue Feb 14 16:28:31 2017 +0000

    S_regmatch: eliminate WHILEM_B paren saving
    
    In something like
    
        "a1b2c3d4..." =~ /(?:(\w)(\d))*..../
    
    A WHILEM state is pushed for each iteration of the '*'. Part of this
    state saving includes the previous indices for each of the captures within
    the body of the thing being iterated over. So we save the following sets of
    values for $1,$2:
    
        ()()
        (a)(1)
        (b)(2)
        (c)(3)
        (d)(4)
    
    Then if at any point we backtrack, we can undo one or more iterations and
    restore the older values of $1,$2.
    
    For /A*B/ where A is a complex sub-pattern like (\w)(\d), we currently save
    the paren state each time we're about to attempt to iterate another A.
    But it turns out that for non-greedy matching, i.e. A*?B, we also
    save the paren state before executing B. This is unnecessary, as
    B can't alter the capture state of the parens within A. So eliminate it.
    
    If in the future some sneaky regex is found which this commit breaks,
    then as well as restoring the old behaviour, you should look carefully
    to see whether similar paren-saving behaviour for B should be added to
    greedy matches too, i.e. A*B. It was partly the discrepancy between
    saving for A*?B but not for A*B which made me suspect it was redundant.

M       regexec.c

commit cbb658a1562fa3da6a29d865ee9b0ba564affb3f
Author: David Mitchell <[email protected]>
Date:   Tue Feb 14 16:21:40 2017 +0000

    Add a comment on why TRIE.jump does a UNWIND_PAREN
    
    (it wasn't obvious to me)

M       regexec.c

commit 4ee16520199a0e11bf4dbdbac71f0a64e5510271
Author: David Mitchell <[email protected]>
Date:   Tue Feb 14 15:59:57 2017 +0000

    clear savestack on (?{...}) failure and backtrack
    
    RT #126697
    
    In a regex, after executing a (?{...}) code block, if we fail and
    backtrack over the codeblock, we're supposed to unwind the savestack, so
    that for any example any local()s within the code block are undone.
    
    It turns out that a backtracking state isn't pushed for (?{...}), only
    for postponed evals ( i.e.  (??{...})). This means that it relies on one
    of the earlier backtracking states to clear the savestack on its behalf.
    This can't always be relied upon, and the ticket above contains code where
    this falls down; in particular:
    
        'ABC' =~ m{
            \A
            (?:
                (?: AB | A | BC )
                (?{
                    local $count = $count + 1;
                    print "! count=$count; ; pos=${\pos}\n";
                })
            )*
            \z
        }x
    
    Here we end up relying on TRIE_next to do the cleaning up, but TRIE_next
    doesn't, since there's nothing it would be responsible for that needs
    cleaning up.
    
    The solution to this is to push a backtrack state for every (?{...}) as
    well as every (??{...}). The sole job of that state is to do a
    LEAVE_SCOPE(ST.lastcp).
    
    The existing backtrack state EVAL_AB has been renamed EVAL_postponed_AB
    to make it clear it's only used on postponed /(??{A})B/ regexes, and a new
    state has been added, EVAL_B, which is only called when backtracking after
    failing something in the B in /(?{...})B/.

M       regcomp.sym
M       regexec.c
M       regnodes.h
M       t/re/pat_re_eval.t

commit f4197b8e1d1f984eb6635ea41ce14394432b96b1
Author: David Mitchell <[email protected]>
Date:   Tue Feb 14 13:32:16 2017 +0000

    -Mre=Debug,ALL: indicate regex state stack pushes
    
    At this maximal level of debugging output, it displays the top 3 state
    stack entries each time it pushes, but with no obvious indication that
    a push is occurring. This commit changes this output:
    
                                 |   1|  Setting an EVAL scope, savestack=9,
                                 |   2|   #4   WHILEM_A_max
                                 |   2|   #3   WHILEM_A_max
                                 |   2|   #2   CURLYX_end yes
       0 <abcdef> <g>            |   2|   4:POSIXD[\w](5)
    
    to be this (which includes the word "push" and extra indentation for the
    stack dump):
    
                                 |   1|  Setting an EVAL scope, savestack=9,
                                 |   2|   push #4   WHILEM_A_max
                                 |   2|        #3   WHILEM_A_max
                                 |   2|        #2   CURLYX_end yes
       0 <abcdef> <g>            |   2|   4:POSIXD[\w](5)
    
    Also, replace curd (current depth) var with a positive integer offset
    (i) var, to avoid signed/unsigned mixing problems.

M       regexec.c

commit 4b9c7caeaecf4e9df0be3a2e296644f763f775d6
Author: David Mitchell <[email protected]>
Date:   Sat Feb 11 11:53:41 2017 +0000

    fix pad/scope issue in re_evals
    
    RT #129881 heap-buffer-overflow Perl_pad_sv
    
    In some circumstances involving a pattern which has embedded code blocks
    from more than one source, e.g.
    
        my $r = qr{(?{1;}){2}X};
        "" =~ /$r|(?{1;})/;
    
    the wrong PL_comppad could be active while doing a LEAVE_SCOPE() or on
    exit from the pattern.
    
    This was mainly due to the big context stack changes in 5.24.0 - in
    particular, since POP_MULTICALL() now does CX_LEAVE_SCOPE(cx) *before*
    restoring PL_comppad, the (correct) unwinding of any SAVECOMPPAD's was
    being followed by C<PL_comppad = cx->blk_sub.prevcomppad>, which wasn't
    necessarily a sensible value.
    
    To fix this, record the value of PL_savestack_ix at entry to S_regmatch(),
    and set the cx->blk_oldsaveix of the MULTICALL to this value when pushed.
    On exit from S_regmatch, we either POP_MULTICALL which will do a
    LEAVE_SCOPE(cx->blk_oldsaveix), or in the absense of any EVAL, do the
    explicit but equivalent LEAVE_SCOPE(orig_savestack_ix).
    
    Note that this is a change in behaviour to S_regmatch() - formerly it
    wouldn't necessarily clear the savestack completely back the point of
    entry - that would get left to do by its caller, S_regtry(), or indirectly
    by Perl_regexec_flags(). This shouldn't make any practical difference, but
    is tidier and less likely to introduce bugs later.

M       regexec.c
M       t/re/pat_re_eval.t
-----------------------------------------------------------------------

Summary of changes:
 regcomp.sym        |   2 +-
 regexec.c          | 120 ++++++++++++++++++++++++++-------------
 regnodes.h         | 164 +++++++++++++++++++++++++++--------------------------
 t/perf/benchmarks  |  11 ++++
 t/re/pat_re_eval.t |  40 ++++++++++++-
 5 files changed, 218 insertions(+), 119 deletions(-)

diff --git a/regcomp.sym b/regcomp.sym
index ac67955270..999d965565 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -243,7 +243,7 @@ PSEUDO      PSEUDO,     off       ; Pseudo opcode for 
internal use.
 #
 #
 TRIE            next:FAIL
-EVAL            AB:FAIL
+EVAL            B,postponed_AB:FAIL
 CURLYX          end:FAIL
 WHILEM          A_pre,A_min,A_max,B_min,B_max:FAIL
 BRANCH          next:FAIL
diff --git a/regexec.c b/regexec.c
index cf6b5483c4..bd7887086f 100644
--- a/regexec.c
+++ b/regexec.c
@@ -5391,12 +5391,12 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char 
*startpos, regnode *prog)
     U8 gimme = G_SCALAR;
     CV *caller_cv = NULL;      /* who called us */
     CV *last_pushed_cv = NULL; /* most recently called (?{}) CV */
-    CHECKPOINT runops_cp;      /* savestack position before executing EVAL */
     U32 maxopenparen = 0;       /* max '(' index seen so far */
     int to_complement;  /* Invert the result? */
     _char_class_number classnum;
     bool is_utf8_pat = reginfo->is_utf8_pat;
     bool match = FALSE;
+    I32 orig_savestack_ix = PL_savestack_ix;
 
 /* Solaris Studio 12.3 messes up fetching PL_charclass['\n'] */
 #if (defined(__SUNPRO_C) && (__SUNPRO_C == 0x5120) && defined(__x86_64) && 
defined(USE_64_BIT_ALL))
@@ -5736,6 +5736,11 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
         {
             U8 *uc;
             if ( ST.jump ) {
+                /* undo any captures done in the tail part of a branch,
+                 * e.g.
+                 *    /(?:X(.)(.)|Y(.)).../
+                 * where the trie just matches X then calls out to do the
+                 * rest of the branch */
                 REGCP_UNWIND(ST.cp);
                 UNWIND_PAREN(ST.lastparen, ST.lastcloseparen);
            }
@@ -6786,7 +6791,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
             goto eval_recurse_doit;
             /* NOTREACHED */
 
-        case EVAL:  /*   /(?{A})B/   /(??{A})B/  and /(?(?{A})X|Y)B/   */      
  
+        case EVAL:  /*   /(?{...})B/   /(??{A})B/  and  /(?(?{...})X|Y)B/   */
             if (cur_eval && cur_eval->locinput==locinput) {
                if ( ++nochange_depth > max_nochange_depth )
                     Perl_croak(aTHX_ "EVAL without pos change exceeded limit 
in regex");
@@ -6805,7 +6810,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
 
                /* save *all* paren positions */
                 regcppush(rex, 0, maxopenparen);
-               REGCP_SET(runops_cp);
+                REGCP_SET(ST.lastcp);
 
                if (!caller_cv)
                    caller_cv = find_runcv(NULL);
@@ -6830,30 +6835,67 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char 
*startpos, regnode *prog)
                    nop = (OP*)rexi->data->data[n];
                }
 
-               /* normally if we're about to execute code from the same
-                * CV that we used previously, we just use the existing
-                * CX stack entry. However, its possible that in the
-                * meantime we may have backtracked, popped from the save
-                * stack, and undone the SAVECOMPPAD(s) associated with
-                * PUSH_MULTICALL; in which case PL_comppad no longer
-                * points to newcv's pad. */
+                /* Some notes about MULTICALL and the context and save stacks.
+                 *
+                 * In something like
+                 *   /...(?{ my $x)}...(?{ my $z)}...(?{ my $z)}.../
+                 * since codeblocks don't introduce a new scope (so that
+                 * local() etc accumulate), at the end of a successful
+                 * match there will be a SAVEt_CLEARSV on the savestack
+                 * for each of $x, $y, $z. If the three code blocks above
+                 * happen to have come from different CVs (e.g. via
+                 * embedded qr//s), then we must ensure that during any
+                 * savestack unwinding, PL_comppad always points to the
+                 * right pad at each moment. We achieve this by
+                 * interleaving SAVEt_COMPPAD's on the savestack whenever
+                 * there is a change of pad.
+                 * In theory whenever we call a code block, we should
+                 * push a CXt_SUB context, then pop it on return from
+                 * that code block. This causes a bit of an issue in that
+                 * normally popping a context also clears the savestack
+                 * back to cx->blk_oldsaveix, but here we specifically
+                 * don't want to clear the save stack on exit from the
+                 * code block.
+                 * Also for efficiency we don't want to keep pushing and
+                 * popping the single SUB context as we backtrack etc.
+                 * So instead, we push a single context the first time
+                 * we need, it, then hang onto it until the end of this
+                 * function. Whenever we encounter a new code block, we
+                 * update the CV etc if that's changed. During the times
+                 * in this function where we're not executing a code
+                 * block, having the SUB context still there is a bit
+                 * naughty - but we hope that no-one notices.
+                 * When the SUB context is initially pushed, we fake up
+                 * cx->blk_oldsaveix to be as if we'd pushed this context
+                 * on first entry to S_regmatch rather than at some random
+                 * point during the regexe execution. That way if we
+                 * croak, popping the context stack will ensure that
+                 * *everything* SAVEd by this function is undone and then
+                 * the context popped, rather than e.g., popping the
+                 * context (and restoring the original PL_comppad) then
+                 * popping more of the savestack and restoiring a bad
+                 * PL_comppad.
+                 */
+
+                /* If this is the first EVAL, push a MULTICALL. On
+                 * subsequent calls, if we're executing a different CV, or
+                 * if PL_comppad has got messed up from backtracking
+                 * through SAVECOMPPADs, then refresh the context.
+                 */
                if (newcv != last_pushed_cv || PL_comppad != last_pad)
                {
                     U8 flags = (CXp_SUB_RE |
                                 ((newcv == caller_cv) ? CXp_SUB_RE_FAKE : 0));
+                    SAVECOMPPAD();
                    if (last_pushed_cv) {
-                        /* PUSH/POP_MULTICALL save and restore the
-                         * caller's PL_comppad; if we call multiple subs
-                         * using the same CX block, we have to save and
-                         * unwind the varying PL_comppad's ourselves,
-                         * especially restoring the right PL_comppad on
-                         * backtrack - so save it on the save stack */
-                        SAVECOMPPAD();
                        CHANGE_MULTICALL_FLAGS(newcv, flags);
                    }
                    else {
                        PUSH_MULTICALL_FLAGS(newcv, flags);
                    }
+                    /* see notes above */
+                    CX_CUR()->blk_oldsaveix = orig_savestack_ix;
+
                    last_pushed_cv = newcv;
                }
                else {
@@ -6970,12 +7012,14 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char 
*startpos, regnode *prog)
                 * in the regexp code uses the pad ! */
                PL_op = oop;
                PL_curcop = ocurcop;
-                regcp_restore(rex, runops_cp, &maxopenparen);
+                regcp_restore(rex, ST.lastcp, &maxopenparen);
                 PL_curpm_under = PL_curpm;
                 PL_curpm = PL_reg_curpm;
 
-               if (logical != 2)
-                   break;
+               if (logical != 2) {
+                    PUSH_STATE_GOTO(EVAL_B, next, locinput);
+                   /* NOTREACHED */
+                }
            }
 
                /* only /(??{})/  from now on */
@@ -7073,11 +7117,11 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char 
*startpos, regnode *prog)
                ST.prev_eval = cur_eval;
                cur_eval = st;
                /* now continue from first node in postoned RE */
-               PUSH_YES_STATE_GOTO(EVAL_AB, startpoint, locinput);
+               PUSH_YES_STATE_GOTO(EVAL_postponed_AB, startpoint, locinput);
                NOT_REACHED; /* NOTREACHED */
        }
 
-       case EVAL_AB: /* cleanup after a successful (??{A})B */
+       case EVAL_postponed_AB: /* cleanup after a successful (??{A})B */
             /* note: this is called twice; first after popping B, then A */
             DEBUG_STACK_r({
                 Perl_re_exec_indentf( aTHX_  "EVAL_AB cur_eval=%p 
prev_eval=%p\n",
@@ -7123,7 +7167,11 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
            sayYES;
 
 
-       case EVAL_AB_fail: /* unsuccessfully ran A or B in (??{A})B */
+       case EVAL_B_fail: /* unsuccessful B in (?{...})B */
+           REGCP_UNWIND(ST.lastcp);
+            sayNO;
+
+       case EVAL_postponed_AB_fail: /* unsuccessfully ran A or B in (??{A})B */
            /* note: this is called twice; first after popping B, then A */
             DEBUG_STACK_r({
                 Perl_re_exec_indentf( aTHX_  "EVAL_AB_fail cur_eval=%p 
prev_eval=%p\n",
@@ -7523,9 +7571,6 @@ NULL
            if (cur_curlyx->u.curlyx.minmod) {
                ST.save_curlyx = cur_curlyx;
                cur_curlyx = cur_curlyx->u.curlyx.prev_curlyx;
-                ST.cp = regcppush(rex, ST.save_curlyx->u.curlyx.parenfloor,
-                            maxopenparen);
-               REGCP_SET(ST.lastcp);
                PUSH_YES_STATE_GOTO(WHILEM_B_min, ST.save_curlyx->u.curlyx.B,
                                     locinput);
                NOT_REACHED; /* NOTREACHED */
@@ -7558,11 +7603,11 @@ NULL
            CACHEsayNO;
            NOT_REACHED; /* NOTREACHED */
 
-       case WHILEM_A_min_fail: /* just failed to match A in a minimal match */
-           /* FALLTHROUGH */
        case WHILEM_A_pre_fail: /* just failed to match even minimal A */
            REGCP_UNWIND(ST.lastcp);
             regcppop(rex, &maxopenparen);
+           /* FALLTHROUGH */
+       case WHILEM_A_min_fail: /* just failed to match A in a minimal match */
            cur_curlyx->u.curlyx.lastloc = ST.save_lastloc;
            cur_curlyx->u.curlyx.count--;
            CACHEsayNO;
@@ -7595,8 +7640,6 @@ NULL
 
        case WHILEM_B_min_fail: /* just failed to match B in a minimal match */
            cur_curlyx = ST.save_curlyx;
-           REGCP_UNWIND(ST.lastcp);
-            regcppop(rex, &maxopenparen);
 
            if (cur_curlyx->u.curlyx.count >= 
/*max*/ARG2(cur_curlyx->u.curlyx.me)) {
                /* Maximum greed exceeded */
@@ -7618,9 +7661,6 @@ NULL
            );
            /* Try grabbing another A and see if it helps. */
            cur_curlyx->u.curlyx.lastloc = locinput;
-            ST.cp = regcppush(rex, cur_curlyx->u.curlyx.parenfloor,
-                            maxopenparen);
-           REGCP_SET(ST.lastcp);
            PUSH_STATE_GOTO(WHILEM_A_min,
                /*A*/ NEXTOPER(ST.save_curlyx->u.curlyx.me) + EXTRA_STEP_2ARGS,
                 locinput);
@@ -8225,7 +8265,7 @@ NULL
 
                 SET_RECURSE_LOCINPUT("FAKE-END[after]", cur_eval->locinput);
 
-                PUSH_YES_STATE_GOTO(EVAL_AB, st->u.eval.prev_eval->u.eval.B,
+                PUSH_YES_STATE_GOTO(EVAL_postponed_AB, 
st->u.eval.prev_eval->u.eval.B,
                                     locinput); /* match B */
            }
 
@@ -8482,16 +8522,17 @@ NULL
            DEBUG_STACK_r({
                regmatch_state *cur = st;
                regmatch_state *curyes = yes_state;
-               int curd = depth;
+               U32 i;
                regmatch_slab *slab = PL_regmatch_slab;
-                for (;curd > -1 && (depth-curd < 3);cur--,curd--) {
+                for (i = 0; i < 3 && i <= depth; cur--,i++) {
                     if (cur < SLAB_FIRST(slab)) {
                        slab = slab->prev;
                        cur = SLAB_LAST(slab);
                     }
-                    Perl_re_exec_indentf( aTHX_ "#%-3d %-10s %s\n",
+                    Perl_re_exec_indentf( aTHX_ "%4s #%-3d %-10s %s\n",
                         depth,
-                        curd, PL_reg_name[cur->resume_state],
+                        i ? "    " : "push",
+                        depth - i, PL_reg_name[cur->resume_state],
                         (curyes == cur) ? "yes" : ""
                     );
                     if (curyes == cur)
@@ -8643,9 +8684,12 @@ NULL
 
     if (last_pushed_cv) {
        dSP;
+        /* see "Some notes about MULTICALL" above */
        POP_MULTICALL;
         PERL_UNUSED_VAR(SP);
     }
+    else
+        LEAVE_SCOPE(orig_savestack_ix);
 
     assert(!result ||  locinput - reginfo->strbeg >= 0);
     return result ?  locinput - reginfo->strbeg : -1;
diff --git a/regnodes.h b/regnodes.h
index f820c5684e..8fe0f41d1b 100644
--- a/regnodes.h
+++ b/regnodes.h
@@ -7,7 +7,7 @@
 /* Regops and State definitions */
 
 #define REGNODE_MAX            92
-#define REGMATCH_STATE_MAX     132
+#define REGMATCH_STATE_MAX     134
 
 #define        END                     0       /* 0000 End of program. */
 #define        SUCCEED                 1       /* 0x01 Return from a 
subroutine, basically. */
@@ -107,44 +107,46 @@
        /* ------------ States ------------- */
 #define        TRIE_next               (REGNODE_MAX + 1)       /* state for 
TRIE */
 #define        TRIE_next_fail          (REGNODE_MAX + 2)       /* state for 
TRIE */
-#define        EVAL_AB                 (REGNODE_MAX + 3)       /* state for 
EVAL */
-#define        EVAL_AB_fail            (REGNODE_MAX + 4)       /* state for 
EVAL */
-#define        CURLYX_end              (REGNODE_MAX + 5)       /* state for 
CURLYX */
-#define        CURLYX_end_fail         (REGNODE_MAX + 6)       /* state for 
CURLYX */
-#define        WHILEM_A_pre            (REGNODE_MAX + 7)       /* state for 
WHILEM */
-#define        WHILEM_A_pre_fail       (REGNODE_MAX + 8)       /* state for 
WHILEM */
-#define        WHILEM_A_min            (REGNODE_MAX + 9)       /* state for 
WHILEM */
-#define        WHILEM_A_min_fail       (REGNODE_MAX + 10)      /* state for 
WHILEM */
-#define        WHILEM_A_max            (REGNODE_MAX + 11)      /* state for 
WHILEM */
-#define        WHILEM_A_max_fail       (REGNODE_MAX + 12)      /* state for 
WHILEM */
-#define        WHILEM_B_min            (REGNODE_MAX + 13)      /* state for 
WHILEM */
-#define        WHILEM_B_min_fail       (REGNODE_MAX + 14)      /* state for 
WHILEM */
-#define        WHILEM_B_max            (REGNODE_MAX + 15)      /* state for 
WHILEM */
-#define        WHILEM_B_max_fail       (REGNODE_MAX + 16)      /* state for 
WHILEM */
-#define        BRANCH_next             (REGNODE_MAX + 17)      /* state for 
BRANCH */
-#define        BRANCH_next_fail        (REGNODE_MAX + 18)      /* state for 
BRANCH */
-#define        CURLYM_A                (REGNODE_MAX + 19)      /* state for 
CURLYM */
-#define        CURLYM_A_fail           (REGNODE_MAX + 20)      /* state for 
CURLYM */
-#define        CURLYM_B                (REGNODE_MAX + 21)      /* state for 
CURLYM */
-#define        CURLYM_B_fail           (REGNODE_MAX + 22)      /* state for 
CURLYM */
-#define        IFMATCH_A               (REGNODE_MAX + 23)      /* state for 
IFMATCH */
-#define        IFMATCH_A_fail          (REGNODE_MAX + 24)      /* state for 
IFMATCH */
-#define        CURLY_B_min_known       (REGNODE_MAX + 25)      /* state for 
CURLY */
-#define        CURLY_B_min_known_fail  (REGNODE_MAX + 26)      /* state for 
CURLY */
-#define        CURLY_B_min             (REGNODE_MAX + 27)      /* state for 
CURLY */
-#define        CURLY_B_min_fail        (REGNODE_MAX + 28)      /* state for 
CURLY */
-#define        CURLY_B_max             (REGNODE_MAX + 29)      /* state for 
CURLY */
-#define        CURLY_B_max_fail        (REGNODE_MAX + 30)      /* state for 
CURLY */
-#define        COMMIT_next             (REGNODE_MAX + 31)      /* state for 
COMMIT */
-#define        COMMIT_next_fail        (REGNODE_MAX + 32)      /* state for 
COMMIT */
-#define        MARKPOINT_next          (REGNODE_MAX + 33)      /* state for 
MARKPOINT */
-#define        MARKPOINT_next_fail     (REGNODE_MAX + 34)      /* state for 
MARKPOINT */
-#define        SKIP_next               (REGNODE_MAX + 35)      /* state for 
SKIP */
-#define        SKIP_next_fail          (REGNODE_MAX + 36)      /* state for 
SKIP */
-#define        CUTGROUP_next           (REGNODE_MAX + 37)      /* state for 
CUTGROUP */
-#define        CUTGROUP_next_fail      (REGNODE_MAX + 38)      /* state for 
CUTGROUP */
-#define        KEEPS_next              (REGNODE_MAX + 39)      /* state for 
KEEPS */
-#define        KEEPS_next_fail         (REGNODE_MAX + 40)      /* state for 
KEEPS */
+#define        EVAL_B                  (REGNODE_MAX + 3)       /* state for 
EVAL */
+#define        EVAL_B_fail             (REGNODE_MAX + 4)       /* state for 
EVAL */
+#define        EVAL_postponed_AB       (REGNODE_MAX + 5)       /* state for 
EVAL */
+#define        EVAL_postponed_AB_fail  (REGNODE_MAX + 6)       /* state for 
EVAL */
+#define        CURLYX_end              (REGNODE_MAX + 7)       /* state for 
CURLYX */
+#define        CURLYX_end_fail         (REGNODE_MAX + 8)       /* state for 
CURLYX */
+#define        WHILEM_A_pre            (REGNODE_MAX + 9)       /* state for 
WHILEM */
+#define        WHILEM_A_pre_fail       (REGNODE_MAX + 10)      /* state for 
WHILEM */
+#define        WHILEM_A_min            (REGNODE_MAX + 11)      /* state for 
WHILEM */
+#define        WHILEM_A_min_fail       (REGNODE_MAX + 12)      /* state for 
WHILEM */
+#define        WHILEM_A_max            (REGNODE_MAX + 13)      /* state for 
WHILEM */
+#define        WHILEM_A_max_fail       (REGNODE_MAX + 14)      /* state for 
WHILEM */
+#define        WHILEM_B_min            (REGNODE_MAX + 15)      /* state for 
WHILEM */
+#define        WHILEM_B_min_fail       (REGNODE_MAX + 16)      /* state for 
WHILEM */
+#define        WHILEM_B_max            (REGNODE_MAX + 17)      /* state for 
WHILEM */
+#define        WHILEM_B_max_fail       (REGNODE_MAX + 18)      /* state for 
WHILEM */
+#define        BRANCH_next             (REGNODE_MAX + 19)      /* state for 
BRANCH */
+#define        BRANCH_next_fail        (REGNODE_MAX + 20)      /* state for 
BRANCH */
+#define        CURLYM_A                (REGNODE_MAX + 21)      /* state for 
CURLYM */
+#define        CURLYM_A_fail           (REGNODE_MAX + 22)      /* state for 
CURLYM */
+#define        CURLYM_B                (REGNODE_MAX + 23)      /* state for 
CURLYM */
+#define        CURLYM_B_fail           (REGNODE_MAX + 24)      /* state for 
CURLYM */
+#define        IFMATCH_A               (REGNODE_MAX + 25)      /* state for 
IFMATCH */
+#define        IFMATCH_A_fail          (REGNODE_MAX + 26)      /* state for 
IFMATCH */
+#define        CURLY_B_min_known       (REGNODE_MAX + 27)      /* state for 
CURLY */
+#define        CURLY_B_min_known_fail  (REGNODE_MAX + 28)      /* state for 
CURLY */
+#define        CURLY_B_min             (REGNODE_MAX + 29)      /* state for 
CURLY */
+#define        CURLY_B_min_fail        (REGNODE_MAX + 30)      /* state for 
CURLY */
+#define        CURLY_B_max             (REGNODE_MAX + 31)      /* state for 
CURLY */
+#define        CURLY_B_max_fail        (REGNODE_MAX + 32)      /* state for 
CURLY */
+#define        COMMIT_next             (REGNODE_MAX + 33)      /* state for 
COMMIT */
+#define        COMMIT_next_fail        (REGNODE_MAX + 34)      /* state for 
COMMIT */
+#define        MARKPOINT_next          (REGNODE_MAX + 35)      /* state for 
MARKPOINT */
+#define        MARKPOINT_next_fail     (REGNODE_MAX + 36)      /* state for 
MARKPOINT */
+#define        SKIP_next               (REGNODE_MAX + 37)      /* state for 
SKIP */
+#define        SKIP_next_fail          (REGNODE_MAX + 38)      /* state for 
SKIP */
+#define        CUTGROUP_next           (REGNODE_MAX + 39)      /* state for 
CUTGROUP */
+#define        CUTGROUP_next_fail      (REGNODE_MAX + 40)      /* state for 
CUTGROUP */
+#define        KEEPS_next              (REGNODE_MAX + 41)      /* state for 
KEEPS */
+#define        KEEPS_next_fail         (REGNODE_MAX + 42)      /* state for 
KEEPS */
 
 /* PL_regkind[] What type of regop or state is this. */
 
@@ -248,8 +250,10 @@ EXTCONST U8 PL_regkind[] = {
        /* ------------ States ------------- */
        TRIE,           /* TRIE_next              */
        TRIE,           /* TRIE_next_fail         */
-       EVAL,           /* EVAL_AB                */
-       EVAL,           /* EVAL_AB_fail           */
+       EVAL,           /* EVAL_B                 */
+       EVAL,           /* EVAL_B_fail            */
+       EVAL,           /* EVAL_postponed_AB      */
+       EVAL,           /* EVAL_postponed_AB_fail */
        CURLYX,         /* CURLYX_end             */
        CURLYX,         /* CURLYX_end_fail        */
        WHILEM,         /* WHILEM_A_pre           */
@@ -592,44 +596,46 @@ EXTCONST char * const PL_reg_name[] = {
        /* ------------ States ------------- */
        "TRIE_next",                    /* REGNODE_MAX +0x01 */
        "TRIE_next_fail",               /* REGNODE_MAX +0x02 */
-       "EVAL_AB",                      /* REGNODE_MAX +0x03 */
-       "EVAL_AB_fail",                 /* REGNODE_MAX +0x04 */
-       "CURLYX_end",                   /* REGNODE_MAX +0x05 */
-       "CURLYX_end_fail",              /* REGNODE_MAX +0x06 */
-       "WHILEM_A_pre",                 /* REGNODE_MAX +0x07 */
-       "WHILEM_A_pre_fail",            /* REGNODE_MAX +0x08 */
-       "WHILEM_A_min",                 /* REGNODE_MAX +0x09 */
-       "WHILEM_A_min_fail",            /* REGNODE_MAX +0x0a */
-       "WHILEM_A_max",                 /* REGNODE_MAX +0x0b */
-       "WHILEM_A_max_fail",            /* REGNODE_MAX +0x0c */
-       "WHILEM_B_min",                 /* REGNODE_MAX +0x0d */
-       "WHILEM_B_min_fail",            /* REGNODE_MAX +0x0e */
-       "WHILEM_B_max",                 /* REGNODE_MAX +0x0f */
-       "WHILEM_B_max_fail",            /* REGNODE_MAX +0x10 */
-       "BRANCH_next",                  /* REGNODE_MAX +0x11 */
-       "BRANCH_next_fail",             /* REGNODE_MAX +0x12 */
-       "CURLYM_A",                     /* REGNODE_MAX +0x13 */
-       "CURLYM_A_fail",                /* REGNODE_MAX +0x14 */
-       "CURLYM_B",                     /* REGNODE_MAX +0x15 */
-       "CURLYM_B_fail",                /* REGNODE_MAX +0x16 */
-       "IFMATCH_A",                    /* REGNODE_MAX +0x17 */
-       "IFMATCH_A_fail",               /* REGNODE_MAX +0x18 */
-       "CURLY_B_min_known",            /* REGNODE_MAX +0x19 */
-       "CURLY_B_min_known_fail",       /* REGNODE_MAX +0x1a */
-       "CURLY_B_min",                  /* REGNODE_MAX +0x1b */
-       "CURLY_B_min_fail",             /* REGNODE_MAX +0x1c */
-       "CURLY_B_max",                  /* REGNODE_MAX +0x1d */
-       "CURLY_B_max_fail",             /* REGNODE_MAX +0x1e */
-       "COMMIT_next",                  /* REGNODE_MAX +0x1f */
-       "COMMIT_next_fail",             /* REGNODE_MAX +0x20 */
-       "MARKPOINT_next",               /* REGNODE_MAX +0x21 */
-       "MARKPOINT_next_fail",          /* REGNODE_MAX +0x22 */
-       "SKIP_next",                    /* REGNODE_MAX +0x23 */
-       "SKIP_next_fail",               /* REGNODE_MAX +0x24 */
-       "CUTGROUP_next",                /* REGNODE_MAX +0x25 */
-       "CUTGROUP_next_fail",           /* REGNODE_MAX +0x26 */
-       "KEEPS_next",                   /* REGNODE_MAX +0x27 */
-       "KEEPS_next_fail",              /* REGNODE_MAX +0x28 */
+       "EVAL_B",                       /* REGNODE_MAX +0x03 */
+       "EVAL_B_fail",                  /* REGNODE_MAX +0x04 */
+       "EVAL_postponed_AB",            /* REGNODE_MAX +0x05 */
+       "EVAL_postponed_AB_fail",       /* REGNODE_MAX +0x06 */
+       "CURLYX_end",                   /* REGNODE_MAX +0x07 */
+       "CURLYX_end_fail",              /* REGNODE_MAX +0x08 */
+       "WHILEM_A_pre",                 /* REGNODE_MAX +0x09 */
+       "WHILEM_A_pre_fail",            /* REGNODE_MAX +0x0a */
+       "WHILEM_A_min",                 /* REGNODE_MAX +0x0b */
+       "WHILEM_A_min_fail",            /* REGNODE_MAX +0x0c */
+       "WHILEM_A_max",                 /* REGNODE_MAX +0x0d */
+       "WHILEM_A_max_fail",            /* REGNODE_MAX +0x0e */
+       "WHILEM_B_min",                 /* REGNODE_MAX +0x0f */
+       "WHILEM_B_min_fail",            /* REGNODE_MAX +0x10 */
+       "WHILEM_B_max",                 /* REGNODE_MAX +0x11 */
+       "WHILEM_B_max_fail",            /* REGNODE_MAX +0x12 */
+       "BRANCH_next",                  /* REGNODE_MAX +0x13 */
+       "BRANCH_next_fail",             /* REGNODE_MAX +0x14 */
+       "CURLYM_A",                     /* REGNODE_MAX +0x15 */
+       "CURLYM_A_fail",                /* REGNODE_MAX +0x16 */
+       "CURLYM_B",                     /* REGNODE_MAX +0x17 */
+       "CURLYM_B_fail",                /* REGNODE_MAX +0x18 */
+       "IFMATCH_A",                    /* REGNODE_MAX +0x19 */
+       "IFMATCH_A_fail",               /* REGNODE_MAX +0x1a */
+       "CURLY_B_min_known",            /* REGNODE_MAX +0x1b */
+       "CURLY_B_min_known_fail",       /* REGNODE_MAX +0x1c */
+       "CURLY_B_min",                  /* REGNODE_MAX +0x1d */
+       "CURLY_B_min_fail",             /* REGNODE_MAX +0x1e */
+       "CURLY_B_max",                  /* REGNODE_MAX +0x1f */
+       "CURLY_B_max_fail",             /* REGNODE_MAX +0x20 */
+       "COMMIT_next",                  /* REGNODE_MAX +0x21 */
+       "COMMIT_next_fail",             /* REGNODE_MAX +0x22 */
+       "MARKPOINT_next",               /* REGNODE_MAX +0x23 */
+       "MARKPOINT_next_fail",          /* REGNODE_MAX +0x24 */
+       "SKIP_next",                    /* REGNODE_MAX +0x25 */
+       "SKIP_next_fail",               /* REGNODE_MAX +0x26 */
+       "CUTGROUP_next",                /* REGNODE_MAX +0x27 */
+       "CUTGROUP_next_fail",           /* REGNODE_MAX +0x28 */
+       "KEEPS_next",                   /* REGNODE_MAX +0x29 */
+       "KEEPS_next_fail",              /* REGNODE_MAX +0x2a */
 };
 #endif /* DOINIT */
 
diff --git a/t/perf/benchmarks b/t/perf/benchmarks
index 233f1fbdc5..dec29bf503 100644
--- a/t/perf/benchmarks
+++ b/t/perf/benchmarks
@@ -1377,4 +1377,15 @@
         setup   => '$_ = ("0" x 100) . ("a" x 100);',
         code    => '/[acgt]+/',
     },
+
+    'regex::whilem::min_captures_fail' => {
+        desc    => '/WHILEM with anon-greedy match and captures that fails',
+        setup   => '$_ = ("a" x 20)',
+        code    => '/^(?:(.)(.))*?[XY]/',
+    },
+    'regex::whilem::max_captures_fail' => {
+        desc    => '/WHILEM with a greedy match and captures that fails',
+        setup   => '$_ = ("a" x 20)',
+        code    => '/^(?:(.)(.))*[XY]/',
+    },
 ];
diff --git a/t/re/pat_re_eval.t b/t/re/pat_re_eval.t
index 4df88aff57..6921d38bac 100644
--- a/t/re/pat_re_eval.t
+++ b/t/re/pat_re_eval.t
@@ -22,7 +22,7 @@ BEGIN {
 }
 
 
-plan tests => 529;  # Update this when adding/deleting tests.
+plan tests => 533;  # Update this when adding/deleting tests.
 
 run_tests() unless caller;
 
@@ -1253,6 +1253,44 @@ sub run_tests {
         like $@, qr/Reference to nonexistent group/, "RT #130650";
     }
 
+    # RT #129881
+    # on exit from a pattern with multiple code blocks from different
+    # CVs, PL_comppad wasn't being restored correctly
+
+    sub {
+        # give first few pad slots known values
+        my ($x1, $x2, $x3, $x4, $x5) = 101..105;
+        # these vars are in a separate pad
+        my $r = qr/((?{my ($y1, $y2) = 201..202; 1;})A){2}X/;
+        # the first alt fails, causing a switch to this anon
+        # sub's pad
+        "AAA" =~ /$r|(?{my ($z1, $z2) = 301..302; 1;})A/;
+        is $x1, 101, "RT #129881: x1";
+        is $x2, 102, "RT #129881: x2";
+        is $x3, 103, "RT #129881: x3";
+    }->();
+
+
+    # RT #126697
+    # savestack wasn't always being unwound on EVAL failure
+    {
+        local our $i = 0;
+        my $max = 0;
+
+        'ABC' =~ m{
+            \A
+            (?:
+                (?: AB | A | BC )
+                (?{
+                    local $i = $i + 1;
+                    $max = $i if $max < $i;
+                })
+            )*
+            \z
+        }x;
+        is $max, 2, "RT #126697";
+    }
+
 
 } # End of sub run_tests
 

--
Perl5 Master Repository

[perl.git] branch blead, updated. v5.25.9-146-gca81c32aad

Reply via email to