In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/ca81c32aad9c602582b8f87a832594c8712bf6a6?hp=57c819f845c985ed9979bfa76b1b8ca1708370f0>
- Log ----------------------------------------------------------------- commit ca81c32aad9c602582b8f87a832594c8712bf6a6 Merge: 57c819f845 77584140f7 Author: David Mitchell <[email protected]> Date: Tue Feb 14 17:50:11 2017 +0000 [MERGE] regex (?{...}) and WHILEM scope fixups commit 77584140f7cbfe714083cacfa671085466e98a7b Author: David Mitchell <[email protected]> Date: Tue Feb 14 17:10:34 2017 +0000 S_regmatch: eliminate WHILEM_A_min paren saving In something like "a1b2c3d4..." =~ /(?:(\w)(\d))*..../ A WHILEM state is pushed for each iteration of the '*'. Part of this state saving includes the previous indices for each of the captures within the body of the thing being iterated over. So we save the following sets of values for $1,$2: ()() (a)(1) (b)(2) (c)(3) (d)(4) Then if at any point we backtrack, we can undo one or more iterations and restore the older values of $1,$2. However, when the match is non-greedy, as in A*?B, then on failure of B and backtracking we attempt *more* A's rather than removing some already matched A's. So there's never any need to save all the current paren state for each iteration. This eliminates a lot of per-iteration overhead for minimal WHILEMs and makes the following run about 25% faster: $s = ("a" x 1000); $s =~ /^(?:(.)(.))*?[XY]/ for 1..10_000; M regexec.c M t/perf/benchmarks commit bb414e1295cbc3c4c2a55aaf82d832d6c8bf76ec Author: David Mitchell <[email protected]> Date: Tue Feb 14 16:28:31 2017 +0000 S_regmatch: eliminate WHILEM_B paren saving In something like "a1b2c3d4..." =~ /(?:(\w)(\d))*..../ A WHILEM state is pushed for each iteration of the '*'. Part of this state saving includes the previous indices for each of the captures within the body of the thing being iterated over. So we save the following sets of values for $1,$2: ()() (a)(1) (b)(2) (c)(3) (d)(4) Then if at any point we backtrack, we can undo one or more iterations and restore the older values of $1,$2. For /A*B/ where A is a complex sub-pattern like (\w)(\d), we currently save the paren state each time we're about to attempt to iterate another A. But it turns out that for non-greedy matching, i.e. A*?B, we also save the paren state before executing B. This is unnecessary, as B can't alter the capture state of the parens within A. So eliminate it. If in the future some sneaky regex is found which this commit breaks, then as well as restoring the old behaviour, you should look carefully to see whether similar paren-saving behaviour for B should be added to greedy matches too, i.e. A*B. It was partly the discrepancy between saving for A*?B but not for A*B which made me suspect it was redundant. M regexec.c commit cbb658a1562fa3da6a29d865ee9b0ba564affb3f Author: David Mitchell <[email protected]> Date: Tue Feb 14 16:21:40 2017 +0000 Add a comment on why TRIE.jump does a UNWIND_PAREN (it wasn't obvious to me) M regexec.c commit 4ee16520199a0e11bf4dbdbac71f0a64e5510271 Author: David Mitchell <[email protected]> Date: Tue Feb 14 15:59:57 2017 +0000 clear savestack on (?{...}) failure and backtrack RT #126697 In a regex, after executing a (?{...}) code block, if we fail and backtrack over the codeblock, we're supposed to unwind the savestack, so that for any example any local()s within the code block are undone. It turns out that a backtracking state isn't pushed for (?{...}), only for postponed evals ( i.e. (??{...})). This means that it relies on one of the earlier backtracking states to clear the savestack on its behalf. This can't always be relied upon, and the ticket above contains code where this falls down; in particular: 'ABC' =~ m{ \A (?: (?: AB | A | BC ) (?{ local $count = $count + 1; print "! count=$count; ; pos=${\pos}\n"; }) )* \z }x Here we end up relying on TRIE_next to do the cleaning up, but TRIE_next doesn't, since there's nothing it would be responsible for that needs cleaning up. The solution to this is to push a backtrack state for every (?{...}) as well as every (??{...}). The sole job of that state is to do a LEAVE_SCOPE(ST.lastcp). The existing backtrack state EVAL_AB has been renamed EVAL_postponed_AB to make it clear it's only used on postponed /(??{A})B/ regexes, and a new state has been added, EVAL_B, which is only called when backtracking after failing something in the B in /(?{...})B/. M regcomp.sym M regexec.c M regnodes.h M t/re/pat_re_eval.t commit f4197b8e1d1f984eb6635ea41ce14394432b96b1 Author: David Mitchell <[email protected]> Date: Tue Feb 14 13:32:16 2017 +0000 -Mre=Debug,ALL: indicate regex state stack pushes At this maximal level of debugging output, it displays the top 3 state stack entries each time it pushes, but with no obvious indication that a push is occurring. This commit changes this output: | 1| Setting an EVAL scope, savestack=9, | 2| #4 WHILEM_A_max | 2| #3 WHILEM_A_max | 2| #2 CURLYX_end yes 0 <abcdef> <g> | 2| 4:POSIXD[\w](5) to be this (which includes the word "push" and extra indentation for the stack dump): | 1| Setting an EVAL scope, savestack=9, | 2| push #4 WHILEM_A_max | 2| #3 WHILEM_A_max | 2| #2 CURLYX_end yes 0 <abcdef> <g> | 2| 4:POSIXD[\w](5) Also, replace curd (current depth) var with a positive integer offset (i) var, to avoid signed/unsigned mixing problems. M regexec.c commit 4b9c7caeaecf4e9df0be3a2e296644f763f775d6 Author: David Mitchell <[email protected]> Date: Sat Feb 11 11:53:41 2017 +0000 fix pad/scope issue in re_evals RT #129881 heap-buffer-overflow Perl_pad_sv In some circumstances involving a pattern which has embedded code blocks from more than one source, e.g. my $r = qr{(?{1;}){2}X}; "" =~ /$r|(?{1;})/; the wrong PL_comppad could be active while doing a LEAVE_SCOPE() or on exit from the pattern. This was mainly due to the big context stack changes in 5.24.0 - in particular, since POP_MULTICALL() now does CX_LEAVE_SCOPE(cx) *before* restoring PL_comppad, the (correct) unwinding of any SAVECOMPPAD's was being followed by C<PL_comppad = cx->blk_sub.prevcomppad>, which wasn't necessarily a sensible value. To fix this, record the value of PL_savestack_ix at entry to S_regmatch(), and set the cx->blk_oldsaveix of the MULTICALL to this value when pushed. On exit from S_regmatch, we either POP_MULTICALL which will do a LEAVE_SCOPE(cx->blk_oldsaveix), or in the absense of any EVAL, do the explicit but equivalent LEAVE_SCOPE(orig_savestack_ix). Note that this is a change in behaviour to S_regmatch() - formerly it wouldn't necessarily clear the savestack completely back the point of entry - that would get left to do by its caller, S_regtry(), or indirectly by Perl_regexec_flags(). This shouldn't make any practical difference, but is tidier and less likely to introduce bugs later. M regexec.c M t/re/pat_re_eval.t ----------------------------------------------------------------------- Summary of changes: regcomp.sym | 2 +- regexec.c | 120 ++++++++++++++++++++++++++------------- regnodes.h | 164 +++++++++++++++++++++++++++-------------------------- t/perf/benchmarks | 11 ++++ t/re/pat_re_eval.t | 40 ++++++++++++- 5 files changed, 218 insertions(+), 119 deletions(-) diff --git a/regcomp.sym b/regcomp.sym index ac67955270..999d965565 100644 --- a/regcomp.sym +++ b/regcomp.sym @@ -243,7 +243,7 @@ PSEUDO PSEUDO, off ; Pseudo opcode for internal use. # # TRIE next:FAIL -EVAL AB:FAIL +EVAL B,postponed_AB:FAIL CURLYX end:FAIL WHILEM A_pre,A_min,A_max,B_min,B_max:FAIL BRANCH next:FAIL diff --git a/regexec.c b/regexec.c index cf6b5483c4..bd7887086f 100644 --- a/regexec.c +++ b/regexec.c @@ -5391,12 +5391,12 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) U8 gimme = G_SCALAR; CV *caller_cv = NULL; /* who called us */ CV *last_pushed_cv = NULL; /* most recently called (?{}) CV */ - CHECKPOINT runops_cp; /* savestack position before executing EVAL */ U32 maxopenparen = 0; /* max '(' index seen so far */ int to_complement; /* Invert the result? */ _char_class_number classnum; bool is_utf8_pat = reginfo->is_utf8_pat; bool match = FALSE; + I32 orig_savestack_ix = PL_savestack_ix; /* Solaris Studio 12.3 messes up fetching PL_charclass['\n'] */ #if (defined(__SUNPRO_C) && (__SUNPRO_C == 0x5120) && defined(__x86_64) && defined(USE_64_BIT_ALL)) @@ -5736,6 +5736,11 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) { U8 *uc; if ( ST.jump ) { + /* undo any captures done in the tail part of a branch, + * e.g. + * /(?:X(.)(.)|Y(.)).../ + * where the trie just matches X then calls out to do the + * rest of the branch */ REGCP_UNWIND(ST.cp); UNWIND_PAREN(ST.lastparen, ST.lastcloseparen); } @@ -6786,7 +6791,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) goto eval_recurse_doit; /* NOTREACHED */ - case EVAL: /* /(?{A})B/ /(??{A})B/ and /(?(?{A})X|Y)B/ */ + case EVAL: /* /(?{...})B/ /(??{A})B/ and /(?(?{...})X|Y)B/ */ if (cur_eval && cur_eval->locinput==locinput) { if ( ++nochange_depth > max_nochange_depth ) Perl_croak(aTHX_ "EVAL without pos change exceeded limit in regex"); @@ -6805,7 +6810,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) /* save *all* paren positions */ regcppush(rex, 0, maxopenparen); - REGCP_SET(runops_cp); + REGCP_SET(ST.lastcp); if (!caller_cv) caller_cv = find_runcv(NULL); @@ -6830,30 +6835,67 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) nop = (OP*)rexi->data->data[n]; } - /* normally if we're about to execute code from the same - * CV that we used previously, we just use the existing - * CX stack entry. However, its possible that in the - * meantime we may have backtracked, popped from the save - * stack, and undone the SAVECOMPPAD(s) associated with - * PUSH_MULTICALL; in which case PL_comppad no longer - * points to newcv's pad. */ + /* Some notes about MULTICALL and the context and save stacks. + * + * In something like + * /...(?{ my $x)}...(?{ my $z)}...(?{ my $z)}.../ + * since codeblocks don't introduce a new scope (so that + * local() etc accumulate), at the end of a successful + * match there will be a SAVEt_CLEARSV on the savestack + * for each of $x, $y, $z. If the three code blocks above + * happen to have come from different CVs (e.g. via + * embedded qr//s), then we must ensure that during any + * savestack unwinding, PL_comppad always points to the + * right pad at each moment. We achieve this by + * interleaving SAVEt_COMPPAD's on the savestack whenever + * there is a change of pad. + * In theory whenever we call a code block, we should + * push a CXt_SUB context, then pop it on return from + * that code block. This causes a bit of an issue in that + * normally popping a context also clears the savestack + * back to cx->blk_oldsaveix, but here we specifically + * don't want to clear the save stack on exit from the + * code block. + * Also for efficiency we don't want to keep pushing and + * popping the single SUB context as we backtrack etc. + * So instead, we push a single context the first time + * we need, it, then hang onto it until the end of this + * function. Whenever we encounter a new code block, we + * update the CV etc if that's changed. During the times + * in this function where we're not executing a code + * block, having the SUB context still there is a bit + * naughty - but we hope that no-one notices. + * When the SUB context is initially pushed, we fake up + * cx->blk_oldsaveix to be as if we'd pushed this context + * on first entry to S_regmatch rather than at some random + * point during the regexe execution. That way if we + * croak, popping the context stack will ensure that + * *everything* SAVEd by this function is undone and then + * the context popped, rather than e.g., popping the + * context (and restoring the original PL_comppad) then + * popping more of the savestack and restoiring a bad + * PL_comppad. + */ + + /* If this is the first EVAL, push a MULTICALL. On + * subsequent calls, if we're executing a different CV, or + * if PL_comppad has got messed up from backtracking + * through SAVECOMPPADs, then refresh the context. + */ if (newcv != last_pushed_cv || PL_comppad != last_pad) { U8 flags = (CXp_SUB_RE | ((newcv == caller_cv) ? CXp_SUB_RE_FAKE : 0)); + SAVECOMPPAD(); if (last_pushed_cv) { - /* PUSH/POP_MULTICALL save and restore the - * caller's PL_comppad; if we call multiple subs - * using the same CX block, we have to save and - * unwind the varying PL_comppad's ourselves, - * especially restoring the right PL_comppad on - * backtrack - so save it on the save stack */ - SAVECOMPPAD(); CHANGE_MULTICALL_FLAGS(newcv, flags); } else { PUSH_MULTICALL_FLAGS(newcv, flags); } + /* see notes above */ + CX_CUR()->blk_oldsaveix = orig_savestack_ix; + last_pushed_cv = newcv; } else { @@ -6970,12 +7012,14 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) * in the regexp code uses the pad ! */ PL_op = oop; PL_curcop = ocurcop; - regcp_restore(rex, runops_cp, &maxopenparen); + regcp_restore(rex, ST.lastcp, &maxopenparen); PL_curpm_under = PL_curpm; PL_curpm = PL_reg_curpm; - if (logical != 2) - break; + if (logical != 2) { + PUSH_STATE_GOTO(EVAL_B, next, locinput); + /* NOTREACHED */ + } } /* only /(??{})/ from now on */ @@ -7073,11 +7117,11 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) ST.prev_eval = cur_eval; cur_eval = st; /* now continue from first node in postoned RE */ - PUSH_YES_STATE_GOTO(EVAL_AB, startpoint, locinput); + PUSH_YES_STATE_GOTO(EVAL_postponed_AB, startpoint, locinput); NOT_REACHED; /* NOTREACHED */ } - case EVAL_AB: /* cleanup after a successful (??{A})B */ + case EVAL_postponed_AB: /* cleanup after a successful (??{A})B */ /* note: this is called twice; first after popping B, then A */ DEBUG_STACK_r({ Perl_re_exec_indentf( aTHX_ "EVAL_AB cur_eval=%p prev_eval=%p\n", @@ -7123,7 +7167,11 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) sayYES; - case EVAL_AB_fail: /* unsuccessfully ran A or B in (??{A})B */ + case EVAL_B_fail: /* unsuccessful B in (?{...})B */ + REGCP_UNWIND(ST.lastcp); + sayNO; + + case EVAL_postponed_AB_fail: /* unsuccessfully ran A or B in (??{A})B */ /* note: this is called twice; first after popping B, then A */ DEBUG_STACK_r({ Perl_re_exec_indentf( aTHX_ "EVAL_AB_fail cur_eval=%p prev_eval=%p\n", @@ -7523,9 +7571,6 @@ NULL if (cur_curlyx->u.curlyx.minmod) { ST.save_curlyx = cur_curlyx; cur_curlyx = cur_curlyx->u.curlyx.prev_curlyx; - ST.cp = regcppush(rex, ST.save_curlyx->u.curlyx.parenfloor, - maxopenparen); - REGCP_SET(ST.lastcp); PUSH_YES_STATE_GOTO(WHILEM_B_min, ST.save_curlyx->u.curlyx.B, locinput); NOT_REACHED; /* NOTREACHED */ @@ -7558,11 +7603,11 @@ NULL CACHEsayNO; NOT_REACHED; /* NOTREACHED */ - case WHILEM_A_min_fail: /* just failed to match A in a minimal match */ - /* FALLTHROUGH */ case WHILEM_A_pre_fail: /* just failed to match even minimal A */ REGCP_UNWIND(ST.lastcp); regcppop(rex, &maxopenparen); + /* FALLTHROUGH */ + case WHILEM_A_min_fail: /* just failed to match A in a minimal match */ cur_curlyx->u.curlyx.lastloc = ST.save_lastloc; cur_curlyx->u.curlyx.count--; CACHEsayNO; @@ -7595,8 +7640,6 @@ NULL case WHILEM_B_min_fail: /* just failed to match B in a minimal match */ cur_curlyx = ST.save_curlyx; - REGCP_UNWIND(ST.lastcp); - regcppop(rex, &maxopenparen); if (cur_curlyx->u.curlyx.count >= /*max*/ARG2(cur_curlyx->u.curlyx.me)) { /* Maximum greed exceeded */ @@ -7618,9 +7661,6 @@ NULL ); /* Try grabbing another A and see if it helps. */ cur_curlyx->u.curlyx.lastloc = locinput; - ST.cp = regcppush(rex, cur_curlyx->u.curlyx.parenfloor, - maxopenparen); - REGCP_SET(ST.lastcp); PUSH_STATE_GOTO(WHILEM_A_min, /*A*/ NEXTOPER(ST.save_curlyx->u.curlyx.me) + EXTRA_STEP_2ARGS, locinput); @@ -8225,7 +8265,7 @@ NULL SET_RECURSE_LOCINPUT("FAKE-END[after]", cur_eval->locinput); - PUSH_YES_STATE_GOTO(EVAL_AB, st->u.eval.prev_eval->u.eval.B, + PUSH_YES_STATE_GOTO(EVAL_postponed_AB, st->u.eval.prev_eval->u.eval.B, locinput); /* match B */ } @@ -8482,16 +8522,17 @@ NULL DEBUG_STACK_r({ regmatch_state *cur = st; regmatch_state *curyes = yes_state; - int curd = depth; + U32 i; regmatch_slab *slab = PL_regmatch_slab; - for (;curd > -1 && (depth-curd < 3);cur--,curd--) { + for (i = 0; i < 3 && i <= depth; cur--,i++) { if (cur < SLAB_FIRST(slab)) { slab = slab->prev; cur = SLAB_LAST(slab); } - Perl_re_exec_indentf( aTHX_ "#%-3d %-10s %s\n", + Perl_re_exec_indentf( aTHX_ "%4s #%-3d %-10s %s\n", depth, - curd, PL_reg_name[cur->resume_state], + i ? " " : "push", + depth - i, PL_reg_name[cur->resume_state], (curyes == cur) ? "yes" : "" ); if (curyes == cur) @@ -8643,9 +8684,12 @@ NULL if (last_pushed_cv) { dSP; + /* see "Some notes about MULTICALL" above */ POP_MULTICALL; PERL_UNUSED_VAR(SP); } + else + LEAVE_SCOPE(orig_savestack_ix); assert(!result || locinput - reginfo->strbeg >= 0); return result ? locinput - reginfo->strbeg : -1; diff --git a/regnodes.h b/regnodes.h index f820c5684e..8fe0f41d1b 100644 --- a/regnodes.h +++ b/regnodes.h @@ -7,7 +7,7 @@ /* Regops and State definitions */ #define REGNODE_MAX 92 -#define REGMATCH_STATE_MAX 132 +#define REGMATCH_STATE_MAX 134 #define END 0 /* 0000 End of program. */ #define SUCCEED 1 /* 0x01 Return from a subroutine, basically. */ @@ -107,44 +107,46 @@ /* ------------ States ------------- */ #define TRIE_next (REGNODE_MAX + 1) /* state for TRIE */ #define TRIE_next_fail (REGNODE_MAX + 2) /* state for TRIE */ -#define EVAL_AB (REGNODE_MAX + 3) /* state for EVAL */ -#define EVAL_AB_fail (REGNODE_MAX + 4) /* state for EVAL */ -#define CURLYX_end (REGNODE_MAX + 5) /* state for CURLYX */ -#define CURLYX_end_fail (REGNODE_MAX + 6) /* state for CURLYX */ -#define WHILEM_A_pre (REGNODE_MAX + 7) /* state for WHILEM */ -#define WHILEM_A_pre_fail (REGNODE_MAX + 8) /* state for WHILEM */ -#define WHILEM_A_min (REGNODE_MAX + 9) /* state for WHILEM */ -#define WHILEM_A_min_fail (REGNODE_MAX + 10) /* state for WHILEM */ -#define WHILEM_A_max (REGNODE_MAX + 11) /* state for WHILEM */ -#define WHILEM_A_max_fail (REGNODE_MAX + 12) /* state for WHILEM */ -#define WHILEM_B_min (REGNODE_MAX + 13) /* state for WHILEM */ -#define WHILEM_B_min_fail (REGNODE_MAX + 14) /* state for WHILEM */ -#define WHILEM_B_max (REGNODE_MAX + 15) /* state for WHILEM */ -#define WHILEM_B_max_fail (REGNODE_MAX + 16) /* state for WHILEM */ -#define BRANCH_next (REGNODE_MAX + 17) /* state for BRANCH */ -#define BRANCH_next_fail (REGNODE_MAX + 18) /* state for BRANCH */ -#define CURLYM_A (REGNODE_MAX + 19) /* state for CURLYM */ -#define CURLYM_A_fail (REGNODE_MAX + 20) /* state for CURLYM */ -#define CURLYM_B (REGNODE_MAX + 21) /* state for CURLYM */ -#define CURLYM_B_fail (REGNODE_MAX + 22) /* state for CURLYM */ -#define IFMATCH_A (REGNODE_MAX + 23) /* state for IFMATCH */ -#define IFMATCH_A_fail (REGNODE_MAX + 24) /* state for IFMATCH */ -#define CURLY_B_min_known (REGNODE_MAX + 25) /* state for CURLY */ -#define CURLY_B_min_known_fail (REGNODE_MAX + 26) /* state for CURLY */ -#define CURLY_B_min (REGNODE_MAX + 27) /* state for CURLY */ -#define CURLY_B_min_fail (REGNODE_MAX + 28) /* state for CURLY */ -#define CURLY_B_max (REGNODE_MAX + 29) /* state for CURLY */ -#define CURLY_B_max_fail (REGNODE_MAX + 30) /* state for CURLY */ -#define COMMIT_next (REGNODE_MAX + 31) /* state for COMMIT */ -#define COMMIT_next_fail (REGNODE_MAX + 32) /* state for COMMIT */ -#define MARKPOINT_next (REGNODE_MAX + 33) /* state for MARKPOINT */ -#define MARKPOINT_next_fail (REGNODE_MAX + 34) /* state for MARKPOINT */ -#define SKIP_next (REGNODE_MAX + 35) /* state for SKIP */ -#define SKIP_next_fail (REGNODE_MAX + 36) /* state for SKIP */ -#define CUTGROUP_next (REGNODE_MAX + 37) /* state for CUTGROUP */ -#define CUTGROUP_next_fail (REGNODE_MAX + 38) /* state for CUTGROUP */ -#define KEEPS_next (REGNODE_MAX + 39) /* state for KEEPS */ -#define KEEPS_next_fail (REGNODE_MAX + 40) /* state for KEEPS */ +#define EVAL_B (REGNODE_MAX + 3) /* state for EVAL */ +#define EVAL_B_fail (REGNODE_MAX + 4) /* state for EVAL */ +#define EVAL_postponed_AB (REGNODE_MAX + 5) /* state for EVAL */ +#define EVAL_postponed_AB_fail (REGNODE_MAX + 6) /* state for EVAL */ +#define CURLYX_end (REGNODE_MAX + 7) /* state for CURLYX */ +#define CURLYX_end_fail (REGNODE_MAX + 8) /* state for CURLYX */ +#define WHILEM_A_pre (REGNODE_MAX + 9) /* state for WHILEM */ +#define WHILEM_A_pre_fail (REGNODE_MAX + 10) /* state for WHILEM */ +#define WHILEM_A_min (REGNODE_MAX + 11) /* state for WHILEM */ +#define WHILEM_A_min_fail (REGNODE_MAX + 12) /* state for WHILEM */ +#define WHILEM_A_max (REGNODE_MAX + 13) /* state for WHILEM */ +#define WHILEM_A_max_fail (REGNODE_MAX + 14) /* state for WHILEM */ +#define WHILEM_B_min (REGNODE_MAX + 15) /* state for WHILEM */ +#define WHILEM_B_min_fail (REGNODE_MAX + 16) /* state for WHILEM */ +#define WHILEM_B_max (REGNODE_MAX + 17) /* state for WHILEM */ +#define WHILEM_B_max_fail (REGNODE_MAX + 18) /* state for WHILEM */ +#define BRANCH_next (REGNODE_MAX + 19) /* state for BRANCH */ +#define BRANCH_next_fail (REGNODE_MAX + 20) /* state for BRANCH */ +#define CURLYM_A (REGNODE_MAX + 21) /* state for CURLYM */ +#define CURLYM_A_fail (REGNODE_MAX + 22) /* state for CURLYM */ +#define CURLYM_B (REGNODE_MAX + 23) /* state for CURLYM */ +#define CURLYM_B_fail (REGNODE_MAX + 24) /* state for CURLYM */ +#define IFMATCH_A (REGNODE_MAX + 25) /* state for IFMATCH */ +#define IFMATCH_A_fail (REGNODE_MAX + 26) /* state for IFMATCH */ +#define CURLY_B_min_known (REGNODE_MAX + 27) /* state for CURLY */ +#define CURLY_B_min_known_fail (REGNODE_MAX + 28) /* state for CURLY */ +#define CURLY_B_min (REGNODE_MAX + 29) /* state for CURLY */ +#define CURLY_B_min_fail (REGNODE_MAX + 30) /* state for CURLY */ +#define CURLY_B_max (REGNODE_MAX + 31) /* state for CURLY */ +#define CURLY_B_max_fail (REGNODE_MAX + 32) /* state for CURLY */ +#define COMMIT_next (REGNODE_MAX + 33) /* state for COMMIT */ +#define COMMIT_next_fail (REGNODE_MAX + 34) /* state for COMMIT */ +#define MARKPOINT_next (REGNODE_MAX + 35) /* state for MARKPOINT */ +#define MARKPOINT_next_fail (REGNODE_MAX + 36) /* state for MARKPOINT */ +#define SKIP_next (REGNODE_MAX + 37) /* state for SKIP */ +#define SKIP_next_fail (REGNODE_MAX + 38) /* state for SKIP */ +#define CUTGROUP_next (REGNODE_MAX + 39) /* state for CUTGROUP */ +#define CUTGROUP_next_fail (REGNODE_MAX + 40) /* state for CUTGROUP */ +#define KEEPS_next (REGNODE_MAX + 41) /* state for KEEPS */ +#define KEEPS_next_fail (REGNODE_MAX + 42) /* state for KEEPS */ /* PL_regkind[] What type of regop or state is this. */ @@ -248,8 +250,10 @@ EXTCONST U8 PL_regkind[] = { /* ------------ States ------------- */ TRIE, /* TRIE_next */ TRIE, /* TRIE_next_fail */ - EVAL, /* EVAL_AB */ - EVAL, /* EVAL_AB_fail */ + EVAL, /* EVAL_B */ + EVAL, /* EVAL_B_fail */ + EVAL, /* EVAL_postponed_AB */ + EVAL, /* EVAL_postponed_AB_fail */ CURLYX, /* CURLYX_end */ CURLYX, /* CURLYX_end_fail */ WHILEM, /* WHILEM_A_pre */ @@ -592,44 +596,46 @@ EXTCONST char * const PL_reg_name[] = { /* ------------ States ------------- */ "TRIE_next", /* REGNODE_MAX +0x01 */ "TRIE_next_fail", /* REGNODE_MAX +0x02 */ - "EVAL_AB", /* REGNODE_MAX +0x03 */ - "EVAL_AB_fail", /* REGNODE_MAX +0x04 */ - "CURLYX_end", /* REGNODE_MAX +0x05 */ - "CURLYX_end_fail", /* REGNODE_MAX +0x06 */ - "WHILEM_A_pre", /* REGNODE_MAX +0x07 */ - "WHILEM_A_pre_fail", /* REGNODE_MAX +0x08 */ - "WHILEM_A_min", /* REGNODE_MAX +0x09 */ - "WHILEM_A_min_fail", /* REGNODE_MAX +0x0a */ - "WHILEM_A_max", /* REGNODE_MAX +0x0b */ - "WHILEM_A_max_fail", /* REGNODE_MAX +0x0c */ - "WHILEM_B_min", /* REGNODE_MAX +0x0d */ - "WHILEM_B_min_fail", /* REGNODE_MAX +0x0e */ - "WHILEM_B_max", /* REGNODE_MAX +0x0f */ - "WHILEM_B_max_fail", /* REGNODE_MAX +0x10 */ - "BRANCH_next", /* REGNODE_MAX +0x11 */ - "BRANCH_next_fail", /* REGNODE_MAX +0x12 */ - "CURLYM_A", /* REGNODE_MAX +0x13 */ - "CURLYM_A_fail", /* REGNODE_MAX +0x14 */ - "CURLYM_B", /* REGNODE_MAX +0x15 */ - "CURLYM_B_fail", /* REGNODE_MAX +0x16 */ - "IFMATCH_A", /* REGNODE_MAX +0x17 */ - "IFMATCH_A_fail", /* REGNODE_MAX +0x18 */ - "CURLY_B_min_known", /* REGNODE_MAX +0x19 */ - "CURLY_B_min_known_fail", /* REGNODE_MAX +0x1a */ - "CURLY_B_min", /* REGNODE_MAX +0x1b */ - "CURLY_B_min_fail", /* REGNODE_MAX +0x1c */ - "CURLY_B_max", /* REGNODE_MAX +0x1d */ - "CURLY_B_max_fail", /* REGNODE_MAX +0x1e */ - "COMMIT_next", /* REGNODE_MAX +0x1f */ - "COMMIT_next_fail", /* REGNODE_MAX +0x20 */ - "MARKPOINT_next", /* REGNODE_MAX +0x21 */ - "MARKPOINT_next_fail", /* REGNODE_MAX +0x22 */ - "SKIP_next", /* REGNODE_MAX +0x23 */ - "SKIP_next_fail", /* REGNODE_MAX +0x24 */ - "CUTGROUP_next", /* REGNODE_MAX +0x25 */ - "CUTGROUP_next_fail", /* REGNODE_MAX +0x26 */ - "KEEPS_next", /* REGNODE_MAX +0x27 */ - "KEEPS_next_fail", /* REGNODE_MAX +0x28 */ + "EVAL_B", /* REGNODE_MAX +0x03 */ + "EVAL_B_fail", /* REGNODE_MAX +0x04 */ + "EVAL_postponed_AB", /* REGNODE_MAX +0x05 */ + "EVAL_postponed_AB_fail", /* REGNODE_MAX +0x06 */ + "CURLYX_end", /* REGNODE_MAX +0x07 */ + "CURLYX_end_fail", /* REGNODE_MAX +0x08 */ + "WHILEM_A_pre", /* REGNODE_MAX +0x09 */ + "WHILEM_A_pre_fail", /* REGNODE_MAX +0x0a */ + "WHILEM_A_min", /* REGNODE_MAX +0x0b */ + "WHILEM_A_min_fail", /* REGNODE_MAX +0x0c */ + "WHILEM_A_max", /* REGNODE_MAX +0x0d */ + "WHILEM_A_max_fail", /* REGNODE_MAX +0x0e */ + "WHILEM_B_min", /* REGNODE_MAX +0x0f */ + "WHILEM_B_min_fail", /* REGNODE_MAX +0x10 */ + "WHILEM_B_max", /* REGNODE_MAX +0x11 */ + "WHILEM_B_max_fail", /* REGNODE_MAX +0x12 */ + "BRANCH_next", /* REGNODE_MAX +0x13 */ + "BRANCH_next_fail", /* REGNODE_MAX +0x14 */ + "CURLYM_A", /* REGNODE_MAX +0x15 */ + "CURLYM_A_fail", /* REGNODE_MAX +0x16 */ + "CURLYM_B", /* REGNODE_MAX +0x17 */ + "CURLYM_B_fail", /* REGNODE_MAX +0x18 */ + "IFMATCH_A", /* REGNODE_MAX +0x19 */ + "IFMATCH_A_fail", /* REGNODE_MAX +0x1a */ + "CURLY_B_min_known", /* REGNODE_MAX +0x1b */ + "CURLY_B_min_known_fail", /* REGNODE_MAX +0x1c */ + "CURLY_B_min", /* REGNODE_MAX +0x1d */ + "CURLY_B_min_fail", /* REGNODE_MAX +0x1e */ + "CURLY_B_max", /* REGNODE_MAX +0x1f */ + "CURLY_B_max_fail", /* REGNODE_MAX +0x20 */ + "COMMIT_next", /* REGNODE_MAX +0x21 */ + "COMMIT_next_fail", /* REGNODE_MAX +0x22 */ + "MARKPOINT_next", /* REGNODE_MAX +0x23 */ + "MARKPOINT_next_fail", /* REGNODE_MAX +0x24 */ + "SKIP_next", /* REGNODE_MAX +0x25 */ + "SKIP_next_fail", /* REGNODE_MAX +0x26 */ + "CUTGROUP_next", /* REGNODE_MAX +0x27 */ + "CUTGROUP_next_fail", /* REGNODE_MAX +0x28 */ + "KEEPS_next", /* REGNODE_MAX +0x29 */ + "KEEPS_next_fail", /* REGNODE_MAX +0x2a */ }; #endif /* DOINIT */ diff --git a/t/perf/benchmarks b/t/perf/benchmarks index 233f1fbdc5..dec29bf503 100644 --- a/t/perf/benchmarks +++ b/t/perf/benchmarks @@ -1377,4 +1377,15 @@ setup => '$_ = ("0" x 100) . ("a" x 100);', code => '/[acgt]+/', }, + + 'regex::whilem::min_captures_fail' => { + desc => '/WHILEM with anon-greedy match and captures that fails', + setup => '$_ = ("a" x 20)', + code => '/^(?:(.)(.))*?[XY]/', + }, + 'regex::whilem::max_captures_fail' => { + desc => '/WHILEM with a greedy match and captures that fails', + setup => '$_ = ("a" x 20)', + code => '/^(?:(.)(.))*[XY]/', + }, ]; diff --git a/t/re/pat_re_eval.t b/t/re/pat_re_eval.t index 4df88aff57..6921d38bac 100644 --- a/t/re/pat_re_eval.t +++ b/t/re/pat_re_eval.t @@ -22,7 +22,7 @@ BEGIN { } -plan tests => 529; # Update this when adding/deleting tests. +plan tests => 533; # Update this when adding/deleting tests. run_tests() unless caller; @@ -1253,6 +1253,44 @@ sub run_tests { like $@, qr/Reference to nonexistent group/, "RT #130650"; } + # RT #129881 + # on exit from a pattern with multiple code blocks from different + # CVs, PL_comppad wasn't being restored correctly + + sub { + # give first few pad slots known values + my ($x1, $x2, $x3, $x4, $x5) = 101..105; + # these vars are in a separate pad + my $r = qr/((?{my ($y1, $y2) = 201..202; 1;})A){2}X/; + # the first alt fails, causing a switch to this anon + # sub's pad + "AAA" =~ /$r|(?{my ($z1, $z2) = 301..302; 1;})A/; + is $x1, 101, "RT #129881: x1"; + is $x2, 102, "RT #129881: x2"; + is $x3, 103, "RT #129881: x3"; + }->(); + + + # RT #126697 + # savestack wasn't always being unwound on EVAL failure + { + local our $i = 0; + my $max = 0; + + 'ABC' =~ m{ + \A + (?: + (?: AB | A | BC ) + (?{ + local $i = $i + 1; + $max = $i if $max < $i; + }) + )* + \z + }x; + is $max, 2, "RT #126697"; + } + } # End of sub run_tests -- Perl5 Master Repository
