In perl.git, the branch smoke-me/remove-regcomp-setjmp has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/810f81c8c1730e648f9fc495bd81de8a639c675c?hp=c10ee8c658902092ff2f64ca6058b98ba96b6920>

- Log -----------------------------------------------------------------
commit 810f81c8c1730e648f9fc495bd81de8a639c675c
Author: Nicholas Clark <[email protected]>
Date:   Mon Jan 21 20:58:49 2013 +0100

    Revert "PATCH: regex longjmp flaws"
    
    This reverts commit 595598ee1f247e72e06e4cfbe0f98406015df5cc.
    
    The netbsd - 5.0.2 compiler pointed out that the recent changes to add
    longjmps to speed up some regex compilations can result in clobbering a
    few values.  These depend on the compiled code, and so didn't show up in
    other compiler's warnings.  This patch reinitializes them after a
    longjmp.
    
    [With a lot of hand editing in regcomp.c, to propagate the changes through
    subsequent commits.]

M       regcomp.c
M       regcomp.h

commit 8bc1a6c9dee5e5ac3d374ca654efba63647b6910
Author: Nicholas Clark <[email protected]>
Date:   Mon Jan 21 20:32:01 2013 +0100

    In Perl_re_op_compile(), tidy up after removing setjmp().
    
    Remove volatile qualifiers. Remove the variable jump_ret. Move the
    initialisation of restudied back to the declaration. This reverts several of
    the changes made by commits 5d51ce98fae3de07 and bbd61b5ffb7621c2.
    
    However, I can't see a cleaner way to avoid code duplication when restarting
    the parse than to approach I've taken here - the label redo_first_pass is
    now inside an if (0) block, which is clear but ugly.

M       embed.fnc
M       proto.h
M       regcomp.c

commit dcaf3459f8ce6b3fd956132377e8ad41b9d49dd8
Author: Nicholas Clark <[email protected]>
Date:   Mon Jan 21 17:15:30 2013 +0100

    Replace the longjmp()s in Perl_re_op_compile() with goto.
    
    The regex parse needs to be restarted if it turns out that it should be done
    as UTF-8, not bytes. Using setjmp()/longjmp() complicates compilation
    considerably, causing warnings about missing use of volatile, and hitting
    code generation errors from clang's ASAN. Using goto is much clearer.

M       regcomp.c

commit 49115de05eb16e7f65afab7e160e15334dfccbd4
Author: Nicholas Clark <[email protected]>
Date:   Sat Jan 19 11:06:10 2013 +0100

    Move the longjmp() that implements REQUIRE_UTF8 up to Perl_re_op_compile().
    
    With longjmp() and setjmp() now in the same function (and all tests 
passing),
    it becomes easy to replace the pair with a goto. Still evil, but "the lesser
    of two evils".

M       regcomp.c

commit da2d0109e35d7269445b5199f6f4d10abba65644
Author: Nicholas Clark <[email protected]>
Date:   Fri Jan 18 17:21:03 2013 +0100

    Add a flag RESTART_UTF8 to the reg*() routines in regcomp.c
    
    Add a flag RESTART_UTF8 along with infrastructure to the reg*() routines to
    permit the parse to be restarted without using longjmp(). However, it's not
    used yet.

M       regcomp.c

commit ba7cf480dd211e0e7572eb612e252b3ca3819980
Author: Nicholas Clark <[email protected]>
Date:   Fri Jan 18 11:32:44 2013 +0100

    In S_regclass(), create listsv as a mortal, claiming a reference if needed.
    
    The SV listsv is sometimes stored in an array generated near the end of
    S_regclass(). In other cases it is not used, and it needs to be freed if
    any of the warnings that S_regclass() can trigger turn out to be fatal.
    
    The simplest solution to this problem is to declare it from the start as a
    mortal, and claim a (new) reference to it if it is *not* to be freed. This
    permits the removal of all other code related to ensuring that it is freed
    at the right time, but not freed prematurely if a call to a warning returns.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit e6731cc09b686669ef22a09a2911891904b62516
Author: Nicholas Clark <[email protected]>
Date:   Thu Jan 17 11:47:13 2013 +0100

    Document when and why S_reg{,branch,piece,atom,class}() return NULL.
    
    As documented in pod/perlreguts.pod, the call graph for regex parsing
    involves several levels of functions in regcomp.c, sometimes recursing more
    than once.
    
    The top level compiling function, S_reg(), calls S_regbranch() to parse each
    single branch of an alternation. In turn, that calls S_regpiece() to parse
    a simple pattern followed by quantifier, which calls S_regatom() to parse
    that simple pattern. S_regatom() can call S_regclass() to handle classes,
    but can also recurse into S_reg() to handle subpatterns and some other
    constructions. Some other routines call call S_reg(), sometimes using an
    alternative pattern that they generate dynamically to represent their input.
    
    These routines all return a pointer to a regnode structure, and take a
    pointer to an integer that holds flags, which is also used to return
    information.
    
    Historically, it has not been clear when and why they return NULL, and
    whether the return value can be ignored. In particular, "Jumbo regexp patch"
    (commit c277df42229d99fe, from Nov 1997), added code with two calls from
    S_reg() to S_regbranch(), one of which checks the return value and generates
    a LONGJMP node if it returns NULL, the other of which is called in void
    context, and so both ignores any return value, or the possibility that it is
    NULL.
    
    After some analysis I have untangled the possible return values from these
    5 functions (and related functions which call S_reg()).
    
    Starting from the top:
    S_reg() will return NULL and set the flags to TRYAGAIN at the end of pragma-
    line constructions that it handles. Otherwise, historically it would return
    NULL if S_regbranch() returned NULL. In turn, S_regbranch() would return
    NULL if S_regpiece() returned NULL without setting TRYAGAIN. If S_regpiece()
    returns TRYAGAIN, S_regbranch() loops, and ultimately will not return NULL.
    
    S_regpiece() returns NULL with TRYAGAIN if S_regatom() returns NULL with
    TRYAGAIN, but (historically) if S_regatom() returns NULL without setting
    the flags to TRYAGAIN, S_regpiece() would to. Where S_regatom() calls
    S_reg() it has similar behaviour when passing back return values, although
    often it is able to loop instead on getting a TRYAGAIN.
    
    Which gets us back to S_reg(), which can only *generate* NULL in conjunction
    with TRYAGAIN. NULL without TRYAGAIN could only be returned if a routine it
    called generated it. All other functions that these call that return regnode
    structures cannot return NULL. Hence
    
    1) in the loop of functions called, there is no source for a return value of
       NULL without the TRYAGAIN flag being set
    2) a return value of NULL with TRYAGAIN set from an inner function does not
       propagate out past S_regbranch()
    
    Hence the only return values that most functions can generate are non-NULL,
    or NULL with TRYAGAIN set, and as S_regbranch() catches these, it cannot
    return NULL. The longest sequence of functions that can return NULL (with
    TRYAGAIN set) is S_reg() -> S_regatom() -> S_regpiece() -> S_regbranch().
    Rapidly returning right round the loop back to S_reg() is not possible.
    
    Hence code added by commit c277df42229d99fe to handle a NULL return from
    S_regbranch(), along with some other code is dead.
    
    I have replaced all unreachable code with FAIL()s that panic.

M       regcomp.c

commit bbc20ad312647f0e1e3c57f1f8fa7ceb67b01ce9
Author: Nicholas Clark <[email protected]>
Date:   Fri Jan 18 16:30:39 2013 +0100

    Return orig_emit from S_regclass() when ret_invlist is true.
    
    The return value isn't used (yet). Previously the code was returning END,
    which is a macro for the regnode number for "End of program" which happens 
to
    be 0. It happens that 0 is valid C for a NULL pointer, but using it in this
    way makes the intent unclear. Not only is orig_emit a valid type, it's
    semantically the correct thing to return, as it's most recently added node.

M       regcomp.c

commit 6fbf949a3ab4df75eaea72f38e8adc688b3749f9
Author: Nicholas Clark <[email protected]>
Date:   Wed Jan 16 21:58:02 2013 +0100

    Test that UTF-8 in the look-ahead of (?(?=...)...) restarts the sizing 
parse.
    
    S_reg() recurses to itself to parse various constructions used as the
    conditionals in conditional matching. Look-aheads and look-behinds can turn
    out to need to be sized as UTF-8, which can cause the inner S_reg() to use
    the macro REQUIRE_UTF8 is used to restart the parse. Test that this is
    handled correctly.

M       t/re/re_tests

commit 3d639504eb1c440ce4affb35dee21989b241771b
Author: Nicholas Clark <[email protected]>
Date:   Wed Jan 16 17:08:03 2013 +0100

    Test that S_grok_bslash_N() copes if S_reg() restarts the sizing parse.
    
    S_reg() can discover midway through parsing the pattern to determine its
    size, that the pattern will actually need to be encoded as UTF-8. If
    calculations so far have been done in terms of bytes, then the macro
    REQUIRE_UTF8 is used to restart the parse, so that sizes can be calculated
    correctly for UTF-8.
    
    It is possible to trigger this restart when processing multi-character
    charnames interpolated into the pattern using \N{}. Test that this is
    handled correctly.

M       t/re/pat_advanced.t

commit ee9c233b469f709a1962d7bdff4d21e5231b9aee
Author: Nicholas Clark <[email protected]>
Date:   Wed Jan 16 11:48:04 2013 +0100

    Perl_sv_uni_display() needs to be aware of RX_WRAPPED()
    
    Commit 8d919b0a35f2b57a changed the storage location of the string in
    SVt_REGEXP. It updated most code to deal with this, but missed the use of
    SvPVX_const() in Perl_sv_uni_display(). This breaks dumping regular
    expressions which have the UTF-8 flag set.

M       ext/Devel-Peek/t/Peek.t
M       utf8.c

commit 60c42c5fa66c9e9ebfc683f2b3a6fa0261d1585a
Author: Nicholas Clark <[email protected]>
Date:   Mon Jan 14 09:46:48 2013 +0100

    Remove unreachable duplicate (?#...) parsing code from S_reg()
    
    I believe that this code was rendered unreachable when perl 5.001 added
    code to S_nextchar() to skip over embedded comments. Adrian Enache noted
    this in March 2003, and proposed a patch which removed it. See
    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-03/msg00840.html
    
    The patch wasn't applied at that time, and when he sent it again August,
    he omitted that hunk. See
    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-08/msg01820.html
    
    That version was applied as commit e994fd663a4d8acc.

M       regcomp.c
-----------------------------------------------------------------------

Summary of changes:
 embed.fnc               |    4 +-
 embed.h                 |    2 +-
 ext/Devel-Peek/t/Peek.t |   35 ++++++
 proto.h                 |    4 +-
 regcomp.c               |  299 ++++++++++++++++++++++++++++-------------------
 regcomp.h               |    4 +-
 t/re/pat_advanced.t     |    8 ++
 t/re/re_tests           |    4 +
 utf8.c                  |    5 +-
 9 files changed, 234 insertions(+), 131 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index b2b46f4..10c0111 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1105,7 +1105,7 @@ Ap        |REGEXP*|pregcomp       |NN SV * const 
pattern|const U32 flags
 p      |REGEXP*|re_op_compile  |NULLOK SV ** const patternp \
                                |int pat_count|NULLOK OP *expr \
                                |NN const regexp_engine* eng \
-                               |NULLOK REGEXP *VOL old_re \
+                               |NULLOK REGEXP *old_re \
                                |NULLOK bool *is_bare_re \
                                |U32 rx_flags|U32 pm_flags
 Ap     |REGEXP*|re_compile     |NN SV * const pattern|U32 orig_rx_flags
@@ -2033,7 +2033,7 @@ EsRn      |U32    |add_data       |NN struct RExC_state_t 
*pRExC_state|U32 n \
                                |NN const char *s
 rs     |void   |re_croak2      |NN const char* pat1|NN const char* pat2|...
 Ei     |I32    |regpposixcc    |NN struct RExC_state_t *pRExC_state \
-                               |I32 value|NULLOK SV *free_me|const bool strict
+                               |I32 value|const bool strict
 Es     |I32    |make_trie      |NN struct RExC_state_t *pRExC_state \
                                |NN regnode *startbranch|NN regnode *first \
                                |NN regnode *last|NN regnode *tail \
diff --git a/embed.h b/embed.h
index e9a1980..9a7c439 100644
--- a/embed.h
+++ b/embed.h
@@ -942,7 +942,7 @@
 #define reginsert(a,b,c,d)     S_reginsert(aTHX_ a,b,c,d)
 #define regpatws               S_regpatws
 #define regpiece(a,b,c)                S_regpiece(aTHX_ a,b,c)
-#define regpposixcc(a,b,c,d)   S_regpposixcc(aTHX_ a,b,c,d)
+#define regpposixcc(a,b,c)     S_regpposixcc(aTHX_ a,b,c)
 #define regtail(a,b,c,d)       S_regtail(aTHX_ a,b,c,d)
 #define reguni(a,b,c)          S_reguni(aTHX_ a,b,c)
 #define regwhite               S_regwhite
diff --git a/ext/Devel-Peek/t/Peek.t b/ext/Devel-Peek/t/Peek.t
index b3dbc9b..a8b0d22 100644
--- a/ext/Devel-Peek/t/Peek.t
+++ b/ext/Devel-Peek/t/Peek.t
@@ -938,4 +938,39 @@ unless ($Config{useithreads}) {
     close OUT;
 }
 
+do_test('UTF-8 in a regular expression',
+        qr/\x{100}/,
+'SV = IV\($ADDR\) at $ADDR
+  REFCNT = 1
+  FLAGS = \(ROK\)
+  RV = $ADDR
+  SV = REGEXP\($ADDR\) at $ADDR
+    REFCNT = 1
+    FLAGS = \(OBJECT,FAKE,UTF8\)
+    PV = $ADDR "\(\?\^u:\\\\\\\\x\{100\}\)" \[UTF8 
"\(\?\^u:\\\\\\\\x\{100\}\)"\]
+    CUR = 13
+    STASH = $ADDR      "Regexp"
+    EXTFLAGS = 0x680040 \(CHECK_ALL,USE_INTUIT_NOML,USE_INTUIT_ML\)
+    INTFLAGS = 0x0
+    NPARENS = 0
+    LASTPAREN = 0
+    LASTCLOSEPAREN = 0
+    MINLEN = 1
+    MINLENRET = 1
+    GOFS = 0
+    PRE_PREFIX = 5
+    SUBLEN = 0
+    SUBOFFSET = 0
+    SUBCOFFSET = 0
+    SUBBEG = 0x0
+    ENGINE = $ADDR
+    MOTHER_RE = $ADDR
+    PAREN_NAMES = 0x0
+    SUBSTRS = $ADDR
+    PPRIVATE = $ADDR
+    OFFS = $ADDR
+    QR_ANONCV = 0x0
+    SAVED_COPY = 0x0
+');
+
 done_testing();
diff --git a/proto.h b/proto.h
index 67eea1d..c931bd9 100644
--- a/proto.h
+++ b/proto.h
@@ -3259,7 +3259,7 @@ PERL_CALLCONV SV* Perl_re_intuit_string(pTHX_ REGEXP  
*const r)
 #define PERL_ARGS_ASSERT_RE_INTUIT_STRING      \
        assert(r)
 
-PERL_CALLCONV REGEXP*  Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count, OP *expr, const regexp_engine* eng, REGEXP *VOL old_re, bool 
*is_bare_re, U32 rx_flags, U32 pm_flags)
+PERL_CALLCONV REGEXP*  Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count, OP *expr, const regexp_engine* eng, REGEXP *old_re, bool 
*is_bare_re, U32 rx_flags, U32 pm_flags)
                        __attribute__nonnull__(pTHX_4);
 #define PERL_ARGS_ASSERT_RE_OP_COMPILE \
        assert(eng)
@@ -6700,7 +6700,7 @@ STATIC regnode*   S_regpiece(pTHX_ struct RExC_state_t 
*pRExC_state, I32 *flagp, U
 #define PERL_ARGS_ASSERT_REGPIECE      \
        assert(pRExC_state); assert(flagp)
 
-PERL_STATIC_INLINE I32 S_regpposixcc(pTHX_ struct RExC_state_t *pRExC_state, 
I32 value, SV *free_me, const bool strict)
+PERL_STATIC_INLINE I32 S_regpposixcc(pTHX_ struct RExC_state_t *pRExC_state, 
I32 value, const bool strict)
                        __attribute__nonnull__(pTHX_1);
 #define PERL_ARGS_ASSERT_REGPPOSIXCC   \
        assert(pRExC_state)
diff --git a/regcomp.c b/regcomp.c
index a16b8b3..10d7efc 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -231,8 +231,9 @@ typedef struct RExC_state_t {
  * REGNODE_SIMPLE */
 #define        SIMPLE          0x02
 #define        SPSTART         0x04    /* Starts with * or + */
-#define TRYAGAIN       0x08    /* Weeded out a declaration. */
-#define POSTPONED      0x10    /* (?1),(?&name), (??{...}) or similar */
+#define POSTPONED      0x08    /* (?1),(?&name), (??{...}) or similar */
+#define TRYAGAIN       0x10    /* Weeded out a declaration. */
+#define RESTART_UTF8    0x20    /* Restart, need to calcuate sizes as UTF-8 */
 
 #define REG_NODE_NUM(x) ((x) ? (int)((x)-RExC_emit_start) : -1)
 
@@ -251,10 +252,11 @@ typedef struct RExC_state_t {
 #define PAREN_SET(u8str,paren) PBYTE(u8str,paren) |= PBITVAL(paren)
 #define PAREN_UNSET(u8str,paren) PBYTE(u8str,paren) &= (~PBITVAL(paren))
 
-/* If not already in utf8, do a longjmp back to the beginning */
-#define UTF8_LONGJMP 42 /* Choose a value not likely to ever conflict */
 #define REQUIRE_UTF8   STMT_START {                                       \
-                                     if (! UTF) JMPENV_JUMP(UTF8_LONGJMP); \
+                                     if (!UTF) {                           \
+                                         *flagp = RESTART_UTF8;            \
+                                         return NULL;                      \
+                                     }                                     \
                         } STMT_END
 
 /* This converts the named class defined in regcomp.h to its equivalent class
@@ -5162,7 +5164,7 @@ S_setup_longest(pTHX_ RExC_state_t *pRExC_state, SV* 
sv_longest, SV** rx_utf8, S
 
 REGEXP *
 Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
-                   OP *expr, const regexp_engine* eng, REGEXP *VOL old_re,
+                   OP *expr, const regexp_engine* eng, REGEXP *old_re,
                     bool *is_bare_re, U32 orig_rx_flags, U32 pm_flags)
 {
     dVAR;
@@ -5170,32 +5172,29 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
     struct regexp *r;
     regexp_internal *ri;
     STRLEN plen;
-    char  * VOL exp;
+    char *exp;
     char* xend;
     regnode *scan;
     I32 flags;
     I32 minlen = 0;
     U32 rx_flags;
-    SV * VOL pat;
-    SV * VOL code_blocksv = NULL;
+    SV *pat;
+    SV *code_blocksv = NULL;
 
     /* these are all flags - maybe they should be turned
      * into a single int with different bit masks */
     I32 sawlookahead = 0;
     I32 sawplus = 0;
     I32 sawopen = 0;
-    bool used_setjump = FALSE;
     regex_charset initial_charset = get_regex_charset(orig_rx_flags);
     bool code_is_utf8 = 0;
-    bool VOL recompile = 0;
+    bool recompile = 0;
     bool runtime_code = 0;
-    U8 jump_ret = 0;
-    dJMPENV;
     scan_data_t data;
     RExC_state_t RExC_state;
     RExC_state_t * const pRExC_state = &RExC_state;
 #ifdef TRIE_STUDY_OPT    
-    int restudied;
+    int restudied = 0;
     RExC_state_t copyRExC_state;
 #endif    
     GET_RE_DEBUG_FLAGS_DECL;
@@ -5513,6 +5512,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
     }
 
     exp = SvPV_nomg(pat, plen);
+    xend = exp + plen;
 
     if (!eng->op_comp) {
        if ((SvUTF8(pat) && IN_BYTES)
@@ -5533,39 +5533,22 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
     RExC_contains_locale = 0;
     pRExC_state->runtime_code_qr = NULL;
 
-    /****************** LONG JUMP TARGET HERE***********************/
-    /* Longjmp back to here if have to switch in midstream to utf8 */
-    if (! RExC_orig_utf8) {
-       JMPENV_PUSH(jump_ret);
-       used_setjump = TRUE;
-    }
-
-    if (jump_ret == 0) {    /* First time through */
-       xend = exp + plen;
-
-        DEBUG_COMPILE_r({
+    DEBUG_COMPILE_r({
             SV *dsv= sv_newmortal();
-            RE_PV_QUOTED_DECL(s, RExC_utf8,
-                dsv, exp, plen, 60);
+            RE_PV_QUOTED_DECL(s, RExC_utf8, dsv, exp, plen, 60);
             PerlIO_printf(Perl_debug_log, "%sCompiling REx%s %s\n",
-                           PL_colors[4],PL_colors[5],s);
+                          PL_colors[4],PL_colors[5],s);
         });
-    }
-    else {  /* longjumped back */
-       U8 *src, *dst;
+
+    if (0) {
+      redo_first_pass:
+        {
+            U8 *const src = (U8*)exp;
+            U8 *dst;
        int n=0;
        STRLEN s = 0, d = 0;
        bool do_end = 0;
 
-        /* If the cause for the longjmp was other than changing to utf8, pop
-         * our own setjmp, and longjmp to the correct handler */
-       if (jump_ret != UTF8_LONGJMP) {
-           JMPENV_POP;
-           JMPENV_JUMP(jump_ret);
-       }
-
-       GET_RE_DEBUG_FLAGS;
-
         /* It's possible to write a regexp in ascii that represents Unicode
         codepoints outside of the byte range, such as via \x{100}. If we
         detect such a sequence we have to convert the entire pattern to utf8
@@ -5581,7 +5564,6 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
         * recalculate the indices.
         * This is essentially an unrolled Perl_bytes_to_utf8() */
 
-       src = (U8*)SvPV_nomg(pat, plen);
        Newx(dst, plen * 2 + 1, U8);
 
        while (s < plen) {
@@ -5614,6 +5596,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
        xend = exp + plen;
        SAVEFREEPV(exp);
        RExC_orig_utf8 = RExC_utf8 = 1;
+        }
     }
 
     /* return old regex if pattern hasn't changed */
@@ -5629,9 +5612,6 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
        runtime_code = S_has_runtime_code(aTHX_ pRExC_state, expr, pm_flags,
                                            exp, plen);
        if (!runtime_code) {
-           if (used_setjump) {
-               JMPENV_POP;
-           }
            Safefree(pRExC_state->code_blocks);
            return old_re;
        }
@@ -5645,10 +5625,6 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
        runtime_code = S_has_runtime_code(aTHX_ pRExC_state, expr, pm_flags,
                            exp, plen);
 
-#ifdef TRIE_STUDY_OPT
-    restudied = 0;
-#endif
-
     rx_flags = orig_rx_flags;
 
     if (initial_charset == REGEX_LOCALE_CHARSET) {
@@ -5672,7 +5648,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
        if (!S_compile_runtime_code(aTHX_ pRExC_state, exp, plen)) {
            /* whoops, we have a non-utf8 pattern, whilst run-time code
             * got compiled as utf8. Try again with a utf8 pattern */
-            JMPENV_JUMP(UTF8_LONGJMP);
+            goto redo_first_pass;
        }
     }
     assert(!pRExC_state->runtime_code_qr);
@@ -5728,17 +5704,14 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
        SvLEN_set(code_blocksv, 1); /*sufficient to make sv_clear free it*/
     }
     if (reg(pRExC_state, 0, &flags,1) == NULL) {
-       RExC_precomp = NULL;
-       return(NULL);
+        if (flags & RESTART_UTF8) {
+            goto redo_first_pass;
+        }
+        Perl_croak(aTHX_ "panic: reg returned NULL to re_op_compile for sizing 
pass, flags=%#X", flags);
     }
     if (code_blocksv)
        SvLEN_set(code_blocksv,0); /* no you can't have it, sv_clear */
 
-    /* Here, finished first pass.  Get rid of any added setjmp */
-    if (used_setjump) {
-       JMPENV_POP;
-    }
-
     DEBUG_PARSE_r({
         PerlIO_printf(Perl_debug_log, 
             "Required size %"IVdf" nodes\n"
@@ -5904,7 +5877,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int 
pat_count,
     REGC((U8)REG_MAGIC, (char*) RExC_emit++);
     if (reg(pRExC_state, 0, &flags,1) == NULL) {
        ReREFCNT_dec(rx);   
-       return(NULL);
+        Perl_croak(aTHX_ "panic: reg returned NULL to re_op_compile for 
generation pass, flags=%#X", flags);
     }
     /* XXXX To minimize changes to RE engine we always allocate
        3-units-long substrs field. */
@@ -8321,6 +8294,11 @@ S__invlistEQ(pTHX_ SV* const a, SV* const b, bool 
complement_b)
 #define REGTAIL_STUDY(x,y,z) regtail((x),(y),(z),depth+1)
 #endif
 
+/* Returns NULL, setting *flagp to TRYAGAIN at the end of (?) that only sets
+   flags. Returns NULL, setting *flagp to RESTART_UTF8 if the sizing scan
+   needs to be restarted.
+   Otherwise would only return NULL if regbranch() returns NULL, which
+   cannot happen.  */
 STATIC regnode *
 S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
     /* paren: Parenthesized? 0=top, 1=(, inside: changed to letter. */
@@ -8630,14 +8608,6 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 
*flagp,U32 depth)
            case '@':           /* (?@...) */
                vFAIL2("Sequence (?%c...) not implemented", (int)paren);
                break;
-           case '#':           /* (?#...) */
-               while (*RExC_parse && *RExC_parse != ')')
-                   RExC_parse++;
-               if (*RExC_parse != ')')
-                   FAIL("Sequence (?#... not terminated");
-               nextchar(pRExC_state);
-               *flagp = TRYAGAIN;
-               return NULL;
            case '0' :           /* (?0) */
            case 'R' :           /* (?R) */
                if (*RExC_parse != ')')
@@ -8803,11 +8773,18 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 
*flagp,U32 depth)
                        || RExC_parse[1] == '<'
                        || RExC_parse[1] == '{') { /* Lookahead or eval. */
                        I32 flag;
+                        regnode *tail;
 
                        ret = reg_node(pRExC_state, LOGICAL);
                        if (!SIZE_ONLY)
                            ret->flags = 1;
-                        REGTAIL(pRExC_state, ret, reg(pRExC_state, 1, 
&flag,depth+1));
+                        
+                        tail = reg(pRExC_state, 1, &flag, depth+1);
+                        if (flag & RESTART_UTF8) {
+                            *flagp = RESTART_UTF8;
+                            return NULL;
+                        }
+                        REGTAIL(pRExC_state, ret, tail);
                        goto insert_if;
                    }
                }
@@ -8875,9 +8852,14 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 
*flagp,U32 depth)
                  insert_if:
                     REGTAIL(pRExC_state, ret, reganode(pRExC_state, IFTHEN, 
0));
                     br = regbranch(pRExC_state, &flags, 1,depth+1);
-                   if (br == NULL)
-                       br = reganode(pRExC_state, LONGJMP, 0);
-                   else
+                   if (br == NULL) {
+                        if (flags & RESTART_UTF8) {
+                            *flagp = RESTART_UTF8;
+                            return NULL;
+                        }
+                        FAIL2("panic: regbranch returned NULL, flags=%#X",
+                              flags);
+                    } else
                         REGTAIL(pRExC_state, br, reganode(pRExC_state, 
LONGJMP, 0));
                    c = *nextchar(pRExC_state);
                    if (flags&HASWIDTH)
@@ -8886,7 +8868,14 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 
*flagp,U32 depth)
                        if (is_define) 
                            vFAIL("(?(DEFINE)....) does not allow branches");
                        lastbr = reganode(pRExC_state, IFTHEN, 0); /* Fake one 
for optimizer. */
-                        regbranch(pRExC_state, &flags, 1,depth+1);
+                        if (!regbranch(pRExC_state, &flags, 1,depth+1)) {
+                            if (flags & RESTART_UTF8) {
+                                *flagp = RESTART_UTF8;
+                                return NULL;
+                            }
+                            FAIL2("panic: regbranch returned NULL, flags=%#X",
+                                  flags);
+                        }
                         REGTAIL(pRExC_state, ret, lastbr);
                        if (flags&HASWIDTH)
                            *flagp |= HASWIDTH;
@@ -9135,8 +9124,13 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 
*flagp,U32 depth)
 
     /*     branch_len = (paren != 0); */
 
-    if (br == NULL)
-       return(NULL);
+    if (br == NULL) {
+        if (flags & RESTART_UTF8) {
+            *flagp = RESTART_UTF8;
+            return NULL;
+        }
+        FAIL2("panic: regbranch returned NULL, flags=%#X", flags);
+    }
     if (*RExC_parse == '|') {
        if (!SIZE_ONLY && RExC_extralen) {
            reginsert(pRExC_state, BRANCHJ, br, depth+1);
@@ -9175,8 +9169,13 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 
*flagp,U32 depth)
         }
         br = regbranch(pRExC_state, &flags, 0, depth+1);
 
-       if (br == NULL)
-           return(NULL);
+       if (br == NULL) {
+            if (flags & RESTART_UTF8) {
+                *flagp = RESTART_UTF8;
+                return NULL;
+            }
+            FAIL2("panic: regbranch returned NULL, flags=%#X", flags);
+        }
         REGTAIL(pRExC_state, lastbr, br);               /* BRANCH -> BRANCH. */
        lastbr = br;
        *flagp |= flags & (SPSTART | HASWIDTH | POSTPONED);
@@ -9333,6 +9332,9 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 
*flagp,U32 depth)
  - regbranch - one alternative of an | operator
  *
  * Implements the concatenation operator.
+ *
+ * Returns NULL, setting *flagp to RESTART_UTF8 if the sizing scan needs to be
+ * restarted.
  */
 STATIC regnode *
 S_regbranch(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, I32 first, U32 depth)
@@ -9372,7 +9374,11 @@ S_regbranch(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, 
I32 first, U32 depth)
        if (latest == NULL) {
            if (flags & TRYAGAIN)
                continue;
-           return(NULL);
+            if (flags & RESTART_UTF8) {
+                *flagp = RESTART_UTF8;
+                return NULL;
+            }
+            FAIL2("panic: regpiece returned NULL, flags=%#X", flags);
        }
        else if (ret == NULL)
            ret = latest;
@@ -9406,6 +9412,11 @@ S_regbranch(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, 
I32 first, U32 depth)
  * both the endmarker for their branch list and the body of the last branch.
  * It might seem that this node could be dispensed with entirely, but the
  * endmarker role is not redundant.
+ *
+ * Returns NULL, setting *flagp to TRYAGAIN if regatom() returns NULL with
+ * TRYAGAIN.
+ * Returns NULL, setting *flagp to RESTART_UTF8 if the sizing scan needs to be
+ * restarted.
  */
 STATIC regnode *
 S_regpiece(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
@@ -9434,8 +9445,10 @@ S_regpiece(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, 
U32 depth)
 
     ret = regatom(pRExC_state, &flags,depth+1);
     if (ret == NULL) {
-       if (flags & TRYAGAIN)
-           *flagp |= TRYAGAIN;
+       if (flags & (TRYAGAIN|RESTART_UTF8))
+           *flagp |= flags & (TRYAGAIN|RESTART_UTF8);
+        else
+            FAIL2("panic: regatom returned NULL, flags=%#X", flags);
        return(NULL);
     }
 
@@ -9665,7 +9678,10 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state, 
regnode** node_p, UV *valuep, I
 
    The function raises an error (via vFAIL), and doesn't return for various
    syntax errors.  Otherwise it returns TRUE and sets <node_p> or <valuep> on
-   success; it returns FALSE otherwise.
+   success; it returns FALSE otherwise. Returns FALSE, setting *flagp to
+   RESTART_UTF8 if the sizing scan needs to be restarted. Such a restart is
+   only possible if node_p is non-NULL.
+
 
    If <valuep> is non-null, it means the caller can accept an input sequence
    consisting of a just a single code point; <*valuep> is set to that value
@@ -9869,7 +9885,14 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state, 
regnode** node_p, UV *valuep, I
        /* The values are Unicode, and therefore not subject to recoding */
        RExC_override_recoding = 1;
 
-       *node_p = reg(pRExC_state, 1, &flags, depth+1);
+       if (!(*node_p = reg(pRExC_state, 1, &flags, depth+1))) {
+            if (flags & RESTART_UTF8) {
+                *flagp = RESTART_UTF8;
+                return FALSE;
+            }
+            FAIL2("panic: reg returned NULL to grok_bslash_N, flags=%#X",
+                  flags);
+        } 
        *flagp |= flags&(HASWIDTH|SPSTART|SIMPLE|POSTPONED);
 
        RExC_parse = endbrace;
@@ -10069,6 +10092,12 @@ S_alloc_maybe_populate_EXACT(pTHX_ RExC_state_t 
*pRExC_state, regnode *node, I32
    escape sequences, with the one for handling literal escapes requiring
    a dummy entry for all of the special escapes that are actually handled
    by the other.
+
+   Returns NULL, setting *flagp to TRYAGAIN if reg() returns NULL with
+   TRYAGAIN.  
+   Returns NULL, setting *flagp to RESTART_UTF8 if the sizing scan needs to be
+   restarted.
+   Otherwise does not return NULL.
 */
 
 STATIC regnode *
@@ -10136,6 +10165,12 @@ tryagain:
            RExC_parse = oregcomp_parse;
            vFAIL("Unmatched [");
        }
+        if (ret == NULL) {
+            if (*flagp & RESTART_UTF8)
+                return NULL;
+            FAIL2("panic: regclass returned NULL to regatom, flags=%#X",
+                  *flagp);
+        }
        nextchar(pRExC_state);
         Set_Node_Length(ret, RExC_parse - oregcomp_parse + 1); /* MJD */
        break;
@@ -10152,7 +10187,11 @@ tryagain:
                    }
                    goto tryagain;
                }
-               return(NULL);
+                if (flags & RESTART_UTF8) {
+                    *flagp = RESTART_UTF8;
+                    return NULL;
+                }
+                FAIL2("panic: reg returned NULL to regatom, flags=%#X", flags);
        }
        *flagp |= flags&(HASWIDTH|SPSTART|SIMPLE|POSTPONED);
        break;
@@ -10344,6 +10383,11 @@ tryagain:
                                          It would be a bug if these returned
                                          non-portables */
                                NULL);
+                /* regclass() can only return RESTART_UTF8 if multi-char folds
+                   are allowed.  */
+                if (!ret)
+                    FAIL2("panic: regclass returned NULL to regatom, 
flags=%#X",
+                          *flagp);
 
                RExC_parse--;
 
@@ -10366,6 +10410,8 @@ tryagain:
             ++RExC_parse;
             if (! grok_bslash_N(pRExC_state, &ret, NULL, flagp, depth, FALSE,
                                 FALSE /* not strict */ )) {
+                if (*flagp & RESTART_UTF8)
+                    return NULL;
                 RExC_parse--;
                 goto defchar;
             }
@@ -10632,6 +10678,8 @@ tryagain:
                                             flagp, depth, FALSE,
                                             FALSE /* not strict */ ))
                         {
+                            if (*flagp & RESTART_UTF8)
+                                FAIL("panic: grok_bslash_N set RESTART_UTF8");
                             RExC_parse = p = oldp;
                             goto loopdone;
                         }
@@ -11180,8 +11228,7 @@ S_regpatws( RExC_state_t *pRExC_state, char *p , const 
bool recognize_comment )
 #define POSIXCC(c) (POSIXCC_DONE(c) || POSIXCC_NOTYET(c))
 
 PERL_STATIC_INLINE I32
-S_regpposixcc(pTHX_ RExC_state_t *pRExC_state, I32 value, SV *free_me,
-                    const bool strict)
+S_regpposixcc(pTHX_ RExC_state_t *pRExC_state, I32 value, const bool strict)
 {
     dVAR;
     I32 namedclass = OOB_NAMEDCLASS;
@@ -11305,7 +11352,6 @@ S_regpposixcc(pTHX_ RExC_state_t *pRExC_state, I32 
value, SV *free_me,
                       the class closes */
                    while (UCHARAT(RExC_parse) && UCHARAT(RExC_parse) != ']')
                        RExC_parse++;
-                   SvREFCNT_dec(free_me);
                    vFAIL3("POSIX syntax [%c %c] is reserved for future 
extensions", c, c);
                }
            } else {
@@ -11441,13 +11487,18 @@ S_handle_sets(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
                         RExC_parse++;
                     }
 
-                    (void) regclass(pRExC_state, flagp,depth+1,
-                                    is_posix_class, /* parse the whole char
-                                                       class only if not a
-                                                       posix class */
-                                    FALSE, /* don't allow multi-char folds */
-                                    TRUE, /* silence non-portable warnings. */
-                                    &current);
+                    /* regclass() can only return RESTART_UTF8 if multi-char
+                       folds are allowed.  */
+                    if (!regclass(pRExC_state, flagp,depth+1,
+                                  is_posix_class, /* parse the whole char
+                                                     class only if not a
+                                                     posix class */
+                                  FALSE, /* don't allow multi-char folds */
+                                  TRUE, /* silence non-portable warnings. */
+                                  &current))
+                        FAIL2("panic: regclass returned NULL to handle_sets, 
flags=%#X",
+                              *flagp);
+
                     /* function call leaves parse pointing to the ']', except
                      * if we faked it */
                     if (is_posix_class) {
@@ -11537,12 +11588,15 @@ S_handle_sets(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
                 vFAIL("Unexpected character");
 
             case '\\':
-                (void) regclass(pRExC_state, flagp,depth+1,
-                                TRUE, /* means parse just the next thing */
-                                FALSE, /* don't allow multi-char folds */
-                                FALSE, /* don't silence non-portable warnings.
-                                        */
-                                &current);
+                /* regclass() can only return RESTART_UTF8 if multi-char
+                   folds are allowed.  */
+                if (!regclass(pRExC_state, flagp,depth+1,
+                              TRUE, /* means parse just the next thing */
+                              FALSE, /* don't allow multi-char folds */
+                              FALSE, /* don't silence non-portable warnings.  
*/
+                              &current))
+                    FAIL2("panic: regclass returned NULL to handle_sets, 
flags=%#X",
+                          *flagp);
                 /* regclass() will return with parsing just the \ sequence,
                  * leaving the parse pointer at the next thing to parse */
                 RExC_parse--;
@@ -11556,13 +11610,16 @@ S_handle_sets(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
                     RExC_parse++;
                 }
 
-                (void) regclass(pRExC_state, flagp,depth+1,
-                                is_posix_class, /* parse the whole char class
-                                                   only if not a posix class */
-                                FALSE, /* don't allow multi-char folds */
-                                FALSE, /* don't silence non-portable warnings.
-                                        */
-                                &current);
+                /* regclass() can only return RESTART_UTF8 if multi-char
+                   folds are allowed.  */
+                if(!regclass(pRExC_state, flagp,depth+1,
+                             is_posix_class, /* parse the whole char class
+                                                only if not a posix class */
+                             FALSE, /* don't allow multi-char folds */
+                             FALSE, /* don't silence non-portable warnings.  */
+                             &current))
+                    FAIL2("panic: regclass returned NULL to handle_sets, 
flags=%#X",
+                          *flagp);
                 /* function call leaves parse pointing to the ']', except if we
                  * faked it */
                 if (is_posix_class) {
@@ -11737,6 +11794,8 @@ S_handle_sets(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
      * already has all folding taken into consideration, and we don't want
      * regclass() to add to that */
     RExC_flags &= ~RXf_PMf_FOLD;
+    /* regclass() can only return RESTART_UTF8 if multi-char folds are allowed.
+     */
     node = regclass(pRExC_state, flagp,depth+1,
                     FALSE, /* means parse the whole char class */
                     FALSE, /* don't allow multi-char folds */
@@ -11744,6 +11803,8 @@ S_handle_sets(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
                              well have generated non-portable code points, but
                              they're valid on this machine */
                     NULL);
+    if (!node)
+        FAIL2("panic: regclass returned NULL to handle_sets, flags=%#X", 
flagp);
     if (save_fold) {
         RExC_flags |= RXf_PMf_FOLD;
     }
@@ -11793,7 +11854,11 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
      * corresponding bit set if that character is in the list.  For characters
      * above 255, a range list or swash is used.  There are extra bits for \w,
      * etc. in locale ANYOFs, as what these match is not determinable at
-     * compile time */
+     * compile time
+     *
+     * Returns NULL, setting *flagp to RESTART_UTF8 if the sizing scan needs
+     * to be restarted.  This can only happen if ret_invlist is non-NULL.
+     */
 
     dVAR;
     UV prevvalue = OOB_UNICODE, save_prevvalue = OOB_UNICODE;
@@ -11879,8 +11944,9 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, 
U32 depth,
        if (LOC) {
            ANYOF_FLAGS(ret) |= ANYOF_LOCALE;
        }
-       listsv = newSVpvs("# comment\n");
+       listsv = newSVpvs_flags("# comment\n", SVs_TEMP);
        initial_listsv_len = SvCUR(listsv);
+        SvTEMP_off(listsv); /* Grr, TEMPs and mortals are conflated.  */
     }
 
     if (skip_white) {
@@ -11908,12 +11974,10 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
            s++;
        if (*s && c == *s && s[1] == ']') {
            SAVEFREESV(RExC_rx_sv);
-           SAVEFREESV(listsv);
            ckWARN3reg(s+2,
                       "POSIX syntax [%c %c] belongs inside character classes",
                       c, c);
            (void)ReREFCNT_inc(RExC_rx_sv);
-           SvREFCNT_inc_simple_void_NN(listsv);
        }
     }
 
@@ -11965,7 +12029,7 @@ parseit:
             && RExC_parse < RExC_end
             && POSIXCC(UCHARAT(RExC_parse)))
         {
-            namedclass = regpposixcc(pRExC_state, value, listsv, strict);
+            namedclass = regpposixcc(pRExC_state, value, strict);
         }
         else if (value == '\\') {
            if (UTF) {
@@ -12010,6 +12074,8 @@ parseit:
                                       TRUE, /* => charclass */
                                       strict))
                     {
+                        if (*flagp & RESTART_UTF8)
+                            FAIL("panic: grok_bslash_N set RESTART_UTF8");
                         goto parseit;
                     }
                 }
@@ -12203,7 +12269,6 @@ parseit:
                     value = grok_oct(--RExC_parse, &numlen, &flags, NULL);
                    RExC_parse += numlen;
                     if (numlen != 3) {
-                        SAVEFREESV(listsv); /* In case warnings are fatalized 
*/
                         if (strict) {
                             RExC_parse += (UTF) ? UTF8SKIP(RExC_parse) : 1;
                             vFAIL("Need exactly 3 octal digits");
@@ -12220,7 +12285,6 @@ parseit:
                                  form_short_octal_warning(RExC_parse, numlen));
                             (void)ReREFCNT_inc(RExC_rx_sv);
                         }
-                        SvREFCNT_inc_simple_void_NN(listsv);
                     }
                    if (PL_encoding && value < 0x100)
                        goto recode_encoding;
@@ -12244,7 +12308,6 @@ parseit:
            default:
                /* Allow \_ to not give an error */
                if (!SIZE_ONLY && isWORDCHAR(value) && value != '_') {
-                   SAVEFREESV(listsv);
                     if (strict) {
                         vFAIL2("Unrecognized escape \\%c in character class",
                                (int)value);
@@ -12256,7 +12319,6 @@ parseit:
                             (int)value);
                         (void)ReREFCNT_inc(RExC_rx_sv);
                     }
-                   SvREFCNT_inc_simple_void_NN(listsv);
                }
                break;
            }   /* End of switch on char following backslash */
@@ -12302,7 +12364,6 @@ parseit:
                    const int w = (RExC_parse >= rangebegin)
                                   ? RExC_parse - rangebegin
                                   : 0;
-                   SAVEFREESV(listsv); /* in case of fatal warnings */
                     if (strict) {
                         vFAIL4("False [] range \"%*.*s\"", w, w, rangebegin);
                     }
@@ -12315,7 +12376,6 @@ parseit:
                         cp_list = add_cp_to_invlist(cp_list, '-');
                         cp_list = add_cp_to_invlist(cp_list, prevvalue);
                     }
-                    SvREFCNT_inc_simple_void_NN(listsv);
                }
 
                range = 0; /* this was not a true range */
@@ -12851,13 +12911,12 @@ parseit:
 
        ret = reg(pRExC_state, 1, &reg_flags, depth+1);
 
-       *flagp |= reg_flags&(HASWIDTH|SIMPLE|SPSTART|POSTPONED);
+       *flagp |= reg_flags&(HASWIDTH|SIMPLE|SPSTART|POSTPONED|RESTART_UTF8);
 
        RExC_parse = save_parse;
        RExC_end = save_end;
        RExC_in_multi_char_class = 0;
         SvREFCNT_dec_NN(multi_char_matches);
-        SvREFCNT_dec_NN(listsv);
         return ret;
     }
 
@@ -13015,7 +13074,6 @@ parseit:
             RExC_parse = (char *) cur_parse;
 
             SvREFCNT_dec(posixes);
-            SvREFCNT_dec_NN(listsv);
             SvREFCNT_dec(cp_list);
             return ret;
         }
@@ -13383,7 +13441,7 @@ parseit:
         else {
             RExC_emit = orig_emit;
         }
-        return END;
+        return orig_emit;
     }
 
     /* If we didn't do folding, it's because some information isn't available
@@ -13517,7 +13575,6 @@ parseit:
             }
 
             SvREFCNT_dec_NN(cp_list);
-            SvREFCNT_dec_NN(listsv);
             return ret;
         }
     }
@@ -13605,7 +13662,6 @@ parseit:
        && ! HAS_NONLOCALE_RUNTIME_PROPERTY_DEFINITION)
     {
        ARG_SET(ret, ANYOF_NONBITMAP_EMPTY);
-       SvREFCNT_dec_NN(listsv);
     }
     else {
        /* av[0] stores the character class description in its textual form:
@@ -13622,8 +13678,7 @@ parseit:
        SV *rv;
 
        av_store(av, 0, (HAS_NONLOCALE_RUNTIME_PROPERTY_DEFINITION)
-                       ? listsv
-                       : (SvREFCNT_dec_NN(listsv), &PL_sv_undef));
+                       ? SvREFCNT_inc(listsv) : &PL_sv_undef);
        if (swash) {
            av_store(av, 1, swash);
            SvREFCNT_dec_NN(cp_list);
diff --git a/regcomp.h b/regcomp.h
index 9607326..21d2e1e 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -829,11 +829,9 @@ re.pm, especially to the documentation.
     if (re_debug_flags & RE_DEBUG_EXTRA_GPOS) x )
 
 /* initialization */
-/* get_sv() can return NULL during global destruction.  re_debug_flags can get
- * clobbered by a longjmp, so must be initialized */
+/* get_sv() can return NULL during global destruction. */
 #define GET_RE_DEBUG_FLAGS DEBUG_r({ \
         SV * re_debug_flags_sv = NULL; \
-        re_debug_flags = 0;            \
         re_debug_flags_sv = get_sv(RE_DEBUG_FLAGS, 1); \
         if (re_debug_flags_sv) { \
             if (!SvIOK(re_debug_flags_sv)) \
diff --git a/t/re/pat_advanced.t b/t/re/pat_advanced.t
index 29a64dd..cd067f4 100644
--- a/t/re/pat_advanced.t
+++ b/t/re/pat_advanced.t
@@ -1082,6 +1082,14 @@ sub run_tests {
         eval "q(W) =~ /\\N{$name}/";
         ok ! $w, 'Verify that latin1 letter in name doesnt give warning';
 
+        # This tests the code path that restarts the parse when the recursive
+        # call to S_reg() from within S_grok_bslash_N() discovers that the
+        # pattern needs to be recalculated as UTF-8.  use eval to avoid
+        # needing literal Unicode in this source file:
+        my $r = eval "qr/\\N{\x{100}\x{100}}/";
+        isnt $r, undef, "Generated regex for multi-char UTF-8 charname"
+           or diag($@);
+        ok "\x{100}\x{100}" =~ $r, "which matches";
     }
 
     {
diff --git a/t/re/re_tests b/t/re/re_tests
index c41d529..7e7fc85 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -592,6 +592,10 @@ x(~~)*(?:(?:F)?)?  x~~     y       -       -
 (?(?!a)b|a)    a       y       $&      a
 (?(?=a)b|a)    a       n       -       -
 (?(?=a)a|b)    a       y       $&      a
+(?(?!\x{100})\x{100}|b)        \x{100} n       -       -
+(?(?!\x{100})b|\x{100})        \x{100} y       $&      \x{100}
+(?(?=\x{100})b|\x{100})        \x{100} n       -       -
+(?(?=\x{100})\x{100}|b)        \x{100} y       $&      \x{100}
 (?=(a+?))(\1ab)        aaab    y       $2      aab
 ^(?=(a+?))\1ab aaab    n       -       -
 (\w+:)+        one:    y       $1      one:
diff --git a/utf8.c b/utf8.c
index 1bf3f52..ec3a0ba 100644
--- a/utf8.c
+++ b/utf8.c
@@ -4423,9 +4423,12 @@ The pointer to the PV of the C<dsv> is returned.
 char *
 Perl_sv_uni_display(pTHX_ SV *dsv, SV *ssv, STRLEN pvlim, UV flags)
 {
+    const char * const ptr =
+        isREGEXP(ssv) ? RX_WRAPPED((REGEXP*)ssv) : SvPVX_const(ssv);
+
     PERL_ARGS_ASSERT_SV_UNI_DISPLAY;
 
-     return Perl_pv_uni_display(aTHX_ dsv, (const U8*)SvPVX_const(ssv),
+    return Perl_pv_uni_display(aTHX_ dsv, (const U8*)ptr,
                                SvCUR(ssv), pvlim, flags);
 }
 

--
Perl5 Master Repository

Reply via email to