[perl.git] branch blead, updated. v5.23.6-18-g285b5ca

Karl Williamson Tue, 22 Dec 2015 12:20:13 -0800

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/285b5ca0145796a915dec03e87e0176fd4681041?hp=6bb8489d8abc964749726bdb9bae3c5826a3a9c1>


- Log -----------------------------------------------------------------
commit 285b5ca0145796a915dec03e87e0176fd4681041
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 21 20:52:50 2015 -0700

    PATCH: [perl #126261] Assertion failure on missing [ in qr//
    
    This is the result of the regex compiler creating a temporary buffer to
    parse a portion of the input pattern, and then, when an error or warning
    occurs in that buffer, trying to use addresses both inside it and inside
    the original pattern.
    
    The solution here is a general one that confines the heavy lifting to
    one macro, plus a little setup and tear-down around the use of the
    temporary buffer.  The comments in the code detail how we relate the
    address of the error in the temporary buffer back to the parallel
    address in the input pattern.

M       regcomp.c
M       t/lib/warnings/regcomp
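
A minimal sketch of the address translation described in the commit above,
not the regcomp.c code itself.  The struct, its field names, and
input_offset_of() are hypothetical.  The sketch assumes the tail of the
temporary buffer is a verbatim copy of part of the input, and it clamps the
result so that a reported position never runs past the end of the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser state; names are illustrative. */
struct parse_map {
    const char *input_start;   /* sI: start of the original pattern */
    const char *input_end;     /* eI: end of the original pattern */
    const char *copied_start;  /* tC: where the copied input begins in the
                                * temporary buffer */
    size_t      input_offset;  /* tI - sI: where that copied portion begins
                                * in the original pattern */
};

/* Map an error address in the temporary buffer (xC) to an offset into the
 * original pattern (xI - sI), never exceeding the pattern's length. */
static size_t
input_offset_of(const struct parse_map *m, const char *err_in_temp)
{
    size_t input_len = (size_t)(m->input_end - m->input_start);
    size_t off;

    /* Errors are only expected in the copied portion, so xC >= tC. */
    assert(err_in_temp >= m->copied_start);

    /* (xI - sI) = (tI - sI) + (xC - tC) */
    off = m->input_offset + (size_t)(err_in_temp - m->copied_start);

    return off > input_len ? input_len : off;
}
```

With input "xx[abc]" and temporary "(?:y|[abc])", the copied portion starts
at offset 5 in the temporary and at offset 2 in the input, so an error found
at temporary offset 7 maps to input offset 4.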

commit 693ceb05538496aff79244ad7529bc8d153302a8
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 21 20:38:14 2015 -0700

    regcomp.c: update RExC_start when parsing outside input
    
    I noticed this while code reading.  In places, regcomp parses not the
    input pattern but a temporary buffer it constructs, based on that input
    pattern.  RExC_start should be updated so it always is pointing to the
    same buffer as the parse pointer; otherwise segfaults can happen.
    
    I have no idea how one currently can get into the situation this
    protects against, so there are no tests added.

M       regcomp.c
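
A simplified sketch of the save-and-restore idea in the commit above.  The
struct, the callback type, and both functions are hypothetical; the real
code restores its parse pointer to a position in the original pattern
rather than to a saved value.  The point shown is only that start, end, and
the parse pointer all refer to the same buffer while the temporary is being
parsed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parser state: three pointers into one buffer. */
struct parser {
    const char *start;
    const char *end;
    const char *parse;
};

/* Hypothetical callback type for "parse whatever the state points at". */
typedef int (*parse_fn)(struct parser *);

/* Parse a temporary NUL-terminated buffer, then put the state back. */
static int
parse_in_temporary(struct parser *p, const char *temp, parse_fn fn)
{
    const char *save_start = p->start;
    const char *save_end   = p->end;
    const char *save_parse = p->parse;
    int result;

    /* Rebase all three pointers together, so none is left pointing into
     * the other buffer. */
    p->start = p->parse = temp;
    p->end   = temp + strlen(temp);

    result = fn(p);

    p->start = save_start;
    p->end   = save_end;
    p->parse = save_parse;
    return result;
}

/* Example callback: checks the invariant, returns the bytes left to parse. */
static int
count_remaining(struct parser *p)
{
    assert(p->start <= p->parse && p->parse <= p->end);
    return (int)(p->end - p->parse);
}
```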

commit 711b303b46e294e0fd67c7f4f1c7a525c6ca76b4
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 21 18:26:37 2015 -0700

    regcomp.c: Add a stable pattern end pointer.
    
    RExC_end is sometimes set during pattern compilation to point into
    another string in memory.  Messages are output based on the original
    string, so create an end pointer that is in terms of that original
    string; otherwise we could get segfaults.

M       regcomp.c
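
A small sketch of why a second end pointer helps, under assumed names (the
struct and function are hypothetical, loosely modelled on the fields the
commit above mentions).  The working end pointer may be repointed at a
temporary buffer, so the length used for messages is computed only from the
pair of pointers that always refer to the original pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state: 'end' may move to another buffer during parsing,
 * 'precomp' and 'precomp_end' always delimit the original pattern. */
struct pattern_state {
    const char *precomp;
    const char *precomp_end;
    const char *end;
};

/* Length of the original pattern, safe to use when printing a message
 * regardless of where 'end' currently points. */
static size_t
pattern_length_for_message(const struct pattern_state *st)
{
    assert(st->precomp_end >= st->precomp);
    return (size_t)(st->precomp_end - st->precomp);
}
```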

commit ecf931f78454506f85a16cf9a5fe2381b8a6fac9
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 21 18:18:36 2015 -0700

    t/lib/warnings/regcomp: Fix typo in comment

M       t/lib/warnings/regcomp

commit ae4e8c7693c60c5be8ff56e09b8880cfd4dd6657
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 21 17:56:13 2015 -0700

    regcomp.c: Use macro instead of recalculating
    
    There is a macro that does the job that this code does.  Use it.

M       regcomp.c

commit d528642ab6a7a453eb13fed74c19654859a2d5c3
Author: Karl Williamson <[email protected]>
Date:   Sun Dec 20 21:48:04 2015 -0700

    regcomp.c: Move calculations to common macro
    
    This consolidates identical calculations into a single place, which
    makes things easier to maintain.
    
    The reason they were previously dispersed is probably that the common
    macro now has to evaluate the same expression more than once.  Since
    the macro is used to return a list, it can't be turned into a single
    statement.
    
    Any decent optimizing compiler will extract the common subexpressions
    and evaluate them just once.  But even if not, the macro is called only
    in the event of a fatal error (in which case speed is not important), or
    to raise a warning, which we expect to be rare, and the extra work is
    negligible in comparison with what is needed to output the message.

M       regcomp.c
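
An illustration of the trade-off described above, in ordinary C rather than
the actual regcomp.c macro.  SPLIT_ARGS and format_marked() are hypothetical
names.  Because the macro expands to a comma-separated argument list, it
cannot declare a local variable to hold a shared subexpression, so its
arguments are evaluated more than once and must be free of side effects.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expands to two (length, pointer) pairs for a "%.*s ... %.*s" format:
 * the text before the offset, then the text after it.  Both 's' and 'off'
 * appear several times in the expansion. */
#define SPLIT_ARGS(s, off)                                  \
    (int)(off), (s),                                        \
    (int)(strlen(s) - (off)), (s) + (off)

/* Write 's' into 'out' with a marker inserted at byte offset 'off'. */
static int
format_marked(char *out, size_t outsz, const char *s, size_t off)
{
    assert(off <= strlen(s));
    return snprintf(out, outsz, "%.*s <-- HERE %.*s", SPLIT_ARGS(s, off));
}
```

Calling format_marked(buf, sizeof buf, "ab[cd", 2) fills buf with
"ab <-- HERE [cd".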

commit e2f5e63d76a696ba4c441146b555fcbb3ac3c077
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 21 13:37:20 2015 -0700

    regcomp.h: reword some comments

M       regcomp.h

commit 6f2a89df6e4e9fc861c86650325fb46a163e4cd0
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 21 14:47:05 2015 -0700

    regcomp.c: Make some params to a static fcn const
    
    This is just acting on the TODO comment.

M       embed.fnc
M       proto.h
M       regcomp.c

commit 9d021dec6a8a15e47261428914188c8ab191b3b5
Author: Karl Williamson <[email protected]>
Date:   Thu Nov 19 20:51:04 2015 -0700

    regcomp.c: Add 2 basic assertions
    
    These should be true because an SV* should always have a trailing NUL,
    but a lot of things in this code depend on it.  It's worthwhile to point
    that out; I wasn't sure it was true until I investigated.  And an
    assert() makes sure it really is true.

M       regcomp.c
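
A short sketch of the kind of loop that depends on the trailing NUL, using a
hypothetical skip_digits() helper rather than anything from regcomp.c.  The
loop has no explicit bounds check; it terminates only because the byte at
the end pointer is a NUL, which is not a digit.

```c
#include <assert.h>
#include <ctype.h>

/* Advance past leading ASCII digits.  'end' must point at the terminating
 * NUL of the buffer that 'parse' points into. */
static const char *
skip_digits(const char *parse, const char *end)
{
    /* Without this guarantee the loop below could read past the buffer. */
    assert(*end == '\0');

    while (isdigit((unsigned char)*parse))
        parse++;

    return parse;
}
```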

commit aba224f7dfa525c03f6ef72a78544fb49d03a815
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 20 22:23:00 2015 -0600

    pp_hot.c: Add assertion
    
    This will make the cause of any future failures more clear.

M       pp_hot.c

commit dc6b097813c9fe5e64d21ab1f4a4b15db79eef09
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 20 22:21:42 2015 -0600

    perlapi: Clarify 'string' vs. buffer
    
    Strictly, a string is NUL-terminated, but our terminology is lax.

M       autodoc.pl
M       handy.h
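
A brief sketch of the distinction drawn above, using only standard C library
calls; the two wrapper names are hypothetical.  A comparison that treats its
arguments as C strings stops at the first NUL, while a comparison over a
buffer with an explicit length also looks at the bytes after it.

```c
#include <assert.h>
#include <string.h>

/* Compare as NUL-terminated C strings: bytes after the first NUL are
 * never examined. */
static int
same_c_string(const char *a, const char *b)
{
    assert(a && b);
    return strcmp(a, b) == 0;
}

/* Compare as buffers of a known length: embedded NULs are ordinary bytes. */
static int
same_buffer(const char *a, const char *b, size_t len)
{
    assert(a && b);
    return memcmp(a, b, len) == 0;
}
```

Given the 5-byte buffers "ab\0cd" and "ab\0xy", same_c_string() reports
them equal, while same_buffer() with a length of 5 reports them different.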

commit 2863dafa090626922ead4c80d687c71c1a0afc55
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 20 22:08:59 2015 -0600

    utf8.h: Add 2 assertions
    
    This makes sure in DEBUGGING builds that the macro is called correctly.

M       utf8.h
-----------------------------------------------------------------------

Summary of changes:
 autodoc.pl             |   5 +
 embed.fnc              |   6 +-
 handy.h                |  45 ++++-----
 pp_hot.c               |   1 +
 proto.h                |   2 +-
 regcomp.c              | 256 +++++++++++++++++++++++++++++++++----------------
 regcomp.h              |  65 +++++++------
 t/lib/warnings/regcomp |  18 +++-
 utf8.h                 |   6 +-
 9 files changed, 260 insertions(+), 144 deletions(-)

diff --git a/autodoc.pl b/autodoc.pl
index 865ee08..ff548fc 100644
--- a/autodoc.pl
+++ b/autodoc.pl
@@ -396,6 +396,11 @@ not part of the public API, and should not be used by extension writers at
 all.  For these reasons, blindly using functions listed in proto.h is to be
 avoided when writing extensions.
 
+In Perl, unlike C, a string of characters may generally contain embedded
+C<NUL> characters.  Sometimes in the documentation a Perl string is referred
+to as a "buffer" to distinguish it from a C string, but sometimes they are
+both just referred to as strings.
+
 Note that all Perl API global variables must be referenced with the C<PL_>
 prefix.  Again, those not listed here are not to be used by extension writers,
 and can be changed or removed without notice; same with macros.
diff --git a/embed.fnc b/embed.fnc
index 2002d83..178892e 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2175,8 +2175,10 @@ Es       |bool   |grok_bslash_N  |NN RExC_state_t *pRExC_state               \
                                |const U32 depth
 Es     |void   |reginsert      |NN RExC_state_t *pRExC_state \
                                |U8 op|NN regnode *opnd|U32 depth
-Es     |void   |regtail        |NN RExC_state_t *pRExC_state \
-                               |NN regnode *p|NN const regnode *val|U32 depth
+Es     |void   |regtail        |NN RExC_state_t * pRExC_state              \
+                               |NN const regnode * const p                 \
+                               |NN const regnode * const val               \
+                               |const U32 depth
 Es     |SV *   |reg_scan_name  |NN RExC_state_t *pRExC_state \
                                |U32 flags
 Es     |U32    |join_exact     |NN RExC_state_t *pRExC_state \
diff --git a/handy.h b/handy.h
index 228662f..60ec83c 100644
--- a/handy.h
+++ b/handy.h
@@ -421,37 +421,38 @@ string/length pair.
 =head1 Miscellaneous Functions
 
 =for apidoc Am|bool|strNE|char* s1|char* s2
-Test two strings to see if they are different.  Returns true or
-false.
+Test two C<NUL>-terminated strings to see if they are different.  Returns true
+or false.
 
 =for apidoc Am|bool|strEQ|char* s1|char* s2
-Test two strings to see if they are equal.  Returns true or false.
+Test two C<NUL>-terminated strings to see if they are equal.  Returns true or
+false.
 
 =for apidoc Am|bool|strLT|char* s1|char* s2
-Test two strings to see if the first, C<s1>, is less than the second,
-C<s2>.  Returns true or false.
+Test two C<NUL>-terminated strings to see if the first, C<s1>, is less than the
+second, C<s2>.  Returns true or false.
 
 =for apidoc Am|bool|strLE|char* s1|char* s2
-Test two strings to see if the first, C<s1>, is less than or equal to the
-second, C<s2>.  Returns true or false.
+Test two C<NUL>-terminated strings to see if the first, C<s1>, is less than or
+equal to the second, C<s2>.  Returns true or false.
 
 =for apidoc Am|bool|strGT|char* s1|char* s2
-Test two strings to see if the first, C<s1>, is greater than the second,
-C<s2>.  Returns true or false.
+Test two C<NUL>-terminated strings to see if the first, C<s1>, is greater than
+the second, C<s2>.  Returns true or false.
 
 =for apidoc Am|bool|strGE|char* s1|char* s2
-Test two strings to see if the first, C<s1>, is greater than or equal to
-the second, C<s2>.  Returns true or false.
+Test two C<NUL>-terminated strings to see if the first, C<s1>, is greater than
+or equal to the second, C<s2>.  Returns true or false.
 
 =for apidoc Am|bool|strnNE|char* s1|char* s2|STRLEN len
-Test two strings to see if they are different.  The C<len> parameter
-indicates the number of bytes to compare.  Returns true or false.  (A
+Test two C<NUL>-terminated strings to see if they are different.  The C<len>
+parameter indicates the number of bytes to compare.  Returns true or false.  (A
 wrapper for C<strncmp>).
 
 =for apidoc Am|bool|strnEQ|char* s1|char* s2|STRLEN len
-Test two strings to see if they are equal.  The C<len> parameter indicates
-the number of bytes to compare.  Returns true or false.  (A wrapper for
-C<strncmp>).
+Test two C<NUL>-terminated strings to see if they are equal.  The C<len>
+parameter indicates the number of bytes to compare.  Returns true or false.  (A
+wrapper for C<strncmp>).
 
 =for apidoc Am|bool|memEQ|char* s1|char* s2|STRLEN len
 Test two buffers (which may contain embedded C<NUL> characters, to see if they
@@ -540,9 +541,9 @@ C<isWORDCHAR_uni(0x100)> returns TRUE, since 0x100 is LATIN CAPITAL LETTER A
 WITH MACRON in Unicode, and is a word character.
 
 Variant C<isFOO_utf8> is like C<isFOO_uni>, but the input is a pointer to a
-(known to be well-formed) UTF-8 encoded string (C<U8*> or C<char*>).  The
-classification of just the first (possibly multi-byte) character in the string
-is tested.
+(known to be well-formed) UTF-8 encoded string (C<U8*> or C<char*>, and
+possibly containing embedded C<NUL> characters).  The classification of just
+the first (possibly multi-byte) character in the string is tested.
 
 Variant C<isFOO_LC> is like the C<isFOO_A> and C<isFOO_L1> variants, but the
 result is based on the current locale, which is what C<LC> in the name stands
@@ -559,9 +560,9 @@ returns the same as C<isFOO_LC> for input code points less than 256, and
 returns the hard-coded, not-affected-by-locale, Unicode results for larger ones.
 
 Variant C<isFOO_LC_utf8> is like C<isFOO_LC_uvchr>, but the input is a pointer
-to a (known to be well-formed) UTF-8 encoded string (C<U8*> or C<char*>).  The
-classification of just the first (possibly multi-byte) character in the string
-is tested.
+to a (known to be well-formed) UTF-8 encoded string (C<U8*> or C<char*>, and
+possibly containing embedded C<NUL> characters).  The classification of just
+the first (possibly multi-byte) character in the string is tested.
 
 =for apidoc Am|bool|isALPHA|char ch
 Returns a boolean indicating whether the specified character is an
diff --git a/pp_hot.c b/pp_hot.c
index ff9e594..f9790a2 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -3130,6 +3130,7 @@ PP(pp_subst)
                              s == m,    /* Yields minend of 0 or 1 */
                             TARG, NULL,
                     REXEC_NOT_FIRST|REXEC_IGNOREPOS|REXEC_FAIL_ON_UNDERFLOW));
+        assert(strend >= s);
        sv_catpvn_nomg_maybeutf8(dstr, s, strend - s, DO_UTF8(TARG));
 
        if (rpm->op_pmflags & PMf_NONDESTRUCT) {
diff --git a/proto.h b/proto.h
index 7d5ea26..0128cc9 100644
--- a/proto.h
+++ b/proto.h
@@ -4793,7 +4793,7 @@ STATIC regnode*   S_regpiece(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 dept
 PERL_STATIC_INLINE I32 S_regpposixcc(pTHX_ RExC_state_t *pRExC_state, I32 value, const bool strict);
 #define PERL_ARGS_ASSERT_REGPPOSIXCC   \
        assert(pRExC_state)
-STATIC void    S_regtail(pTHX_ RExC_state_t *pRExC_state, regnode *p, const regnode *val, U32 depth);
+STATIC void    S_regtail(pTHX_ RExC_state_t * pRExC_state, const regnode * const p, const regnode * const val, const U32 depth);
 #define PERL_ARGS_ASSERT_REGTAIL       \
        assert(pRExC_state); assert(p); assert(val)
 STATIC void    S_scan_commit(pTHX_ const RExC_state_t *pRExC_state, struct scan_data_t *data, SSize_t *minlenp, int is_inf);
diff --git a/regcomp.c b/regcomp.c
index 8474e82..ab7a5d3 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -131,6 +131,7 @@ struct RExC_state_t {
     U32                flags;                  /* RXf_* are we folding, multilining? */
     U32                pm_flags;               /* PMf_* stuff from the calling PMOP */
     char       *precomp;               /* uncompiled string. */
+    char       *precomp_end;           /* pointer to end of uncompiled string. */
     REGEXP     *rx_sv;                 /* The SV that is the regexp. */
     regexp     *rx;                    /* perl core regexp structure */
     regexp_internal    *rxi;           /* internal data for regexp object
@@ -138,6 +139,8 @@ struct RExC_state_t {
     char       *start;                 /* Start of input for compile */
     char       *end;                   /* End of input for compile */
     char       *parse;                 /* Input-scan pointer. */
+    char        *adjusted_start;        /* 'start', adjusted.  See code use */
+    STRLEN      precomp_adj;            /* an offset beyond precomp.  See code use */
     SSize_t    whilem_seen;            /* number of WHILEM in this expr */
     regnode    *emit_start;            /* Start of emitted-code area */
     regnode    *emit_bound;            /* First regnode outside of the
@@ -220,6 +223,9 @@ struct RExC_state_t {
 #define RExC_flags     (pRExC_state->flags)
 #define RExC_pm_flags  (pRExC_state->pm_flags)
 #define RExC_precomp   (pRExC_state->precomp)
+#define RExC_precomp_adj (pRExC_state->precomp_adj)
+#define RExC_adjusted_start  (pRExC_state->adjusted_start)
+#define RExC_precomp_end (pRExC_state->precomp_end)
 #define RExC_rx_sv     (pRExC_state->rx_sv)
 #define RExC_rx                (pRExC_state->rx)
 #define RExC_rxi       (pRExC_state->rxi)
@@ -554,9 +560,64 @@ static const scan_data_t zero_scan_data =
 #define REPORT_LOCATION " in regex; marked by " MARKER1    \
                         " in m/%"UTF8f MARKER2 "%"UTF8f"/"
 
-#define REPORT_LOCATION_ARGS(offset)            \
-                UTF8fARG(UTF, offset, RExC_precomp), \
-                UTF8fARG(UTF, RExC_end - RExC_precomp - offset, RExC_precomp + offset)
+/* The code in this file in places uses one level of recursion with parsing
+ * rebased to an alternate string constructed by us in memory.  This can take
+ * the form of something that is completely different from the input, or
+ * something that uses the input as part of the alternate.  In the first case,
+ * there should be no possibility of an error, as we are in complete control of
+ * the alternate string.  But in the second case we don't control the input
+ * portion, so there may be errors in that.  Here's an example:
+ *      /[abc\x{DF}def]/ui
+ * is handled specially because \x{df} folds to a sequence of more than one
+ * character, 'ss'.  What is done is to create and parse an alternate string,
+ * which looks like this:
+ *      /(?:\x{DF}|[abc\x{DF}def])/ui
+ * where it uses the input unchanged in the middle of something it constructs,
+ * which is a branch for the DF outside the character class, and clustering
+ * parens around the whole thing. (It knows enough to skip the DF inside the
+ * class while in this substitute parse.) 'abc' and 'def' may have errors that
+ * need to be reported.  The general situation looks like this:
+ *
+ *              sI                       tI               xI       eI
+ * Input:       ----------------------------------------------------
+ * Constructed:         ---------------------------------------------------
+ *                      sC               tC               xC       eC     EC
+ *
+ * The input string sI..eI is the input pattern.  The string sC..EC is the
+ * constructed substitute parse string.  The portions sC..tC and eC..EC are
+ * constructed by us.  The portion tC..eC is an exact duplicate of the input
+ * pattern tI..eI.  In the diagram, these are vertically aligned.  Suppose that
+ * while parsing, we find an error at xC.  We want to display a message showing
+ * the real input string.  Thus we need to find the point xI in it which
+ * corresponds to xC.  xC >= tC, since the portion of the string sC..tC has
+ * been constructed by us, and so shouldn't have errors.  We get:
+ *
+ *      xI = sI + (tI - sI) + (xC - tC)
+ *
+ * and, the offset into sI is:
+ *
+ *      (xI - sI) = (tI - sI) + (xC - tC)
+ *
+ * When the substitute is constructed, we save (tI -sI) as RExC_precomp_adj,
+ * and we save tC as RExC_adjusted_start.
+ */
+
+#define tI_sI           RExC_precomp_adj
+#define tC              RExC_adjusted_start
+#define sC              RExC_precomp
+#define xI_offset(xC)   ((IV) (tI_sI + (xC - tC)))
+#define xI(xC)          (sC + xI_offset(xC))
+#define eC              RExC_precomp_end
+
+#define REPORT_LOCATION_ARGS(xC)                                            \
+    UTF8fARG(UTF,                                                           \
+             (xI(xC) > eC) /* Don't run off end */                          \
+              ? eC - sC   /* Length before the <--HERE */                   \
+              : xI_offset(xC),                                              \
+             sC),         /* The input pattern printed up to the <--HERE */ \
+    UTF8fARG(UTF,                                                           \
+             (xI(xC) > eC) ? 0 : eC - xI(xC), /* Length after <--HERE */    \
+             (xI(xC) > eC) ? eC : xI(xC))     /* pattern after <--HERE */
 
 /* Used to point after bad bytes for an error message, but avoid skipping
  * past a nul byte. */
@@ -569,7 +630,7 @@ static const scan_data_t zero_scan_data =
  */
 #define _FAIL(code) STMT_START {                                       \
     const char *ellipses = "";                                         \
-    IV len = RExC_end - RExC_precomp;                                  \
+    IV len = RExC_precomp_end - RExC_precomp;                          \
                                                                        \
     if (!SIZE_ONLY)                                                    \
        SAVEFREESV(RExC_rx_sv);                                         \
@@ -593,10 +654,8 @@ static const scan_data_t zero_scan_data =
  * Simple_vFAIL -- like FAIL, but marks the current location in the scan
  */
 #define        Simple_vFAIL(m) STMT_START {                            \
-    const IV offset =                                                   \
-        (RExC_parse > RExC_end ? RExC_end : RExC_parse) - RExC_precomp; \
     Perl_croak(aTHX_ "%s" REPORT_LOCATION,                             \
-           m, REPORT_LOCATION_ARGS(offset));   \
+           m, REPORT_LOCATION_ARGS(RExC_parse));                       \
 } STMT_END
 
 /*
@@ -612,9 +671,8 @@ static const scan_data_t zero_scan_data =
  * Like Simple_vFAIL(), but accepts two arguments.
  */
 #define        Simple_vFAIL2(m,a1) STMT_START {                        \
-    const IV offset = RExC_parse - RExC_precomp;                       \
-    S_re_croak2(aTHX_ UTF, m, REPORT_LOCATION, a1,                     \
-                      REPORT_LOCATION_ARGS(offset));   \
+    S_re_croak2(aTHX_ UTF, m, REPORT_LOCATION, a1,             \
+                      REPORT_LOCATION_ARGS(RExC_parse));       \
 } STMT_END
 
 /*
@@ -631,9 +689,8 @@ static const scan_data_t zero_scan_data =
  * Like Simple_vFAIL(), but accepts three arguments.
  */
 #define        Simple_vFAIL3(m, a1, a2) STMT_START {                   \
-    const IV offset = RExC_parse - RExC_precomp;               \
     S_re_croak2(aTHX_ UTF, m, REPORT_LOCATION, a1, a2,         \
-           REPORT_LOCATION_ARGS(offset));      \
+           REPORT_LOCATION_ARGS(RExC_parse));                  \
 } STMT_END
 
 /*
@@ -649,9 +706,8 @@ static const scan_data_t zero_scan_data =
  * Like Simple_vFAIL(), but accepts four arguments.
  */
 #define        Simple_vFAIL4(m, a1, a2, a3) STMT_START {               \
-    const IV offset = RExC_parse - RExC_precomp;               \
-    S_re_croak2(aTHX_ UTF, m, REPORT_LOCATION, a1, a2, a3,             \
-           REPORT_LOCATION_ARGS(offset));      \
+    S_re_croak2(aTHX_ UTF, m, REPORT_LOCATION, a1, a2, a3,     \
+           REPORT_LOCATION_ARGS(RExC_parse));                  \
 } STMT_END
 
 #define        vFAIL4(m,a1,a2,a3) STMT_START {                 \
@@ -661,20 +717,18 @@ static const scan_data_t zero_scan_data =
 } STMT_END
 
 /* A specialized version of vFAIL2 that works with UTF8f */
-#define vFAIL2utf8f(m, a1) STMT_START {            \
-    const IV offset = RExC_parse - RExC_precomp;   \
-    if (!SIZE_ONLY)                                \
-        SAVEFREESV(RExC_rx_sv);                    \
-    S_re_croak2(aTHX_ UTF, m, REPORT_LOCATION, a1, \
-            REPORT_LOCATION_ARGS(offset));         \
+#define vFAIL2utf8f(m, a1) STMT_START {             \
+    if (!SIZE_ONLY)                                 \
+        SAVEFREESV(RExC_rx_sv);                     \
+    S_re_croak2(aTHX_ UTF, m, REPORT_LOCATION, a1,  \
+            REPORT_LOCATION_ARGS(RExC_parse));      \
 } STMT_END
 
 #define vFAIL3utf8f(m, a1, a2) STMT_START {             \
-    const IV offset = RExC_parse - RExC_precomp;        \
     if (!SIZE_ONLY)                                     \
         SAVEFREESV(RExC_rx_sv);                         \
     S_re_croak2(aTHX_ UTF, m, REPORT_LOCATION, a1, a2,  \
-            REPORT_LOCATION_ARGS(offset));              \
+            REPORT_LOCATION_ARGS(RExC_parse));          \
 } STMT_END
 
 /* These have asserts in them because of [perl #122671] Many warnings in
@@ -685,84 +739,86 @@ static const scan_data_t zero_scan_data =
 
 /* m is not necessarily a "literal string", in this macro */
 #define reg_warn_non_literal_string(loc, m) STMT_START {                \
-    const IV offset = loc - RExC_precomp;                               \
-    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_REGEXP), "%s" REPORT_LOCATION,      \
-            m, REPORT_LOCATION_ARGS(offset));       \
+    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_REGEXP),           \
+                                       "%s" REPORT_LOCATION,            \
+                                  m, REPORT_LOCATION_ARGS(loc));        \
 } STMT_END
 
 #define        ckWARNreg(loc,m) STMT_START {                           \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_ck_warner(aTHX_ packWARN(WARN_REGEXP), m REPORT_LOCATION,    \
-           REPORT_LOCATION_ARGS(offset));              \
+    __ASSERT_(PASS2) Perl_ck_warner(aTHX_ packWARN(WARN_REGEXP),        \
+                                          m REPORT_LOCATION,           \
+                                         REPORT_LOCATION_ARGS(loc));   \
 } STMT_END
 
 #define        vWARN(loc, m) STMT_START {                              \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_REGEXP), m REPORT_LOCATION,       \
-           REPORT_LOCATION_ARGS(offset));              \
+    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_REGEXP),           \
+                                       m REPORT_LOCATION,               \
+                                       REPORT_LOCATION_ARGS(loc));      \
 } STMT_END
 
 #define        vWARN_dep(loc, m) STMT_START {                          \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_DEPRECATED), m REPORT_LOCATION,   \
-           REPORT_LOCATION_ARGS(offset));              \
+    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_DEPRECATED),       \
+                                       m REPORT_LOCATION,               \
+                                      REPORT_LOCATION_ARGS(loc));      \
 } STMT_END
 
 #define        ckWARNdep(loc,m) STMT_START {                           \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_ck_warner_d(aTHX_ packWARN(WARN_DEPRECATED),         \
-           m REPORT_LOCATION,                                          \
-           REPORT_LOCATION_ARGS(offset));              \
+    __ASSERT_(PASS2) Perl_ck_warner_d(aTHX_ packWARN(WARN_DEPRECATED),  \
+                                           m REPORT_LOCATION,          \
+                                           REPORT_LOCATION_ARGS(loc)); \
 } STMT_END
 
-#define        ckWARNregdep(loc,m) STMT_START {                                \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_ck_warner_d(aTHX_ packWARN2(WARN_DEPRECATED, WARN_REGEXP),   \
-           m REPORT_LOCATION,                                          \
-           REPORT_LOCATION_ARGS(offset));              \
+#define        ckWARNregdep(loc,m) STMT_START {                                    \
+    __ASSERT_(PASS2) Perl_ck_warner_d(aTHX_ packWARN2(WARN_DEPRECATED,      \
+                                                      WARN_REGEXP),         \
+                                            m REPORT_LOCATION,             \
+                                            REPORT_LOCATION_ARGS(loc));    \
 } STMT_END
 
-#define        ckWARN2reg_d(loc,m, a1) STMT_START {                            \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_ck_warner_d(aTHX_ packWARN(WARN_REGEXP),                     \
-           m REPORT_LOCATION,                                          \
-           a1, REPORT_LOCATION_ARGS(offset));  \
+#define        ckWARN2reg_d(loc,m, a1) STMT_START {                                \
+    __ASSERT_(PASS2) Perl_ck_warner_d(aTHX_ packWARN(WARN_REGEXP),          \
+                                           m REPORT_LOCATION,              \
+                                           a1, REPORT_LOCATION_ARGS(loc)); \
 } STMT_END
 
-#define        ckWARN2reg(loc, m, a1) STMT_START {                             \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_ck_warner(aTHX_ packWARN(WARN_REGEXP), m REPORT_LOCATION,    \
-           a1, REPORT_LOCATION_ARGS(offset));  \
+#define        ckWARN2reg(loc, m, a1) STMT_START {                                 \
+    __ASSERT_(PASS2) Perl_ck_warner(aTHX_ packWARN(WARN_REGEXP),            \
+                                          m REPORT_LOCATION,               \
+                                          a1, REPORT_LOCATION_ARGS(loc));   \
 } STMT_END
 
-#define        vWARN3(loc, m, a1, a2) STMT_START {                             \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_REGEXP), m REPORT_LOCATION,               \
-           a1, a2, REPORT_LOCATION_ARGS(offset));      \
+#define        vWARN3(loc, m, a1, a2) STMT_START {                                 \
+    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_REGEXP),               \
+                                       m REPORT_LOCATION,                   \
+                                      a1, a2, REPORT_LOCATION_ARGS(loc));  \
 } STMT_END
 
-#define        ckWARN3reg(loc, m, a1, a2) STMT_START {                         \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_ck_warner(aTHX_ packWARN(WARN_REGEXP), m REPORT_LOCATION,    \
-           a1, a2, REPORT_LOCATION_ARGS(offset));      \
+#define        ckWARN3reg(loc, m, a1, a2) STMT_START {                             \
+    __ASSERT_(PASS2) Perl_ck_warner(aTHX_ packWARN(WARN_REGEXP),            \
+                                          m REPORT_LOCATION,                \
+                                         a1, a2,                           \
+                                          REPORT_LOCATION_ARGS(loc));       \
 } STMT_END
 
 #define        vWARN4(loc, m, a1, a2, a3) STMT_START {                 \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_REGEXP), m REPORT_LOCATION,               \
-           a1, a2, a3, REPORT_LOCATION_ARGS(offset)); \
+    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_REGEXP),           \
+                                       m REPORT_LOCATION,               \
+                                      a1, a2, a3,                      \
+                                       REPORT_LOCATION_ARGS(loc));      \
 } STMT_END
 
 #define        ckWARN4reg(loc, m, a1, a2, a3) STMT_START {             \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_ck_warner(aTHX_ packWARN(WARN_REGEXP), m REPORT_LOCATION,    \
-           a1, a2, a3, REPORT_LOCATION_ARGS(offset)); \
+    __ASSERT_(PASS2) Perl_ck_warner(aTHX_ packWARN(WARN_REGEXP),        \
+                                          m REPORT_LOCATION,            \
+                                         a1, a2, a3,                   \
+                                          REPORT_LOCATION_ARGS(loc));   \
 } STMT_END
 
 #define        vWARN5(loc, m, a1, a2, a3, a4) STMT_START {             \
-    const IV offset = loc - RExC_precomp;                              \
-    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_REGEXP), m REPORT_LOCATION,               \
-           a1, a2, a3, a4, REPORT_LOCATION_ARGS(offset)); \
+    __ASSERT_(PASS2) Perl_warner(aTHX_ packWARN(WARN_REGEXP),           \
+                                       m REPORT_LOCATION,              \
+                                      a1, a2, a3, a4,                  \
+                                       REPORT_LOCATION_ARGS(loc));      \
 } STMT_END
 
 /* Macros for recording node offsets.   20001227 [email protected]
@@ -4813,7 +4869,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
                    Perl_ck_warner(aTHX_ packWARN(WARN_REGEXP),
                        "Quantifier unexpected on zero-length expression "
                        "in regex m/%"UTF8f"/",
-                        UTF8fARG(UTF, RExC_end - RExC_precomp,
+                        UTF8fARG(UTF, RExC_precomp_end - RExC_precomp,
                                  RExC_precomp));
                    (void)ReREFCNT_inc(RExC_rx_sv);
                }
@@ -6686,6 +6742,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
     }
 
     RExC_precomp = exp;
+    RExC_precomp_adj = 0;
     RExC_flags = rx_flags;
     RExC_pm_flags = pm_flags;
 
@@ -6719,8 +6776,9 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
 
     /* First pass: determine size, legality. */
     RExC_parse = exp;
-    RExC_start = exp;
+    RExC_start = RExC_adjusted_start = exp;
     RExC_end = exp + plen;
+    RExC_precomp_end = RExC_end;
     RExC_naughty = 0;
     RExC_npar = 1;
     RExC_nestroot = 0;
@@ -6740,6 +6798,15 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
     RExC_recurse_count = 0;
     pRExC_state->code_index = 0;
 
+    /* This NUL is guaranteed because the pattern comes from an SV*, and the sv
+     * code makes sure the final byte is an uncounted NUL.  But should this
+     * ever not be the case, lots of things could read beyond the end of the
+     * buffer: loops like
+     *      while(isFOO(*RExC_parse)) RExC_parse++;
+     *      strchr(RExC_parse, "foo");
+     * etc.  So it is worth noting. */
+    assert(*RExC_end == '\0');
+
     DEBUG_PARSE_r(
        PerlIO_printf(Perl_debug_log, "Starting first pass (sizing)\n");
         RExC_lastnum=0;
@@ -9863,6 +9930,13 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
 
     *flagp = 0;                                /* Tentatively. */
 
+    /* Having this true makes it feasible to have a lot fewer tests for the
+     * parse pointer being in scope.  For example, we can write
+     *      while(isFOO(*RExC_parse)) RExC_parse++;
+     * instead of
+     *      while(RExC_parse < RExC_end && isFOO(*RExC_parse)) RExC_parse++;
+     */
+    assert(*RExC_end == '\0');
 
     /* Make an OPEN node, if parenthesized. */
     if (paren) {
@@ -11334,6 +11408,7 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
        SV * substitute_parse;
        STRLEN len;
        char *orig_end = RExC_end;
+       char *save_start = RExC_start;
         I32 flags;
 
         /* Count the code points, if desired, in the sequence */
@@ -11379,7 +11454,8 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
        }
         sv_catpv(substitute_parse, ")");
 
-       RExC_parse = SvPV(substitute_parse, len);
+        RExC_parse = RExC_start = RExC_adjusted_start = SvPV(substitute_parse,
+                                                             len);
 
        /* Don't allow empty number */
        if (len < (STRLEN) 8) {
@@ -11409,6 +11485,7 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
         }
 
         /* Restore the saved values */
+       RExC_start = RExC_adjusted_start = save_start;
        RExC_parse = endbrace;
        RExC_end = orig_end;
        RExC_override_recoding = 0;
@@ -13528,10 +13605,7 @@ S_handle_regex_sets(pTHX_ RExC_state_t *pRExC_state, SV** return_invlist,
     Perl_ck_warner_d(aTHX_
         packWARN(WARN_EXPERIMENTAL__REGEX_SETS),
         "The regex_sets feature is experimental" REPORT_LOCATION,
-            UTF8fARG(UTF, (RExC_parse - RExC_precomp), RExC_precomp),
-            UTF8fARG(UTF,
-                     RExC_end - RExC_start - (RExC_parse - RExC_precomp),
-                     RExC_precomp + (RExC_parse - RExC_precomp)));
+        REPORT_LOCATION_ARGS(RExC_parse));
 
     /* Everything in this construct is a metacharacter.  Operands begin with
      * either a '\' (for an escape sequence), or a '[' for a bracketed
@@ -15449,11 +15523,17 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
        STRLEN len;
        char *save_end = RExC_end;
        char *save_parse = RExC_parse;
+       char *save_start = RExC_start;
+        STRLEN prefix_end = 0;      /* We copy the character class after a
+                                       prefix supplied here.  This is the size
+                                       + 1 of that prefix */
         bool first_time = TRUE;     /* First multi-char occurrence doesn't get
                                        a "|" */
         I32 reg_flags;
 
         assert(! invert);
+        assert(RExC_precomp_adj == 0); /* Only one level of recursion allowed */
+
 #if 0   /* Have decided not to deal with multi-char folds in inverted classes,
            because too confusing */
         if (invert) {
@@ -15487,6 +15567,7 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
          * multi-character folds, have to include it in recursive parsing */
         if (element_count) {
             sv_catpv(substitute_parse, "|[");
+            prefix_end = SvCUR(substitute_parse);
             sv_catpvn(substitute_parse, orig_parse, RExC_parse - orig_parse);
             sv_catpv(substitute_parse, "]");
         }
@@ -15501,7 +15582,12 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
         }
 #endif
 
-       RExC_parse = SvPV(substitute_parse, len);
+        /* Set up the data structure so that any errors will be properly
+         * reported.  See the comments at the definition of
+         * REPORT_LOCATION_ARGS for details */
+        RExC_precomp_adj = orig_parse - RExC_precomp;
+       RExC_start =  RExC_parse = SvPV(substitute_parse, len);
+        RExC_adjusted_start = RExC_start + prefix_end;
        RExC_end = RExC_parse + len;
         RExC_in_multi_char_class = 1;
        RExC_override_recoding = 1;
@@ -15511,7 +15597,10 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
 
        *flagp |= reg_flags&(HASWIDTH|SIMPLE|SPSTART|POSTPONED|RESTART_PASS1|NEED_UTF8);
 
-       RExC_parse = save_parse;
+        /* And restore so can parse the rest of the pattern */
+        RExC_parse = save_parse;
+       RExC_start = RExC_adjusted_start = save_start;
+        RExC_precomp_adj = 0;
        RExC_end = save_end;
        RExC_in_multi_char_class = 0;
        RExC_override_recoding = 0;
@@ -16871,10 +16960,11 @@ S_reginsert(pTHX_ RExC_state_t *pRExC_state, U8 op, regnode *opnd, U32 depth)
 - regtail - set the next-pointer at the end of a node chain of p to val.
 - SEE ALSO: regtail_study
 */
-/* TODO: All three parms should be const */
 STATIC void
-S_regtail(pTHX_ RExC_state_t *pRExC_state, regnode *p,
-                const regnode *val,U32 depth)
+S_regtail(pTHX_ RExC_state_t * pRExC_state,
+                const regnode * const p,
+                const regnode * const val,
+                const U32 depth)
 {
     regnode *scan;
     GET_RE_DEBUG_FLAGS_DECL;
@@ -16888,7 +16978,7 @@ S_regtail(pTHX_ RExC_state_t *pRExC_state, regnode *p,
        return;
 
     /* Find last node. */
-    scan = p;
+    scan = (regnode *) p;
     for (;;) {
        regnode * const temp = regnext(scan);
         DEBUG_PARSE_r({
diff --git a/regcomp.h b/regcomp.h
index 5c12a21..a8955f3 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -403,7 +403,7 @@ struct regnode_ssc {
  *  2)  A subset of item 1) is if all possible code points outside the bitmap
  *      match.  This is a common occurrence when the class is complemented,
  *      like /[^ij]/.  Therefore a bit is reserved to indicate this,
- *      ANYOF_MATCHES_ALL_ABOVE_BITMAP.  If it became necessary, this bit could
+ *      ANYOF_MATCHES_ALL_ABOVE_BITMAP.  If it became necessary, this flag could
  *      be replaced by using the normal swash mechanism, but with a performance
  *      penalty.
  *  3)  Under /d rules, it can happen that code points that are in the upper
@@ -420,7 +420,7 @@ struct regnode_ssc {
  *      A swash could be created for this case, but this is relatively common,
  *      and it turns out that it's all or nothing:  if any one of these code
  *      points matches, they all do.  Hence a single bit suffices.  We use a
- *      shared bit that doesn't take up space by itself:
+ *      shared flag that doesn't take up space by itself:
  *      ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER.
  *      This also implies 1), with one exception: [:^cntrl:].
  *  5)  A user-defined \p{} property may not have been defined by the time the
@@ -433,11 +433,11 @@ struct regnode_ssc {
  *      is a better way to accomplish what this feature does.  This case also
  *      implies 1).
  *      ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP
- *      is the shared bit.
+ *      is the shared flag.
  *  6)  /[foo]/il may have folds that are only valid if the runtime locale is a
  *      UTF-8 one.  These are quite rare, so it would be good to avoid the
  *      expense of looking for them.  But /l matching is slow anyway, and we've
- *      traditionally not worried to much about its performance.  And this
+ *      traditionally not worried too much about its performance.  And this
  *      condition requires the ANYOF_LOC_FOLD flag to be set, so testing for
  *      that flag would be sufficient to rule out most cases of this.  So it is
  *      unclear if this should have a flag or not.  But, one is currently
@@ -445,13 +445,13 @@ struct regnode_ssc {
  *      text below indicates how to share it, should another bit be needed).
  *
  * At the moment, there are no spare bits, but this could be changed by various
- * tricks.  Notice that item 6) is not independent of the ANYOF_LOC_FOLD flag
- * below.  Also, the ANYOF_LOC_REQ_UTF8 flag is set only if both these aren't.
- * We can therefore use a 2-bit field to represent these 3 flags, as follows:
- *      00  => ANYOF_LOC_REQ_UTF8
- *      01  => no folding
- *      10  => ANYOF_LOC_FOLD alone
- *      11  => ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES
+ * tricks.
+ *
+ * Note that item ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES is not independent of the
+ * ANYOF_LOC_FOLD flag below.  Also, the ANYOF_LOC_REQ_UTF8 flag is set only if
+ * both these aren't.  We can therefore share ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES
+ * with ANYOF_LOC_REQ_UTF8, so what the shared flag means depends on the
+ * ANYOF_LOC_FOLD flag.
  *
  * Beyond that, note that the information may be conveyed by creating new
  * regnode types.  This is not the best solution, as shown later in this
@@ -459,31 +459,30 @@ struct regnode_ssc {
  * for ANYOF_INVERT, for example.  A complication of this is that the regexec.c
  * REGINCLASS macro assumes that it can just use the bitmap if no flags are
  * set.  This would have to be changed to add extra tests for the node type, or
- * a special bit reserved that means unspecified special handling, and then the
+ * a special flag reserved that means unspecified special handling, and then the
  * node-type would be used internally to sort that out.  So we could gain a bit
- * by having an ANYOF_SPECIAL bit, and a node type for INVERT, and another for
+ * by having an ANYOF_SPECIAL flag, and a node type for INVERT, and another for
  * POSIXL, and still another for INVERT_POSIXL.  This example illustrates one
  * problem with this, a combinatorial explosion of node types.  The one node
  * type khw can think of that doesn't have this explosion issue is
- * ANYOF_LOC_REQ_UTF8, but you'd do this only if you haven't done the 2-bit
- * field trick above.  This bit is a natural candidate for being a separate
+ * ANYOF_LOC_REQ_UTF8.  This flag is a natural candidate for being a separate
  * node type because it is a specialization of the current ANYOFL, and because
- * no other ANYOFL-only bits are set when it is; also most of its uses are
+ * no other ANYOFL-only flags are set when it is; also most of its uses are
  * actually outside the reginclass() function, so this could be done with no
- * performance penalty.  But again, the 2-bit field trick combines this bit so
- * it doesn't take up space anyway.  Another issue when turning a bit into a
- * node type, is that a SSC may use that bit -- not just a regular ANYOF[DL]?.
- * In the case of ANYOF_LOC_REQ_UTF8, the only likely problem is accurately
- * settting the SSC node-type to the new one, which would likely involve
- * S_ssc_or and S_ssc_and, and not how the SSC currently gets set to ANYOFL.
+ * performance penalty.  But since it can be shared, as noted above, it doesn't
+ * take up space anyway.  Another issue when turning a flag into a node type, is
+ * that a SSC may use that flag -- not just a regular ANYOF[DL]?.  In the case
+ * of ANYOF_LOC_REQ_UTF8, the only likely problem is accurately settting the
+ * SSC node-type to the new one, which would likely involve S_ssc_or and
+ * S_ssc_and, and not how the SSC currently gets set to ANYOFL.
  *
- * Another possibility is to instead rename the ANYOF_POSIXL bit to be
+ * Another possibility is to instead rename the ANYOF_POSIXL flag to be
  * ANYOFL_LARGE, to mean that the ANYOF node has an extra 32 bits beyond what a
  * regular one does.  That's what it effectively means now, with the extra
- * space all for the POSIX class bits.  But those classes actually only occupy
- * 30 bits, so the 2-bit field or 2 of the locale bits could be moved to that
- * extra space.  The downside of this is that ANYOFL nodes with whichever of
- * the bits get moved would have to have the extra space always allocated.
+ * space all for the POSIX class flags.  But those classes actually only occupy
+ * 30 bits, so 2 of the locale flags could be moved to that extra space.  The
+ * downside of this is that ANYOFL nodes with whichever of the flags get moved
+ * would have to have the extra space always allocated.
  *
  * One could completely remove ANYOFL_LARGE and make all ANYOFL nodes large.
  * The 30 bits in the extra word would indicate if a posix class should be
@@ -491,12 +490,12 @@ struct regnode_ssc {
  * anyway, and the SSC could be set to this node type.   REGINCLASS would have
  * to be modified so that if the node type were this, it would call
  * reginclass(), as the flag bit that indicates to do this now would be gone.
- * If the 2-bit field is used and moved to the larger structure, this would
- * free up a total of 4 bits.  If this were done, we could create an
- * ANYOF_INVERT node-type without a combinatorial explosion, getting us to 5
- * bits.  And, keep in mind that ANYOF_MATCHES_ALL_ABOVE_BITMAP is solely for
- * performance, so could be removed.  The other performance-related bits are
- * shareable with bits that are required.
+ * If 2 locale flags are moved to the larger structure, this would free up a
+ * total of 4 bits.  If this were done, we could create an ANYOF_INVERT
+ * node-type without a combinatorial explosion, getting us to 5 bits.  And,
+ * keep in mind that ANYOF_MATCHES_ALL_ABOVE_BITMAP is solely for performance,
+ * so could be removed.  The other performance-related flags are shareable with
+ * flags that are required.
  *
  * Several flags are not used in synthetic start class (SSC) nodes, so could be
  * shared should new flags be needed for SSCs, like SSC_MATCHES_EMPTY_STRING
diff --git a/t/lib/warnings/regcomp b/t/lib/warnings/regcomp
index b9943a0..044e02f 100644
--- a/t/lib/warnings/regcomp
+++ b/t/lib/warnings/regcomp
@@ -20,7 +20,7 @@ EXPECT
 Non-octal character '8'.  Resolved as "\o{123}" at - line 3.
 Non-octal character '8'.  Resolved as "\o{654}" at - line 4.
 ########
-# regcomp.c.c
+# regcomp.c
 BEGIN {
     if (ord('A') == 193) {
         print "SKIPPED\n# Different results on EBCDIC";
@@ -36,3 +36,19 @@ $a = qr/[\c,]/;
 EXPECT
 "\c," is more clearly written simply as "l" at - line 9.
 "\c," is more clearly written simply as "l" at - line 10.
+########
+# This is because currently a different error is output under
+# use re 'strict', so can't go in reg_mesg.t
+# NAME perl #126261, error message causes segfault
+# OPTION fatal
+ qr/abc[\x{df}[.00./i
+EXPECT
+Unmatched [ in regex; marked by <-- HERE in m/abc[ <-- HERE \x{df}[.00./ at - line 4.
+########
+# NAME perl #126261, with 'use utf8'
+# OPTION fatal
+use utf8;
+no warnings 'utf8';
+qr/abc[ﬁ[.00./i;
+EXPECT
+Unmatched [ in regex; marked by <-- HERE in m/abc[ <-- HERE ﬁ[.00./ at - line 4.
diff --git a/utf8.h b/utf8.h
index c57576b..1ed8fd8 100644
--- a/utf8.h
+++ b/utf8.h
@@ -465,11 +465,13 @@ only) byte is pointed to by C<s>.
  * (which works for code points up through 0xFF) or NATIVE_TO_UNI which works
  * for any code point */
 #define __BASE_TWO_BYTE_HI(c, translate_function)                               \
+           (__ASSERT_(! UVCHR_IS_INVARIANT(c))                                  \
             I8_TO_NATIVE_UTF8((translate_function(c) >> UTF_ACCUMULATION_SHIFT) \
-                              | UTF_START_MARK(2))
+                              | UTF_START_MARK(2)))
 #define __BASE_TWO_BYTE_LO(c, translate_function)                               \
+             (__ASSERT_(! UVCHR_IS_INVARIANT(c))                                \
               I8_TO_NATIVE_UTF8((translate_function(c) & UTF_CONTINUATION_MASK) \
-                                 | UTF_CONTINUATION_MARK)
+                                 | UTF_CONTINUATION_MARK))
 
 /* The next two macros should not be used.  They were designed to be usable as
  * the case label of a switch statement, but this doesn't work for EBCDIC.  Use

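The regclass hunk sets RExC_precomp_adj and RExC_adjusted_start so that an
error found in the temporary buffer can be reported at the parallel position
in the input pattern.  A minimal C sketch of that idea follows.  It is not the
code from regcomp.c: struct parse_state, state_for_pattern(),
offset_in_pattern() and the clamping to the pattern bounds are all
hypothetical, and the field names are chosen only to echo the RExC_* variables
in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the parser state.  This is NOT
 * Perl's RExC_state_t; the names only mirror the variables in the diff. */
struct parse_state {
    const char *precomp;        /* start of the original input pattern */
    const char *precomp_end;    /* one past its last byte */
    const char *adjusted_start; /* in the buffer being parsed, where the text
                                   copied from the pattern begins */
    ptrdiff_t   precomp_adj;    /* offset in precomp of that copied text */
};

/* State for parsing the input pattern directly: no adjustment is needed. */
static struct parse_state
state_for_pattern(const char *pattern)
{
    struct parse_state st;
    st.precomp        = pattern;
    st.precomp_end    = pattern + strlen(pattern);
    st.adjusted_start = pattern;
    st.precomp_adj    = 0;
    return st;
}

/* Translate 'loc', a pointer into the buffer currently being parsed, into an
 * offset in the original pattern.  Clamping to [0, length] is an assumption
 * of this sketch, so that a location in a synthetic prefix or suffix of the
 * temporary buffer still yields a printable position. */
static ptrdiff_t
offset_in_pattern(const struct parse_state *st, const char *loc)
{
    const ptrdiff_t len = st->precomp_end - st->precomp;
    ptrdiff_t off = st->precomp_adj + (loc - st->adjusted_start);

    assert(len >= 0);
    if (off < 0)   off = 0;
    if (off > len) off = len;
    return off;
}
```

For example, take the pattern "abc[xy]z" and a made-up temporary buffer
"(?:ss|[xy])".  The copied class text starts at index 7 of the buffer and at
offset 4 of the pattern, so adjusted_start is buffer + 7 and precomp_adj is 4.
A location at the buffer's 'y' (index 8) then maps to pattern offset 5.  When
the pattern itself is being parsed, adjusted_start equals precomp and
precomp_adj is 0, so the mapping is the identity.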
--
Perl5 Master Repository
