[perl.git] branch blead, updated. v5.23.7-410-gfc2ed17

Karl Williamson Thu, 18 Feb 2016 19:44:34 -0800

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/fc2ed17c144434ee3dde93c77bd1cd59b52da134?hp=3a3d108484629fe5b421976b8d6fd6f280a1f97a>


- Log -----------------------------------------------------------------
commit fc2ed17c144434ee3dde93c77bd1cd59b52da134
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 17 21:17:46 2016 -0700

    regcomp.c: optimization for qr/[...]/il
    
    Certain matches are calculated as being legal only when the current
    execution time local is a UTF-8 one.  However, a character class can
    have multiple components (and usually does), and some of those components
    may be duplicates of some of these matches, and be valid regardless of
    the locale.  This commit removes them from the tentative list, and if it
    goes to zero, clears it.  This will improve execution time slightly.

M       regcomp.c

commit 7415a9c182cfb100de91c5cb7eb95ea63b23c7c5
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 17 21:15:52 2016 -0700

    regcomp.c: Avoid a segfault
    
    I stumbled across this in adding more code elsewhere, so I don't know
    how to trigger it.  This is in the intersection routine for two
    inversion lists.  The corresponding union code correctly handles the
    case when the input is NULL, so just copy that to here.

M       regcomp.c

commit ac33c516140ee213a8a20ada506f97b3a7776ae4
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 13 15:35:11 2016 -0700

    PATCH: [perl 127537] /\W/ regression with UTF-8
    
    This bug is apparently uncommon in the field, as I was the one who
    discovered it.  It requires a UTF-8 pattern containing a complemented
    posix class, like \W or \S, in an inverted character class, like
    [^\Wfoo] in a pattern that also has a synthetic start class generated by
    the regex optimizer for it .
    
    The fix is trivial.

M       pod/perldelta.pod
M       regcomp.c
M       t/re/re_tests

commit ce54a8b9b1353b2e7e84528e499a996fb0697a95
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 13 11:53:50 2016 -0700

    regcomp.c, toke.c: swap functions being inline static
    
    grok_bslash_x() is so large that no compiler will inline it.  Move it to
    dquote.c from dq_inline.c.  Conversely, move form_octal_warning() to
    dq_inline.c.  It is so tiny that the function call overhead is scarcely
    smaller than the function body.
    
    This also moves things in embed.fnc so all these functions.  are not
    visible outside the few files they are supposed to be used in.

M       dquote.c
M       dquote_inline.h
M       embed.fnc
M       embed.h
M       proto.h
-----------------------------------------------------------------------

Summary of changes:
 dquote.c          | 134 ++++++++++++++++++++++++++++++++++++++++++++---------
 dquote_inline.h   | 135 ++++++++++--------------------------------------------
 embed.fnc         |  12 ++---
 embed.h           |  10 ++--
 pod/perldelta.pod |   8 ++++
 proto.h           |  31 ++++++-------
 regcomp.c         |  25 ++++++++--
 t/re/re_tests     |   2 +
 8 files changed, 192 insertions(+), 165 deletions(-)

diff --git a/dquote.c b/dquote.c
index 895f17d..e02308e 100644
--- a/dquote.c
+++ b/dquote.c
@@ -8,6 +8,7 @@
 #include "EXTERN.h"
 #define PERL_IN_DQUOTE_C
 #include "perl.h"
+#include "dquote_inline.h"
 
 /* XXX Add documentation after final interface and behavior is decided */
 /* May want to show context for error, so would pass S_grok_bslash_c(pTHX_ 
const char* current, const char* start, const bool output_warning)
@@ -161,33 +162,124 @@ Perl_grok_bslash_o(pTHX_ char **s, UV *uv, const char** 
error_msg,
     return TRUE;
 }
 
-char*
-Perl_form_short_octal_warning(pTHX_
-                           const char * const s, /* Points to first non-octal 
*/
-                           const STRLEN len      /* Length of octals string, so
-                                                    (s-len) points to first
-                                                    octal */
-) {
-    /* Return a character string consisting of a warning message for when a
-     * string constant in octal is weird, like "\078".  */
+bool
+Perl_grok_bslash_x(pTHX_ char **s, UV *uv, const char** error_msg,
+                      const bool output_warning, const bool strict,
+                      const bool silence_non_portable,
+                      const bool UTF)
+{
 
-    const char * sans_leading_zeros = s - len;
+/*  Documentation to be supplied when interface nailed down finally
+ *  This returns FALSE if there is an error which the caller need not recover
+ *  from; otherwise TRUE.
+ *  It guarantees that the returned codepoint, *uv, when expressed as
+ *  utf8 bytes, would fit within the skipped "\x{...}" bytes.
+ *
+ *  On input:
+ *     s   is the address of a pointer to a NULL terminated string that begins
+ *         with 'x', and the previous character was a backslash.  At exit, *s
+ *         will be advanced to the byte just after those absorbed by this
+ *         function.  Hence the caller can continue parsing from there.  In
+ *         the case of an error, this routine has generally positioned *s to
+ *         point just to the right of the first bad spot, so that a message
+ *         that has a "<--" to mark the spot will be correctly positioned.
+ *     uv  points to a UV that will hold the output value, valid only if the
+ *         return from the function is TRUE
+ *      error_msg is a pointer that will be set to an internal buffer giving an
+ *         error message upon failure (the return is FALSE).  Untouched if
+ *         function succeeds
+ *     output_warning says whether to output any warning messages, or suppress
+ *         them
+ *     strict is true if anything out of the ordinary should cause this to
+ *         fail instead of warn or be silent.  For example, it requires
+ *         exactly 2 digits following the \x (when there are no braces).
+ *         3 digits could be a mistake, so is forbidden in this mode.
+ *      silence_non_portable is true if to suppress warnings about the code
+ *          point returned being too large to fit on all platforms.
+ *     UTF is true iff the string *s is encoded in UTF-8.
+ */
+    char* e;
+    STRLEN numbers_len;
+    I32 flags = PERL_SCAN_DISALLOW_PREFIX;
+#ifdef DEBUGGING
+    char *start = *s - 1;
+    assert(*start == '\\');
+#endif
 
-    PERL_ARGS_ASSERT_FORM_SHORT_OCTAL_WARNING;
+    PERL_ARGS_ASSERT_GROK_BSLASH_X;
 
-    assert(*s == '8' || *s == '9');
+    assert(**s == 'x');
+    (*s)++;
 
-    /* Remove the leading zeros, retaining one zero so won't be zero length */
-    while (*sans_leading_zeros == '0') sans_leading_zeros++;
-    if (sans_leading_zeros == s) {
-        sans_leading_zeros--;
+    if (strict || ! output_warning) {
+        flags |= PERL_SCAN_SILENT_ILLDIGIT;
     }
 
-    return Perl_form(aTHX_
-                     "'%.*s' resolved to '\\o{%.*s}%c'",
-                     (int) (len + 2), s - len - 1,
-                     (int) (s - sans_leading_zeros), sans_leading_zeros,
-                     *s);
+    if (**s != '{') {
+        STRLEN len = (strict) ? 3 : 2;
+
+       *uv = grok_hex(*s, &len, &flags, NULL);
+       *s += len;
+        if (strict && len != 2) {
+            if (len < 2) {
+                *s += (UTF) ? UTF8SKIP(*s) : 1;
+                *error_msg = "Non-hex character";
+            }
+            else {
+                *error_msg = "Use \\x{...} for more than two hex characters";
+            }
+            return FALSE;
+        }
+       return TRUE;
+    }
+
+    e = strchr(*s, '}');
+    if (!e) {
+        (*s)++;  /* Move past the '{' */
+        while (isXDIGIT(**s)) { /* Position beyond the legal digits */
+            (*s)++;
+        }
+        /* XXX The corresponding message above for \o is just '\\o{'; other
+         * messages for other constructs include the '}', so are inconsistent.
+         */
+       *error_msg = "Missing right brace on \\x{}";
+       return FALSE;
+    }
+
+    (*s)++;    /* Point to expected first digit (could be first byte of utf8
+                  sequence if not a digit) */
+    numbers_len = e - *s;
+    if (numbers_len == 0) {
+        if (strict) {
+            (*s)++;    /* Move past the } */
+            *error_msg = "Number with no digits";
+            return FALSE;
+        }
+        *s = e + 1;
+        *uv = 0;
+        return TRUE;
+    }
+
+    flags |= PERL_SCAN_ALLOW_UNDERSCORES;
+    if (silence_non_portable) {
+        flags |= PERL_SCAN_SILENT_NON_PORTABLE;
+    }
+
+    *uv = grok_hex(*s, &numbers_len, &flags, NULL);
+    /* Note that if has non-hex, will ignore everything starting with that up
+     * to the '}' */
+
+    if (strict && numbers_len != (STRLEN) (e - *s)) {
+        *s += numbers_len;
+        *s += (UTF) ? UTF8SKIP(*s) : 1;
+        *error_msg = "Non-hex character";
+        return FALSE;
+    }
+
+    /* Return past the '}' */
+    *s = e + 1;
+
+    return TRUE;
 }
 
 /*
diff --git a/dquote_inline.h b/dquote_inline.h
index 050b14f..1c7694d 100644
--- a/dquote_inline.h
+++ b/dquote_inline.h
@@ -33,124 +33,35 @@ S_regcurly(const char *s)
     return *s == '}';
 }
 
-PERL_STATIC_INLINE bool
-S_grok_bslash_x(pTHX_ char **s, UV *uv, const char** error_msg,
-                      const bool output_warning, const bool strict,
-                      const bool silence_non_portable,
-                      const bool UTF)
+/* This is inline not for speed, but because it is so tiny */
+
+PERL_STATIC_INLINE char*
+S_form_short_octal_warning(pTHX_
+                           const char * const s, /* Points to first non-octal 
*/
+                           const STRLEN len      /* Length of octals string, so
+                                                    (s-len) points to first
+                                                    octal */
+)
 {
+    /* Return a character string consisting of a warning message for when a
+     * string constant in octal is weird, like "\078".  */
 
-/*  Documentation to be supplied when interface nailed down finally
- *  This returns FALSE if there is an error which the caller need not recover
- *  from; otherwise TRUE.
- *  It guarantees that the returned codepoint, *uv, when expressed as
- *  utf8 bytes, would fit within the skipped "\x{...}" bytes.
- *
- *  On input:
- *     s   is the address of a pointer to a NULL terminated string that begins
- *         with 'x', and the previous character was a backslash.  At exit, *s
- *         will be advanced to the byte just after those absorbed by this
- *         function.  Hence the caller can continue parsing from there.  In
- *         the case of an error, this routine has generally positioned *s to
- *         point just to the right of the first bad spot, so that a message
- *         that has a "<--" to mark the spot will be correctly positioned.
- *     uv  points to a UV that will hold the output value, valid only if the
- *         return from the function is TRUE
- *      error_msg is a pointer that will be set to an internal buffer giving an
- *         error message upon failure (the return is FALSE).  Untouched if
- *         function succeeds
- *     output_warning says whether to output any warning messages, or suppress
- *         them
- *     strict is true if anything out of the ordinary should cause this to
- *         fail instead of warn or be silent.  For example, it requires
- *         exactly 2 digits following the \x (when there are no braces).
- *         3 digits could be a mistake, so is forbidden in this mode.
- *      silence_non_portable is true if to suppress warnings about the code
- *          point returned being too large to fit on all platforms.
- *     UTF is true iff the string *s is encoded in UTF-8.
- */
-    char* e;
-    STRLEN numbers_len;
-    I32 flags = PERL_SCAN_DISALLOW_PREFIX;
-#ifdef DEBUGGING
-    char *start = *s - 1;
-    assert(*start == '\\');
-#endif
-
-    PERL_ARGS_ASSERT_GROK_BSLASH_X;
+    const char * sans_leading_zeros = s - len;
 
-    assert(**s == 'x');
-    (*s)++;
-
-    if (strict || ! output_warning) {
-        flags |= PERL_SCAN_SILENT_ILLDIGIT;
-    }
+    PERL_ARGS_ASSERT_FORM_SHORT_OCTAL_WARNING;
 
-    if (**s != '{') {
-        STRLEN len = (strict) ? 3 : 2;
+    assert(*s == '8' || *s == '9');
 
-       *uv = grok_hex(*s, &len, &flags, NULL);
-       *s += len;
-        if (strict && len != 2) {
-            if (len < 2) {
-                *s += (UTF) ? UTF8SKIP(*s) : 1;
-                *error_msg = "Non-hex character";
-            }
-            else {
-                *error_msg = "Use \\x{...} for more than two hex characters";
-            }
-            return FALSE;
-        }
-       return TRUE;
+    /* Remove the leading zeros, retaining one zero so won't be zero length */
+    while (*sans_leading_zeros == '0') sans_leading_zeros++;
+    if (sans_leading_zeros == s) {
+        sans_leading_zeros--;
     }
 
-    e = strchr(*s, '}');
-    if (!e) {
-        (*s)++;  /* Move past the '{' */
-        while (isXDIGIT(**s)) { /* Position beyond the legal digits */
-            (*s)++;
-        }
-        /* XXX The corresponding message above for \o is just '\\o{'; other
-         * messages for other constructs include the '}', so are inconsistent.
-         */
-       *error_msg = "Missing right brace on \\x{}";
-       return FALSE;
-    }
-
-    (*s)++;    /* Point to expected first digit (could be first byte of utf8
-                  sequence if not a digit) */
-    numbers_len = e - *s;
-    if (numbers_len == 0) {
-        if (strict) {
-            (*s)++;    /* Move past the } */
-            *error_msg = "Number with no digits";
-            return FALSE;
-        }
-        *s = e + 1;
-        *uv = 0;
-        return TRUE;
-    }
-
-    flags |= PERL_SCAN_ALLOW_UNDERSCORES;
-    if (silence_non_portable) {
-        flags |= PERL_SCAN_SILENT_NON_PORTABLE;
-    }
-
-    *uv = grok_hex(*s, &numbers_len, &flags, NULL);
-    /* Note that if has non-hex, will ignore everything starting with that up
-     * to the '}' */
-
-    if (strict && numbers_len != (STRLEN) (e - *s)) {
-        *s += numbers_len;
-        *s += (UTF) ? UTF8SKIP(*s) : 1;
-        *error_msg = "Non-hex character";
-        return FALSE;
-    }
-
-    /* Return past the '}' */
-    *s = e + 1;
-
-    return TRUE;
+    return Perl_form(aTHX_
+                     "'%.*s' resolved to '\\o{%.*s}%c'",
+                     (int) (len + 2), s - len - 1,
+                     (int) (s - sans_leading_zeros), sans_leading_zeros,
+                     *s);
 }
-
 #endif  /* DQUOTE_INLINE_H */
diff --git a/embed.fnc b/embed.fnc
index f5ace28..a2cad1a 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -801,14 +801,13 @@ Ap        |void   |vload_module|U32 flags|NN SV* 
name|NULLOK SV* ver|NULLOK va_list* args
 p      |OP*    |localize       |NN OP *o|I32 lex
 ApdR   |I32    |looks_like_number|NN SV *const sv
 Apd    |UV     |grok_bin       |NN const char* start|NN STRLEN* len_p|NN I32* 
flags|NULLOK NV *result
-#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_TOKE_C)
-EMiR   |bool   |grok_bslash_x  |NN char** s|NN UV* uv           \
+#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_TOKE_C) || 
defined(PERL_IN_DQUOTE_C)
+EMpRX  |bool   |grok_bslash_x  |NN char** s|NN UV* uv           \
                                |NN const char** error_msg       \
                                |const bool output_warning       \
                                |const bool strict               \
                                |const bool silence_non_portable \
                                |const bool utf8
-#endif
 EMpRX  |char   |grok_bslash_c  |const char source|const bool output_warning
 EMpRX  |bool   |grok_bslash_o  |NN char** s|NN UV* uv           \
                                |NN const char** error_msg       \
@@ -816,8 +815,10 @@ EMpRX      |bool   |grok_bslash_o  |NN char** s|NN UV* uv  
         \
                                |const bool strict               \
                                |const bool silence_non_portable \
                                |const bool utf8
-EMpPRX |char*|form_short_octal_warning|NN const char * const s  \
+EMiR   |char*|form_short_octal_warning|NN const char * const s  \
                                |const STRLEN len
+EiPRn  |I32    |regcurly       |NN const char *s
+#endif
 Apd    |UV     |grok_hex       |NN const char* start|NN STRLEN* len_p|NN I32* 
flags|NULLOK NV *result
 Apd    |int    |grok_infnan    |NN const char** sp|NN const char *send
 Apd    |int    |grok_number    |NN const char *pv|STRLEN len|NULLOK UV *valuep
@@ -1199,9 +1200,6 @@ Ap        |char*  |re_intuit_start|NN REGEXP * const rx \
                                |const U32 flags \
                                |NULLOK re_scream_pos_data *data
 Ap     |SV*    |re_intuit_string|NN REGEXP  *const r
-#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_TOKE_C)
-EiPRn  |I32    |regcurly       |NN const char *s
-#endif
 Ap     |I32    |regexec_flags  |NN REGEXP *const rx|NN char *stringarg \
                                |NN char *strend|NN char *strbeg \
                                |SSize_t minend|NN SV *sv \
diff --git a/embed.h b/embed.h
index a12a3e6..eb8ffa5 100644
--- a/embed.h
+++ b/embed.h
@@ -928,10 +928,7 @@
 #define av_reify(a)            Perl_av_reify(aTHX_ a)
 #define current_re_engine()    Perl_current_re_engine(aTHX)
 #define cv_ckproto_len_flags(a,b,c,d,e)        Perl_cv_ckproto_len_flags(aTHX_ 
a,b,c,d,e)
-#define form_short_octal_warning(a,b)  Perl_form_short_octal_warning(aTHX_ a,b)
 #define grok_atoUV             Perl_grok_atoUV
-#define grok_bslash_c(a,b)     Perl_grok_bslash_c(aTHX_ a,b)
-#define grok_bslash_o(a,b,c,d,e,f,g)   Perl_grok_bslash_o(aTHX_ a,b,c,d,e,f,g)
 #define mg_find_mglob(a)       Perl_mg_find_mglob(aTHX_ a)
 #define multideref_stringify(a,b)      Perl_multideref_stringify(aTHX_ a,b)
 #define op_clear(a)            Perl_op_clear(aTHX_ a)
@@ -1072,8 +1069,11 @@
 #  if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || 
defined(PERL_IN_UTF8_C) || defined(PERL_IN_TOKE_C)
 #define _core_swash_init(a,b,c,d,e,f,g)        Perl__core_swash_init(aTHX_ 
a,b,c,d,e,f,g)
 #  endif
-#  if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_TOKE_C)
-#define grok_bslash_x(a,b,c,d,e,f,g)   S_grok_bslash_x(aTHX_ a,b,c,d,e,f,g)
+#  if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_TOKE_C) || 
defined(PERL_IN_DQUOTE_C)
+#define form_short_octal_warning(a,b)  S_form_short_octal_warning(aTHX_ a,b)
+#define grok_bslash_c(a,b)     Perl_grok_bslash_c(aTHX_ a,b)
+#define grok_bslash_o(a,b,c,d,e,f,g)   Perl_grok_bslash_o(aTHX_ a,b,c,d,e,f,g)
+#define grok_bslash_x(a,b,c,d,e,f,g)   Perl_grok_bslash_x(aTHX_ a,b,c,d,e,f,g)
 #define regcurly               S_regcurly
 #  endif
 #  if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_UTF8_C)
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 55db093..9c64130 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -577,6 +577,14 @@ specification is very close to one of the 14 legal POSIX 
classes.  (See
 L<perlrecharclass/POSIX Character Classes>.)
 [perl #8904]
 
+=item *
+
+Certain regex patterns involving a complemented posix class in an
+inverted bracketed character class, and matching something else
+optionally would improperly fail to match.  An example of one that could
+fail is C</qr/_?[^\Wbar]\x{100}/>.  This has been fixed.
+[perl #127537]
+
 =back
 
 =head1 Known Problems
diff --git a/proto.h b/proto.h
index c3adf2d..8f2b730 100644
--- a/proto.h
+++ b/proto.h
@@ -880,12 +880,6 @@ PERL_CALLCONV char*        Perl_form(pTHX_ const char* 
pat, ...)
 #define PERL_ARGS_ASSERT_FORM  \
        assert(pat)
 
-PERL_CALLCONV char*    Perl_form_short_octal_warning(pTHX_ const char * const 
s, const STRLEN len)
-                       __attribute__warn_unused_result__
-                       __attribute__pure__;
-#define PERL_ARGS_ASSERT_FORM_SHORT_OCTAL_WARNING      \
-       assert(s)
-
 PERL_CALLCONV void     Perl_free_tied_hv_pool(pTHX);
 PERL_CALLCONV void     Perl_free_tmps(pTHX);
 PERL_CALLCONV AV*      Perl_get_av(pTHX_ const char *name, I32 flags);
@@ -947,14 +941,6 @@ PERL_CALLCONV bool Perl_grok_atoUV(const char* pv, UV* 
valptr, const char** endp
 PERL_CALLCONV UV       Perl_grok_bin(pTHX_ const char* start, STRLEN* len_p, 
I32* flags, NV *result);
 #define PERL_ARGS_ASSERT_GROK_BIN      \
        assert(start); assert(len_p); assert(flags)
-PERL_CALLCONV char     Perl_grok_bslash_c(pTHX_ const char source, const bool 
output_warning)
-                       __attribute__warn_unused_result__;
-
-PERL_CALLCONV bool     Perl_grok_bslash_o(pTHX_ char** s, UV* uv, const char** 
error_msg, const bool output_warning, const bool strict, const bool 
silence_non_portable, const bool utf8)
-                       __attribute__warn_unused_result__;
-#define PERL_ARGS_ASSERT_GROK_BSLASH_O \
-       assert(s); assert(uv); assert(error_msg)
-
 PERL_CALLCONV UV       Perl_grok_hex(pTHX_ const char* start, STRLEN* len_p, 
I32* flags, NV *result);
 #define PERL_ARGS_ASSERT_GROK_HEX      \
        assert(start); assert(len_p); assert(flags)
@@ -4983,8 +4969,21 @@ PERL_CALLCONV SV*        Perl__core_swash_init(pTHX_ 
const char* pkg, const char* name,
 #define PERL_ARGS_ASSERT__CORE_SWASH_INIT      \
        assert(pkg); assert(name); assert(listsv)
 #endif
-#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_TOKE_C)
-PERL_STATIC_INLINE bool        S_grok_bslash_x(pTHX_ char** s, UV* uv, const 
char** error_msg, const bool output_warning, const bool strict, const bool 
silence_non_portable, const bool utf8)
+#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_TOKE_C) || 
defined(PERL_IN_DQUOTE_C)
+PERL_STATIC_INLINE char*       S_form_short_octal_warning(pTHX_ const char * 
const s, const STRLEN len)
+                       __attribute__warn_unused_result__;
+#define PERL_ARGS_ASSERT_FORM_SHORT_OCTAL_WARNING      \
+       assert(s)
+
+PERL_CALLCONV char     Perl_grok_bslash_c(pTHX_ const char source, const bool 
output_warning)
+                       __attribute__warn_unused_result__;
+
+PERL_CALLCONV bool     Perl_grok_bslash_o(pTHX_ char** s, UV* uv, const char** 
error_msg, const bool output_warning, const bool strict, const bool 
silence_non_portable, const bool utf8)
+                       __attribute__warn_unused_result__;
+#define PERL_ARGS_ASSERT_GROK_BSLASH_O \
+       assert(s); assert(uv); assert(error_msg)
+
+PERL_CALLCONV bool     Perl_grok_bslash_x(pTHX_ char** s, UV* uv, const char** 
error_msg, const bool output_warning, const bool strict, const bool 
silence_non_portable, const bool utf8)
                        __attribute__warn_unused_result__;
 #define PERL_ARGS_ASSERT_GROK_BSLASH_X \
        assert(s); assert(uv); assert(error_msg)
diff --git a/regcomp.c b/regcomp.c
index 5dbccfb..a30f2a2 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -1420,8 +1420,10 @@ S_get_ANYOF_cp_list_for_ssc(pTHX_ const RExC_state_t 
*pRExC_state,
     }
 
     /* If this can match all upper Latin1 code points, have to add them
-     * as well */
-    if (OP(node) == ANYOFD
+     * as well.  But don't add them if inverting, as when that gets done below,
+     * it would exclude all these characters, including the ones it shouldn't
+     * that were added just above */
+    if (! (ANYOF_FLAGS(node) & ANYOF_INVERT) && OP(node) == ANYOFD
         && (ANYOF_FLAGS(node) & 
ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER))
     {
         _invlist_union(invlist, PL_UpperLatin1, &invlist);
@@ -9115,8 +9117,10 @@ Perl__invlist_intersection_maybe_complement_2nd(pTHX_ 
SV* const a, SV* const b,
         /* Here, 'a' or 'b' is empty and not using the complement of 'b'.  The
          * intersection must be empty */
        if (*i == a) {
-            if (! (make_temp = cBOOL(SvTEMP(a)))) {
-                SvREFCNT_dec_NN(a);
+            if (a != NULL) {
+                if (! (make_temp = cBOOL(SvTEMP(a)))) {
+                    SvREFCNT_dec_NN(a);
+                }
             }
        }
        else if (*i == b) {
@@ -16980,6 +16984,19 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
      * fetching).  We know to set the flag if we have a non-NULL list for UTF-8
      * locales, or the class matches at least one 0-255 range code point */
     if (LOC && FOLD) {
+
+        /* Some things on the list might be unconditionally included because of
+         * other components.  Remove them, and clean up the list if it goes to
+         * 0 elements */
+        if (only_utf8_locale_list && cp_list) {
+            _invlist_subtract(only_utf8_locale_list, cp_list,
+                              &only_utf8_locale_list);
+
+            if (_invlist_len(only_utf8_locale_list) == 0) {
+                SvREFCNT_dec_NN(only_utf8_locale_list);
+                only_utf8_locale_list = NULL;
+            }
+        }
         if (only_utf8_locale_list) {
             ANYOF_FLAGS(ret)
                  |=  ANYOFL_FOLD
diff --git a/t/re/re_tests b/t/re/re_tests
index b226123..d32b031 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1615,6 +1615,8 @@ a(.)\4294967298   ab\o{42}94967298        ya      $1      
b       \d not converted to native; \o{} is
 ^m?(\d)(.*)\1$ 5b5     y       $1      5
 ^m?(\d)(.*)\1$ aba     n       -       -
 
+^_?[^\W_0-9]\w\z       \xAA\x{100}     y       $&      \xAA\x{100}             
[perl #127537]
+
 # 17F is 'Long s';  This makes sure the a's in /aa can be separate
 /s/ai  \x{17F} y       $&      \x{17F}
 /s/aia \x{17F} n       -       -

--
Perl5 Master Repository

[perl.git] branch blead, updated. v5.23.7-410-gfc2ed17

Reply via email to