[perl.git] branch blead, updated. v5.23.5-246-g709be74

Karl Williamson Thu, 17 Dec 2015 21:15:13 -0800

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/709be747a32edc503b4645d9c5396bd4b40100d2?hp=e244340e4067bb332773529bbff797bd14f103de>


- Log -----------------------------------------------------------------
commit 709be747a32edc503b4645d9c5396bd4b40100d2
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 17 10:22:44 2015 -0700

    Optimize some qr/[...]/ classes
    
    Bracketed character classes generally generate an ANYOF-type regnode,
    which consists of a bitmap for the lower code points, and an inversion
    list or swash to handle ones not in the bitmap.  They take up more
    memory than other regnode types.  There are already some optimizations
    that use a smaller and/or faster regnode instead.  For example, some
    people prefer not to use a backslash to escape metacharacters, instead
    writing something like /abc[.]def/.  This has for some time generated
    the same thing as /abc\.def/ does, namely a single EXACT node, which is
    both smaller and faster than an ANYOF node in the middle of two EXACT
    nodes.
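
The two-part layout described above can be sketched in C. This is a simplified illustration, not the regnode structure in regcomp.h: the type `charclass_node`, the constant `BITMAP_CODE_POINTS`, and all function names are hypothetical. Low code points are looked up in a bitmap; anything higher falls through to a binary search of an inversion list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITMAP_CODE_POINTS 256  /* hypothetical bitmap size for this sketch */

/* Hypothetical, simplified stand-in for an ANYOF-style node. */
typedef struct {
    uint8_t bitmap[BITMAP_CODE_POINTS / 8]; /* one bit per low code point */
    const uint32_t *invlist;  /* sorted boundaries where membership flips */
    size_t invlist_len;
} charclass_node;

/* Mark a low code point as matching. */
static void node_set_low(charclass_node *node, uint32_t cp)
{
    assert(cp < BITMAP_CODE_POINTS);
    node->bitmap[cp >> 3] |= (uint8_t)(1u << (cp & 7));
}

/* A code point is in the set when an odd number of boundaries are <= cp. */
static bool invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    size_t lo = 0, hi = len;    /* binary search: count elements <= cp */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo & 1) != 0;
}

/* Fast path through the bitmap, slower path through the inversion list. */
static bool node_matches(const charclass_node *node, uint32_t cp)
{
    if (cp < BITMAP_CODE_POINTS)
        return ((node->bitmap[cp >> 3] >> (cp & 7)) & 1) != 0;
    return invlist_contains(node->invlist, node->invlist_len, cp);
}
```

The sketch makes the cost difference visible: a bitmap hit is a shift and a mask, while anything above the bitmap needs a search.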
    
    This commit adds some optimizations that hadn't been done previously.
    Now things like /[\p{Word}]/ will optimize to \w, for example.  I had
    not done this before because my tests had shown very little performance
    difference, but I had added most of the code to regcomp.c, #ifdef'd
    out, so that it wouldn't get lost.
    
    It turns out that I hadn't tested on code points above the bitmap, which
    with this commit get a small but appreciable speedup in matching, so
    this commit enables and finishes that code.
    
    Prior to this commit, things like /[[:word:]]/ were optimized to \w, but
    things like /[_[:word:]]/ were not.  This commit fixes that.
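
The matching step can be sketched as follows: once the class has been reduced to a single inversion list, it is compared against each known class, both as-is and complemented. This is an illustrative sketch over plain arrays, not the `_invlistEQ()` implementation in regcomp.c; the `known_class` table and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compare inversion list a with b, or with the complement of b.
 * Complementing an inversion list toggles a leading 0 boundary. */
static bool invlist_eq(const uint32_t *a, size_t a_len,
                       const uint32_t *b, size_t b_len,
                       bool complement_b)
{
    if (complement_b) {
        if (b_len > 0 && b[0] == 0) {
            b++;                /* complement drops b's leading 0 */
            b_len--;
        }
        else {
            if (a_len == 0 || a[0] != 0)
                return false;   /* complement of b must start at 0 */
            a++;                /* skip the 0 that the complement gains */
            a_len--;
        }
    }
    if (a_len != b_len)
        return false;
    for (size_t i = 0; i < a_len; i++) {
        if (a[i] != b[i])
            return false;
    }
    return true;
}

/* Hypothetical table entry for a precomputed class. */
typedef struct {
    const uint32_t *list;
    size_t len;
} known_class;

/* Return the index of the class equal to cp_list, or -1 if none.
 * *inverted reports whether the match was against the complement. */
static int find_known_class(const known_class *classes, int n_classes,
                            const uint32_t *cp_list, size_t len,
                            bool *inverted)
{
    assert(inverted != NULL);
    for (int c = 0; c < n_classes; c++) {
        for (int try_inverted = 0; try_inverted < 2; try_inverted++) {
            if (invlist_eq(cp_list, len, classes[c].list, classes[c].len,
                           try_inverted != 0))
            {
                *inverted = (try_inverted != 0);
                return c;
            }
        }
    }
    return -1;
}
```

Because the comparison is on the final set rather than on the pattern text, it does not matter how the set was spelled in the pattern.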
    
    If the following command is run on a perl compiled with -O2 and no
    DEBUGGING:
    
        blead Porting/bench.pl --raw --benchfile=charclass_perf --perlargs=-Ilib /path_to_prior_perl="before this commit" /path_to_this_perl=after
    
    and the file 'charclass_perf' contains
        [
            'regex::charclass::ascii' => {
                desc    => 'charclass, ascii range',
                setup   => 'my $a = qr/[\p{Word}]/',
                code    => '"A" =~ $a'
            },
            'regex::charclass::upper_latin1' => {
                desc    => 'charclass, upper latin1 range',
                setup   => 'my $a = qr/[\p{Word}]/',
                code    => '"\x{e0}" =~ $a'
            },
            'regex::charclass::above_latin1' => {
                desc    => 'charclass, above latin1 range',
                setup   => 'my $a = qr/[\p{Word}]/',
                code    => '"\x{100}" =~ $a'
            },
            'regex::charclass::high_Unicode' => {
                desc    => 'charclass, high Unicode code point',
                setup   => 'my $a = qr/[\p{Word}]/',
                code    => '"\x{10FFFF}" =~ $a'
            },
        ];
    
    the following results are obtained:
    
    The numbers represent raw counts per loop iteration.
    
    regex::charclass::above_latin1
    charclass, above latin1 range
    
           before this commit    after
           ------------------ --------
        Ir             3344.0   2888.0
        Dr              971.0    855.0
        Dw              604.0    541.0
      COND              575.0    504.0
       IND               25.0     25.0
    
    COND_m               11.0     10.7
     IND_m               10.0     10.0
    
     Ir_m1                8.9      6.0
     Dr_m1                3.0      3.2
     Dw_m1                1.5      1.4
    
     Ir_mm                0.0      0.0
     Dr_mm                0.0      0.0
     Dw_mm                0.0      0.0
    
    regex::charclass::ascii
    charclass, ascii range
    
           before this commit    after
           ------------------ --------
        Ir             2661.0   2649.0
        Dr              798.0    795.0
        Dw              516.0    517.0
      COND              467.0    465.0
       IND               23.0     23.0
    
    COND_m               10.0      8.8
     IND_m               10.0     10.0
    
     Ir_m1                7.9      0.0
     Dr_m1                2.9      3.1
     Dw_m1                1.3      1.3
    
     Ir_mm                0.0      0.0
     Dr_mm                0.0      0.0
     Dw_mm                0.0      0.0
    
    regex::charclass::high_Unicode
    charclass, high Unicode code point
    
           before this commit    after
           ------------------ --------
        Ir             3344.0   2888.0
        Dr              971.0    855.0
        Dw              604.0    541.0
      COND              575.0    504.0
       IND               25.0     25.0
    
    COND_m               11.0     10.7
     IND_m               10.0     10.0
    
     Ir_m1                8.9      6.0
     Dr_m1                3.0      3.2
     Dw_m1                1.5      1.4
    
     Ir_mm                0.0      0.0
     Dr_mm                0.0      0.0
     Dw_mm                0.0      0.0
    
    regex::charclass::upper_latin1
    charclass, upper latin1 range
    
           before this commit    after
           ------------------ --------
        Ir             2661.0   2651.0
        Dr              798.0    796.0
        Dw              516.0    517.0
      COND              467.0    466.0
       IND               23.0     23.0
    
    COND_m               11.0      8.8
     IND_m               10.0     10.0
    
     Ir_m1                7.9      0.0
     Dr_m1                2.9      3.3
     Dw_m1                1.5      1.2
    
     Ir_mm                0.0      0.0
     Dr_mm                0.0      0.0
     Dw_mm                0.0      0.0

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit fe8d1b7c2c8ac6874949446f5ec0fe66157d18dc
Author: Karl Williamson <[email protected]>
Date:   Wed Dec 16 13:24:45 2015 -0700

    regcomp.h: Add comments

M       regcomp.h

commit 8c9b4e639b8164433ff657146b42306a354ce3cf
Author: Karl Williamson <[email protected]>
Date:   Wed Dec 16 12:06:46 2015 -0700

    regex matching: Don't do unnecessary work
    
    This commit sets a flag at pattern compilation time to indicate if
    a rare case is present that requires special handling, so that that
    handling can be avoided unless necessary.

M       regcomp.c
M       regcomp.h
M       regexec.c

commit b3b1cf1722eaa296a49e261c8e670d45491983b5
Author: Karl Williamson <[email protected]>
Date:   Wed Dec 16 11:40:18 2015 -0700

    regcomp.h: Renumber 2 flag bits
    
    This changes the spare bit to be adjacent to the LOC_FOLD bit, in
    preparation for the next commit, which will use that bit for a
    LOC_FOLD-related use.

M       regcomp.h

commit 108316fb65dc7243a1c5d87b4b29068b7d62d32e
Author: Karl Williamson <[email protected]>
Date:   Wed Dec 16 11:05:17 2015 -0700

    regex: Free an ANYOF node bit
    
    This is done by combining 2 mutually exclusive bits into one.  I hadn't
    seen this possibility before because the name of one of them misled me.
    It also misled me into turning on that flag unnecessarily, and into
    missing opportunities to avoid creating a swash at runtime.  This
    commit corrects those things as well.
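
The bit-sharing idea can be sketched as below: when two conditions can never hold for the same node type, one flag bit can carry both, with the node type selecting the meaning. The enum, struct, flag name, and accessors here are hypothetical and are not the definitions in regcomp.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node types for illustration only. */
typedef enum { NODE_PLAIN, NODE_DEPENDS } node_type;

#define FLAG_SHARED 0x01u   /* one bit, two meanings chosen by node type */

typedef struct {
    node_type type;
    uint8_t flags;
} class_node;

/* Meaning A: only applies to NODE_DEPENDS nodes. */
static bool has_utf8_only_matches(const class_node *n)
{
    assert(n != NULL);
    return n->type == NODE_DEPENDS && (n->flags & FLAG_SHARED) != 0;
}

/* Meaning B: only applies to every other node type. */
static bool has_runtime_property(const class_node *n)
{
    assert(n != NULL);
    return n->type != NODE_DEPENDS && (n->flags & FLAG_SHARED) != 0;
}
```

The price of saving a bit is that every reader of the flag must also check the node type, which is why the accessors bundle the two tests.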

M       regcomp.c
M       regcomp.h
M       regexec.c

commit 4130e767d71ebdb250e9f52a2eee2f7b9e51af25
Author: Karl Williamson <[email protected]>
Date:   Tue Dec 15 22:42:18 2015 -0700

    regcomp.c: Move comments adjacent to their object

M       regcomp.c

commit 27e95afa2559f6d024333c9d5391a85ba671fefa
Author: Karl Williamson <[email protected]>
Date:   Tue Dec 15 22:20:20 2015 -0700

    regcomp.c: Try simplifications in some qr/[...]/d
    
    Characters in a bracketed character class can come from a bunch of
    sources, all bundled together.  Some things under /d match only when the
    target string is UTF-8; some match only when it isn't UTF-8.  Other
    sources may introduce ones that match regardless.  It may be that some
    things are specified as conditionally matching from one source, and as
    unconditionally matching from another.  We can subtract the
    unconditionals from the conditionals, leaving a simpler set of things
    that must be conditionally matched.  In some cases, the conditional set
    may go to zero, allowing other optimizations to happen that otherwise
    couldn't.  An example is
    
        qr/[\W\xAB]/
    
    which before this commit compiled to:
    
        ANYOFD[^0-9A-Z_a-z\x{80}-\x{AA}\x{AC}-\x{FF}][{non-utf8-latin1-all}
        {utf8}0080-00A9 00AC-00B4 00B6-00B9 00BB-00BF 00D7 00F7
        02C2-02C5...] (12)
    
    and after it, compiles to
    
        ANYOFD[^0-9A-Z_a-z\x{AA}\x{B5}\x{BA}\x{C0}-\x{D6}\x{D8}-\x{F6}
        \x{F8}-\x{FF}][{non-utf8-latin1-all}{utf8}02C2-02C5...] (12)
    
    Notice that the {utf8} component has been stripped of everything below
    256.  That means no swash has to be created at runtime when matching
    code points below 256, unlike the case before this commit.
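
The subtraction itself can be sketched over plain arrays. This is not Perl's `_invlist_subtract()`; the function name and signature are hypothetical, and the lists are assumed to be strictly increasing boundary arrays. The caller treats a returned length of 0 as "no conditional matches remain".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute a minus b for two inversion lists.  out must have room for
 * a_len + b_len entries.  Returns the number of boundaries written. */
static size_t invlist_subtract(const uint32_t *a, size_t a_len,
                               const uint32_t *b, size_t b_len,
                               uint32_t *out)
{
    size_t i = 0, j = 0, n = 0;
    bool in_a = false, in_b = false, in_out = false;

    assert(out != NULL);
    while (i < a_len || j < b_len) {
        uint32_t cp;
        bool now;

        /* The next boundary is the smaller head of the two lists. */
        if (j >= b_len || (i < a_len && a[i] <= b[j]))
            cp = a[i];
        else
            cp = b[j];

        /* Crossing a boundary flips membership in that list. */
        if (i < a_len && a[i] == cp) { in_a = !in_a; i++; }
        if (j < b_len && b[j] == cp) { in_b = !in_b; j++; }

        /* Emit a boundary only where membership in the result flips. */
        now = in_a && !in_b;
        if (now != in_out) {
            out[n++] = cp;
            in_out = now;
        }
    }
    return n;
}
```

For example, subtracting the single code point 0xAB from the range 0x80-0xFF leaves two ranges, while subtracting 0x80-0xFF from 0xAA alone leaves nothing.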
    
    A starker example, though unlikely in real life except in
    machine-generated code, is
    
        qr/[\w\W]/
    
    Before this commit, it would generate:
    
        ANYOFD[\x{00}-\x{7F}][{non-utf8-latin1-all}{above_bitmap_all}
        {utf8}0080-00FF]
    
    and afterwards, simply:
    
        SANY

M       regcomp.c

commit 1ef7f4929efb9a297a63a5a8b98a5c701f91e931
Author: Karl Williamson <[email protected]>
Date:   Tue Dec 15 21:46:42 2015 -0700

    regcomp.c: Change variable name to be clearer
    
    This name confused me, and led to suboptimal code.  The new name is more
    cumbersome, but won't confuse (at least it won't confuse me).

M       regcomp.c
-----------------------------------------------------------------------

Summary of changes:
 embed.fnc |   2 +-
 embed.h   |   3 +
 proto.h   |   3 +
 regcomp.c | 226 ++++++++++++++++++++++++++++++++++++++++++++------------------
 regcomp.h | 193 ++++++++++++++++++++++++++++++++++++++---------------
 regexec.c |  26 ++++++--
 6 files changed, 331 insertions(+), 122 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index 877438a..dd764e1 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1170,7 +1170,7 @@ Ap        |SV*    |regclass_swash |NULLOK const regexp *prog \
                                |NULLOK SV **listsvp|NULLOK SV **altsvp
 #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_PERL_C) || defined(PERL_IN_UTF8_C)
 AMpR   |SV*    |_new_invlist_C_array|NN const UV* const list
-: Not used currently: EXMs     |bool   |_invlistEQ     |NN SV* const a|NN SV* const b|const bool complement_b
+EXMp   |bool   |_invlistEQ     |NN SV* const a|NN SV* const b|const bool complement_b
 #endif
 Ap     |I32    |pregexec       |NN REGEXP * const prog|NN char* stringarg \
                                |NN char* strend|NN char* strbeg \
diff --git a/embed.h b/embed.h
index fa98971..75015fe 100644
--- a/embed.h
+++ b/embed.h
@@ -1031,6 +1031,9 @@
 #  if defined(PERL_IN_REGCOMP_C) || defined (PERL_IN_DUMP_C)
 #define _invlist_dump(a,b,c,d) Perl__invlist_dump(aTHX_ a,b,c,d)
 #  endif
+#  if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_PERL_C) || defined(PERL_IN_UTF8_C)
+#define _invlistEQ(a,b,c)      Perl__invlistEQ(aTHX_ a,b,c)
+#  endif
 #  if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C)
 #define _load_PL_utf8_foldclosures()   Perl__load_PL_utf8_foldclosures(aTHX)
 #define regprop(a,b,c,d,e)     Perl_regprop(aTHX_ a,b,c,d,e)
diff --git a/proto.h b/proto.h
index 76a44bc..9fb3ead 100644
--- a/proto.h
+++ b/proto.h
@@ -4853,6 +4853,9 @@ PERL_CALLCONV void        Perl__invlist_dump(pTHX_ PerlIO *file, I32 level, const char*
        assert(file); assert(indent); assert(invlist)
 #endif
 #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_PERL_C) || defined(PERL_IN_UTF8_C)
+PERL_CALLCONV bool     Perl__invlistEQ(pTHX_ SV* const a, SV* const b, const bool complement_b);
+#define PERL_ARGS_ASSERT__INVLISTEQ    \
+       assert(a); assert(b)
 PERL_CALLCONV SV*      Perl__new_invlist_C_array(pTHX_ const UV* const list)
                        __attribute__warn_unused_result__;
 #define PERL_ARGS_ASSERT__NEW_INVLIST_C_ARRAY  \
diff --git a/regcomp.c b/regcomp.c
index e3675a0..de9a5b9 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -1308,7 +1308,8 @@ S_ssc_and(pTHX_ const RExC_state_t *pRExC_state, regnode_ssc *ssc,
         else {
             anded_flags = ANYOF_FLAGS(and_with)
             &( ANYOF_COMMON_FLAGS
-              |ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER);
+              |ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER
+              |ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP);
         }
     }
 
@@ -1463,7 +1464,8 @@ S_ssc_or(pTHX_ const RExC_state_t *pRExC_state, regnode_ssc *ssc,
         if (OP(or_with) != ANYOFD) {
             ored_flags
             |= ANYOF_FLAGS(or_with)
-             & ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER;
+             & ( ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER
+                |ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP);
         }
     }
 
@@ -1665,7 +1667,8 @@ S_ssc_finalize(pTHX_ RExC_state_t *pRExC_state, regnode_ssc *ssc)
      * by the time we reach here */
     assert(! (ANYOF_FLAGS(ssc)
         & ~( ANYOF_COMMON_FLAGS
-            |ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER)));
+            |ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER
+            |ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP)));
 
     populate_ANYOF_from_invlist( (regnode *) ssc, &invlist);
 
@@ -9380,7 +9383,7 @@ Perl__load_PL_utf8_foldclosures (pTHX)
 
 #ifdef PERL_ARGS_ASSERT__INVLISTEQ
 bool
-S__invlistEQ(pTHX_ SV* const a, SV* const b, const bool complement_b)
+Perl__invlistEQ(pTHX_ SV* const a, SV* const b, const bool complement_b)
 {
     /* Return a boolean as to if the two passed in inversion lists are
      * identical.  The final argument, if TRUE, says to take the complement of
@@ -13096,9 +13099,6 @@ S_populate_ANYOF_from_invlist(pTHX_ regnode *node, SV** invlist_ptr)
             if (end == UV_MAX && start <= NUM_ANYOF_CODE_POINTS) {
                 ANYOF_FLAGS(node) |= ANYOF_MATCHES_ALL_ABOVE_BITMAP;
             }
-            else if (end >= NUM_ANYOF_CODE_POINTS) {
-                ANYOF_FLAGS(node) |= ANYOF_HAS_UTF8_NONBITMAP_MATCHES;
-            }
 
            /* Quit if are above what we should change */
            if (start >= NUM_ANYOF_CODE_POINTS) {
@@ -14369,8 +14369,9 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
     bool has_user_defined_property = FALSE;
 
     /* inversion list of code points this node matches only when the target
-     * string is in UTF-8.  (Because is under /d) */
-    SV* depends_list = NULL;
+     * string is in UTF-8.  These are all non-ASCII, < 256.  (Because is under
+     * /d) */
+    SV* has_upper_latin1_only_utf8_matches = NULL;
 
     /* Inversion list of code points this node matches regardless of things
      * like locale, folding, utf8ness of the target string */
@@ -14423,9 +14424,7 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
     ret = reganode(pRExC_state,
                    (LOC)
                     ? ANYOFL
-                    : (DEPENDS_SEMANTICS)
-                      ? ANYOFD
-                      : ANYOF,
+                    : ANYOF,
                    0);
 
     if (SIZE_ONLY) {
@@ -14779,15 +14778,9 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
                         optimizable = FALSE;    /* Will have to leave this an
                                                    ANYOF node */
 
-                        /* We don't know yet, so have to assume that the
-                         * property could match something in the upper Latin1
-                         * range, hence something that isn't utf8.  Note that
-                         * this would cause things in <depends_list> to match
-                         * inappropriately, except that any \p{}, including
-                         * this one forces Unicode semantics, which means there
-                         * is no <depends_list> */
-                        ANYOF_FLAGS(ret)
-                                      |= ANYOF_HAS_NONBITMAP_NON_UTF8_MATCHES;
+                        /* We don't know yet what this matches, so have to flag
+                         * it */
+                        ANYOF_FLAGS(ret) |= ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP;
                     }
                     else {
 
@@ -15785,9 +15778,10 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
                                                             PL_fold_latin1[j]);
                             }
                             else {
-                                depends_list =
-                                 add_cp_to_invlist(depends_list,
-                                                   PL_fold_latin1[j]);
+                                has_upper_latin1_only_utf8_matches
+                                    = add_cp_to_invlist(
+                                            has_upper_latin1_only_utf8_matches,
+                                            PL_fold_latin1[j]);
                             }
                         }
 
@@ -15851,8 +15845,10 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
                             else {
                                 /* Similarly folds involving non-ascii Latin1
                                 * characters under /d are added to their list */
-                                depends_list = add_cp_to_invlist(depends_list,
-                                                                 c);
+                                has_upper_latin1_only_utf8_matches
+                                        = add_cp_to_invlist(
+                                           has_upper_latin1_only_utf8_matches,
+                                           c);
                             }
                         }
                     }
@@ -15928,13 +15924,15 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
                 cp_list = posixes;
             }
 
-            if (depends_list) {
-                _invlist_union(depends_list, nonascii_but_latin1_properties,
-                               &depends_list);
+            if (has_upper_latin1_only_utf8_matches) {
+                _invlist_union(has_upper_latin1_only_utf8_matches,
+                               nonascii_but_latin1_properties,
+                               &has_upper_latin1_only_utf8_matches);
                 SvREFCNT_dec_NN(nonascii_but_latin1_properties);
             }
             else {
-                depends_list = nonascii_but_latin1_properties;
+                has_upper_latin1_only_utf8_matches
+                                            = nonascii_but_latin1_properties;
             }
         }
     }
@@ -15948,8 +15946,8 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
      * class that isn't a Unicode property, and which matches above Unicode, \W
      * or [\x{110000}] for example.
      * (Note that in this case, unlike the Posix one above, there is no
-     * <depends_list>, because having a Unicode property forces Unicode
-     * semantics */
+     * <has_upper_latin1_only_utf8_matches>, because having a Unicode property
+     * forces Unicode semantics */
     if (properties) {
         if (cp_list) {
 
@@ -15998,7 +15996,8 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
      * locales, or the class matches at least one 0-255 range code point */
     if (LOC && FOLD) {
         if (only_utf8_locale_list) {
-            ANYOF_FLAGS(ret) |= ANYOF_LOC_FOLD;
+            ANYOF_FLAGS(ret) |=  ANYOF_LOC_FOLD
+                                |ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES;
         }
         else if (cp_list) { /* Look to see if a 0-255 code point is in list */
             UV start, end;
@@ -16010,14 +16009,83 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
         }
     }
 
+#define MATCHES_ALL_NON_UTF8_NON_ASCII(ret)                                 \
+    (   DEPENDS_SEMANTICS                                                   \
+     && ANYOF_FLAGS(ret)                                                    \
+        & ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER)
+
+    /* See if we can simplify things under /d */
+    if (   has_upper_latin1_only_utf8_matches
+        || MATCHES_ALL_NON_UTF8_NON_ASCII(ret))
+    {
+        if (has_upper_latin1_only_utf8_matches) {
+            if (MATCHES_ALL_NON_UTF8_NON_ASCII(ret)) {
+
+                /* Here, we have two, almost opposite, constraints in effect
+                 * for upper latin1 characters.  The macro means they all match
+                 * when the target string ISN'T in UTF-8.
+                 * 'has_upper_latin1_only_utf8_matches' contains the chars that
+                 * match only if the target string IS UTF-8.  Therefore the
+                 * ones in 'has_upper_latin1_only_utf8_matches' match
+                 * regardless of UTF-8, so can be added to the regular list,
+                 * and 'has_upper_latin1_only_utf8_matches' cleared */
+                _invlist_union(cp_list,
+                               has_upper_latin1_only_utf8_matches,
+                               &cp_list);
+                SvREFCNT_dec_NN(has_upper_latin1_only_utf8_matches);
+                has_upper_latin1_only_utf8_matches = NULL;
+            }
+            else if (cp_list) {
+
+                /* Here, 'cp_list' gives chars that always match, and
+                 * 'has_upper_latin1_only_utf8_matches' gives chars that were
+                 * specified to match only if the target string is in UTF-8.
+                 * It may be that these overlap, so we can subtract the
+                 * unconditionally matching from the conditional ones, to make
+                 * the conditional list as small as possible, perhaps even
+                 * clearing it, in which case more optimizations are possible
+                 * later */
+                _invlist_subtract(has_upper_latin1_only_utf8_matches,
+                                  cp_list,
+                                  &has_upper_latin1_only_utf8_matches);
+                if (_invlist_len(has_upper_latin1_only_utf8_matches) == 0) {
+                    SvREFCNT_dec_NN(has_upper_latin1_only_utf8_matches);
+                    has_upper_latin1_only_utf8_matches = NULL;
+                }
+            }
+        }
+
+        /* Similarly, if the unconditional matches include every upper latin1
+         * character, we can clear that flag to permit later optimizations */
+        if (cp_list && MATCHES_ALL_NON_UTF8_NON_ASCII(ret)) {
+            SV* only_non_utf8_list = invlist_clone(PL_UpperLatin1);
+            _invlist_subtract(only_non_utf8_list, cp_list, &only_non_utf8_list);
+            if (_invlist_len(only_non_utf8_list) == 0) {
+                ANYOF_FLAGS(ret) &= ~ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER;
+            }
+            SvREFCNT_dec_NN(only_non_utf8_list);
+            only_non_utf8_list = NULL;;
+        }
+
+        /* If we haven't gotten rid of all conditional matching, we change the
+         * regnode type to indicate that */
+        if (   has_upper_latin1_only_utf8_matches
+            || MATCHES_ALL_NON_UTF8_NON_ASCII(ret))
+        {
+            OP(ret) = ANYOFD;
+            optimizable = FALSE;
+        }
+    }
+#undef MATCHES_ALL_NON_UTF8_NON_ASCII
+
     /* Optimize inverted simple patterns (e.g. [^a-z]) when everything is known
      * at compile time.  Besides not inverting folded locale now, we can't
      * invert if there are things such as \w, which aren't known until runtime
      * */
     if (cp_list
         && invert
+        && OP(ret) != ANYOFD
         && ! (ANYOF_FLAGS(ret) & (ANYOF_LOCALE_FLAGS))
-       && ! depends_list
        && ! HAS_NONLOCALE_RUNTIME_PROPERTY_DEFINITION)
     {
         _invlist_invert(cp_list);
@@ -16066,9 +16134,10 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
      * space).  _invlistEQ() could be used if one ever wanted to do something
      * like this at this point in the code */
 
-    if (optimizable && cp_list && ! invert && ! depends_list) {
+    if (optimizable && cp_list && ! invert) {
         UV start, end;
         U8 op = END;  /* The optimzation node-type */
+        int posix_class;
         const char * cur_parse= RExC_parse;
 
         invlist_iterinit(cp_list);
@@ -16151,6 +16220,37 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
         }
         invlist_iterfinish(cp_list);
 
+        if (op == END) {
+
+            /* Here, didn't find an optimization.  See if this matches any of
+             * the POSIX classes.  These run slightly faster for above-Unicode
+             * code points, so don't bother with POSIXA ones nor the 2 that
+             * have no above-Unicode matches */
+            for (posix_class = 0;
+                 posix_class <= _HIGHEST_REGCOMP_DOT_H_SYNC;
+                 posix_class++)
+            {
+                int try_inverted;
+                if (posix_class == _CC_ASCII || posix_class == _CC_CNTRL) {
+                    continue;
+                }
+                for (try_inverted = 0; try_inverted < 2; try_inverted++) {
+
+                    /* Check if matches normal or inverted */
+                    if (_invlistEQ(cp_list,
+                                   PL_XPosix_ptrs[posix_class],
+                                   try_inverted))
+                    {
+                        op = (try_inverted)
+                             ? NPOSIXU
+                             : POSIXU;
+                        *flagp |= HASWIDTH|SIMPLE;
+                        goto found_posix;
+                    }
+                }
+            }
+          found_posix: ;
+        }
         if (op != END) {
             RExC_parse = (char *)orig_parse;
             RExC_emit = (regnode *)orig_emit;
@@ -16168,6 +16268,9 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
                                            TRUE /* downgradable to EXACT */
                                           );
             }
+            else if (PL_regkind[op] == POSIXD || PL_regkind[op] == NPOSIXD) {
+                FLAGS(ret) = posix_class;
+            }
 
             SvREFCNT_dec_NN(cp_list);
             return ret;
@@ -16188,16 +16291,19 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
 
     /* Here, the bitmap has been populated with all the Latin1 code points that
      * always match.  Can now add to the overall list those that match only
-     * when the target string is UTF-8 (<depends_list>). */
-    if (depends_list) {
+     * when the target string is UTF-8 (<has_upper_latin1_only_utf8_matches>).
+     * */
+    if (has_upper_latin1_only_utf8_matches) {
        if (cp_list) {
-           _invlist_union(cp_list, depends_list, &cp_list);
-           SvREFCNT_dec_NN(depends_list);
+           _invlist_union(cp_list,
+                           has_upper_latin1_only_utf8_matches,
+                           &cp_list);
+           SvREFCNT_dec_NN(has_upper_latin1_only_utf8_matches);
        }
        else {
-           cp_list = depends_list;
+           cp_list = has_upper_latin1_only_utf8_matches;
        }
-        ANYOF_FLAGS(ret) |= ANYOF_HAS_UTF8_NONBITMAP_MATCHES;
+        ANYOF_FLAGS(ret) |= ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP;
     }
 
     /* If there is a swash and more than one element, we can't use the swash in
@@ -16265,18 +16371,13 @@ S_set_ANYOF_arg(pTHX_ RExC_state_t* const pRExC_state,
 
     if (! cp_list && ! runtime_defns && ! only_utf8_locale_list) {
         assert(! (ANYOF_FLAGS(node)
-                  & (ANYOF_HAS_UTF8_NONBITMAP_MATCHES
-                     |ANYOF_HAS_NONBITMAP_NON_UTF8_MATCHES)));
+                & ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP));
        ARG_SET(node, ANYOF_ONLY_HAS_BITMAP);
     }
     else {
        AV * const av = newAV();
        SV *rv;
 
-        assert(ANYOF_FLAGS(node)
-               & (ANYOF_HAS_UTF8_NONBITMAP_MATCHES
-                  |ANYOF_HAS_NONBITMAP_NON_UTF8_MATCHES|ANYOF_LOC_FOLD));
-
        av_store(av, 0, (runtime_defns)
                        ? SvREFCNT_inc(runtime_defns) : &PL_sv_undef);
        if (swash) {
@@ -16340,10 +16441,6 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp *prog,
 
     PERL_ARGS_ASSERT__GET_REGCLASS_NONBITMAP_DATA;
 
-    assert(ANYOF_FLAGS(node)
-        & (ANYOF_HAS_UTF8_NONBITMAP_MATCHES
-           |ANYOF_HAS_NONBITMAP_NON_UTF8_MATCHES|ANYOF_LOC_FOLD));
-
     if (data && data->count) {
        const U32 n = ARG(node);
 
@@ -16355,9 +16452,6 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp *prog,
 
            si = *ary;  /* ary[0] = the string to initialize the swash with */
 
-           /* Elements 3 and 4 are either both present or both absent. [3] is
-            * any inversion list generated at compile time; [4] indicates if
-            * that inversion list has any user-defined properties in it. */
             if (av_tindex(av) >= 2) {
                 if (only_utf8_locale_ptr
                     && ary[2]
@@ -16370,6 +16464,10 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp *prog,
                     *only_utf8_locale_ptr = NULL;
                 }
 
+                /* Elements 3 and 4 are either both present or both absent. [3]
+                 * is any inversion list generated at compile time; [4]
+                 * indicates if that inversion list has any user-defined
+                 * properties in it. */
                 if (av_tindex(av) >= 3) {
                     invlist = ary[3];
                     if (SvUV(ary[4])) {
@@ -17288,10 +17386,10 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o, const regmatch_
             }
         }
 
-       if ((flags & (ANYOF_MATCHES_ALL_ABOVE_BITMAP
-                      |ANYOF_HAS_UTF8_NONBITMAP_MATCHES
-                      |ANYOF_HAS_NONBITMAP_NON_UTF8_MATCHES
-                      |ANYOF_LOC_FOLD)))
+       if ((flags
+                & ( ANYOF_MATCHES_ALL_ABOVE_BITMAP
+                   |ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP
+                   |ANYOF_LOC_FOLD)))
         {
             if (do_sep) {
                 Perl_sv_catpvf(aTHX_ sv,"%s][%s",PL_colors[1],PL_colors[0]);
@@ -17330,11 +17428,13 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o, const regmatch_
                     if (*s == '\n') {
                         const char * const t = ++s;
 
-                        if (flags & ANYOF_HAS_NONBITMAP_NON_UTF8_MATCHES) {
-                            sv_catpvs(sv, "{outside bitmap}");
-                        }
-                        else {
-                            sv_catpvs(sv, "{utf8}");
+                        if (flags & ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP) {
+                            if (OP(o) == ANYOFD) {
+                                sv_catpvs(sv, "{utf8}");
+                            }
+                            else {
+                                sv_catpvs(sv, "{outside bitmap}");
+                            }
                         }
 
                         if (byte_output) {
diff --git a/regcomp.h b/regcomp.h
index 0b69f6e..5c12a21 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -373,51 +373,130 @@ struct regnode_ssc {
 #define PASS1 SIZE_ONLY
 #define PASS2 (! SIZE_ONLY)
 
-/* If the bitmap fully represents what this ANYOF node can match, the
- * ARG is set to this special value (since 0, 1, ... are legal, but will never
- * reach this high). */
+/* An ANYOF node is basically a bitmap with the index being a code point.  If
+ * the bit for that code point is 1, the code point matches;  if 0, it doesn't
+ * match (complemented if inverted).  There is an additional mechanism to deal
+ * with cases where the bitmap is insufficient in and of itself.  This #define
+ * indicates if the bitmap does fully represent what this ANYOF node can match.
+ * The ARG is set to this special value (since 0, 1, ... are legal, but will
+ * never reach this high). */
 #define ANYOF_ONLY_HAS_BITMAP  ((U32) -1)
 
-/* Below are the flags for node->flags of ANYOF.  These are in short supply,
- * with none currently available.  The ABOVE_BITMAP_ALL bit could be freed up
- * by resorting to creating a swash containing everything above 255.  This
- * seems likely to introduce a performance penalty (but actual numbers haven't
- * been done), so its probably better do some of the other possibilities below
- * in preference to this.
+/* When the bitmap isn't completely sufficient for handling the ANYOF node,
+ * flags (in node->flags of the ANYOF node) get set to indicate this.  These
+ * are perennially in short supply.  Beyond several cases where warnings need
+ * to be raised under certain circumstances, currently, there are six cases
+ * where the bitmap alone isn't sufficient.  We could use six flags to
+ * represent the 6 cases, but to save flags bits, we play some games.  The
+ * cases are:
  *
- * If just one bit is required, it seems to me (khw) that the best option would
- * be to turn the ANYOF_LOC_REQ_UTF8 bit into a separate node type: a
- * specialization of the ANYOFL type, freeing up the currently occupied bit.
- * When turning a bit into a node type, one has to take into consideration that
- * a SSC may use that bit -- not just a regular ANYOF[DL]?.  In the case of
- * ANYOF_LOC_REQ_UTF8, the only likely problem is accurately settting the SSC
- * node-type to the new one, which would likely involve S_ssc_or and S_ssc_and,
- * and not how the SSC currently gets set to ANYOFL.  This bit is a natural
- * candidate for being a separate node type because it is a specialization of
- * the current ANYOFL, and because no other ANYOFL-only bits are set when it
- * is; also most of its uses are actually outside the reginclass() function, so
- * this could be done with no performance penalty.  The other potential bits
- * seem to me to have a potential issue with a combinatorial explosion of node
- * types, because of not having that mutual exclusivity, where you may end up
- * having to have a node type for bitX being set, one for bitY, and one for
- * both bitXY.
+ *  1)  The bitmap has a compiled-in very finite size.  So something else needs
+ *      to be used to specify if a code point that is too large for the bitmap
+ *      actually matches.  The mechanism currently is a swash or inversion
+ *      list.  ANYOF_ONLY_HAS_BITMAP, described above, being TRUE indicates
+ *      there are no matches of too-large code points.  But if it is FALSE,
+ *      then almost certainly there are matches too large for the bitmap.  (The
+ *      other cases, described below, either imply this one or are extremely
+ *      rare in practice.)  So we can just assume that a too-large code point
+ *      will need something beyond the bitmap if ANYOF_ONLY_HAS_BITMAP is
+ *      FALSE, instead of having a separate flag for this.
+ *  2)  A subset of item 1) is if all possible code points outside the bitmap
+ *      match.  This is a common occurrence when the class is complemented,
+ *      like /[^ij]/.  Therefore a bit is reserved to indicate this,
+ *      ANYOF_MATCHES_ALL_ABOVE_BITMAP.  If it became necessary, this bit could
+ *      be replaced by using the normal swash mechanism, but with a performance
+ *      penalty.
+ *  3)  Under /d rules, it can happen that code points that are in the upper
+ *      latin1 range (\x80-\xFF or their equivalents on EBCDIC platforms) match
+ *      only if the runtime target string being matched against is UTF-8.  For
+ *      example /[\w[:punct:]]/d.  This happens only for posix classes (with a
+ *      couple of exceptions, like \d), and all such ones also have
+ *      above-bitmap matches.  Thus, 3) implies 1) as well.  Note that /d rules
+ *      are no longer encouraged; 'use 5.14' or higher deselects them.  But a
+ *      flag is required so that they can be properly handled.  But it can be a
+ *      shared flag: see 5) below.
+ *  4)  Also under /d rules, something like /[\Wfoo]/ will match everything in
+ *      the \x80-\xFF range, unless the string being matched against is UTF-8.
+ *      A swash could be created for this case, but this is relatively common,
+ *      and it turns out that it's all or nothing:  if any one of these code
+ *      points matches, they all do.  Hence a single bit suffices.  We use a
+ *      shared bit that doesn't take up space by itself:
+ *      ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER.
+ *      This also implies 1), with one exception: [:^cntrl:].
+ *  5)  A user-defined \p{} property may not have been defined by the time the
+ *      regex is compiled.  In this case, we don't know until runtime what it
+ *      will match, so we have to assume it could match anything, including
+ *      code points that ordinarily would be in the bitmap.  A flag bit is
+ *      necessary to indicate this, though it can be shared with the item 3)
+ *      flag, as that only occurs under /d, and this only occurs under non-d.
+ *      This case is quite uncommon in the field, and the /(?[ ...])/ construct
+ *      is a better way to accomplish what this feature does.  This case also
+ *      implies 1).
+ *      ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP
+ *      is the shared bit.
+ *  6)  /[foo]/il may have folds that are only valid if the runtime locale is a
+ *      UTF-8 one.  These are quite rare, so it would be good to avoid the
+ *      expense of looking for them.  But /l matching is slow anyway, and we've
+ *      traditionally not worried too much about its performance.  And this
+ *      condition requires the ANYOF_LOC_FOLD flag to be set, so testing for
+ *      that flag would be sufficient to rule out most cases of this.  So it is
+ *      unclear if this should have a flag or not.  But, one is currently
+ *      allocated for this purpose, ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES (and the
+ *      text below indicates how to share it, should another bit be needed).
  *
- * If you don't want to do this, or two bits are required, one could instead
- * rename the ANYOF_POSIXL bit to be ANYOFL_LARGE, to mean that the ANYOF node
- * has an extra 32 bits beyond what a regular one does.  That's what it
- * effectively means now, with the extra space all for the POSIX class bits.
- * But those classes actually only occupy 30 bits, so the ANYOF_LOC_REQ_BIT (if
- * an extra node type for it hasn't been created) and/or the ANYOF_LOC_FOLD
- * bits could be moved there.  The downside of this is that ANYOFL nodes with
- * whichever of the bits get moved would have to have the extra space always
- * allocated.
+ * At the moment, there are no spare bits, but this could be changed by various
+ * tricks.  Notice that item 6) is not independent of the ANYOF_LOC_FOLD flag
+ * below.  Also, the ANYOF_LOC_REQ_UTF8 flag is set only if both these aren't.
+ * We can therefore use a 2-bit field to represent these 3 flags, as follows:
+ *      00  => ANYOF_LOC_REQ_UTF8
+ *      01  => no folding
+ *      10  => ANYOF_LOC_FOLD alone
+ *      11  => ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES
  *
- * If three bits are required, one could additionally make a node type for
- * ANYOFL_LARGE, removing that as a bit, and move both the above bits to that
- * extra word.  There isn't an SSC problem as all SSCs are this large anyway,
- * and the SSC could be set to this node type.   REGINCLASS would have to be
- * modified so that if the node type were this, it would call reginclass().
- * as the flag bit that does this now would be gone.
+ * Beyond that, note that the information may be conveyed by creating new
+ * regnode types.  This is not the best solution, as shown later in this
+ * paragraph, but it is something that is feasible.  We could have a regnode
+ * for ANYOF_INVERT, for example.  A complication of this is that the regexec.c
+ * REGINCLASS macro assumes that it can just use the bitmap if no flags are
+ * set.  This would have to be changed to add extra tests for the node type, or
+ * a special bit reserved that means unspecified special handling, and then the
+ * node-type would be used internally to sort that out.  So we could gain a bit
+ * by having an ANYOF_SPECIAL bit, and a node type for INVERT, and another for
+ * POSIXL, and still another for INVERT_POSIXL.  This example illustrates one
+ * problem with this, a combinatorial explosion of node types.  The one node
+ * type khw can think of that doesn't have this explosion issue is
+ * ANYOF_LOC_REQ_UTF8, but you'd do this only if you haven't done the 2-bit
+ * field trick above.  This bit is a natural candidate for being a separate
+ * node type because it is a specialization of the current ANYOFL, and because
+ * no other ANYOFL-only bits are set when it is; also most of its uses are
+ * actually outside the reginclass() function, so this could be done with no
+ * performance penalty.  But again, the 2-bit field trick combines this bit so
+ * it doesn't take up space anyway.  Another issue when turning a bit into a
+ * node type, is that a SSC may use that bit -- not just a regular ANYOF[DL]?.
+ * In the case of ANYOF_LOC_REQ_UTF8, the only likely problem is accurately
+ * setting the SSC node-type to the new one, which would likely involve
+ * S_ssc_or and S_ssc_and, and not how the SSC currently gets set to ANYOFL.
+ *
+ * Another possibility is to instead rename the ANYOF_POSIXL bit to be
+ * ANYOFL_LARGE, to mean that the ANYOF node has an extra 32 bits beyond what a
+ * regular one does.  That's what it effectively means now, with the extra
+ * space all for the POSIX class bits.  But those classes actually only occupy
+ * 30 bits, so the 2-bit field or 2 of the locale bits could be moved to that
+ * extra space.  The downside of this is that ANYOFL nodes with whichever of
+ * the bits get moved would have to have the extra space always allocated.
+ *
+ * One could completely remove ANYOFL_LARGE and make all ANYOFL nodes large.
+ * The 30 bits in the extra word would indicate if a posix class should be
+ * looked up or not.  There isn't an SSC problem as all SSCs are this large
+ * anyway, and the SSC could be set to this node type.   REGINCLASS would have
+ * to be modified so that if the node type were this, it would call
+ * reginclass(), as the flag bit that indicates to do this now would be gone.
+ * If the 2-bit field is used and moved to the larger structure, this would
+ * free up a total of 4 bits.  If this were done, we could create an
+ * ANYOF_INVERT node-type without a combinatorial explosion, getting us to 5
+ * bits.  And, keep in mind that ANYOF_MATCHES_ALL_ABOVE_BITMAP is solely for
+ * performance, so could be removed.  The other performance-related bits are
+ * shareable with bits that are required.
  *
  * Several flags are not used in synthetic start class (SSC) nodes, so could be
  * shared should new flags be needed for SSCs, like SSC_MATCHES_EMPTY_STRING
@@ -443,23 +522,32 @@ struct regnode_ssc {
  * then.  Only set under /l; never in an SSC  */
 #define ANYOF_LOC_FOLD                          0x04
 
+/* If set, ANYOF_LOC_FOLD is also set, and there are potential matches that
+ * will be valid only if the locale is a UTF-8 one. */
+#define ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES        0x08
+
 /* If set, means to warn if runtime locale isn't a UTF-8 one.  Only under /l.
- * If set, none of INVERT, LOC_FOLD, POSIXL, HAS_NONBITMAP_NON_UTF8_MATCHES can
+ * If set, none of INVERT, LOC_FOLD, POSIXL,
+ * ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP can
  * be set.  Can be in an SSC */
-#define ANYOF_LOC_REQ_UTF8                      0x08
+#define ANYOF_LOC_REQ_UTF8                      0x10
 
 /* If set, the node matches every code point NUM_ANYOF_CODE_POINTS and above.
  * Can be in an SSC */
-#define ANYOF_MATCHES_ALL_ABOVE_BITMAP          0x10
+#define ANYOF_MATCHES_ALL_ABOVE_BITMAP          0x20
 
-/* If set, the node can match something outside the bitmap that isn't in utf8;
- * never set under /d nor in an SSC */
-#define ANYOF_HAS_NONBITMAP_NON_UTF8_MATCHES    0x20
-
-/* Are there things outside the bitmap that will match only if the target
- * string is encoded in UTF-8?  (This is not set if ANYOF_ABOVE_BITMAP_ALL is
- * set).  Can be in SSC */
-#define ANYOF_HAS_UTF8_NONBITMAP_MATCHES        0x40
+/* Shared bit:
+ *      Under /d it means the ANYOFD node matches more things if the target
+ *          string is encoded in UTF-8; any such things will be non-ASCII,
+ *          characters that are < 256, and can be accessed via the swash.
+ *      When not under /d, it means the ANYOF node contains a user-defined
+ *      property that wasn't yet defined at the time the regex was compiled,
+ *      and so must be looked up at runtime, by creating a swash
+ * (These uses are mutually exclusive because a user-defined property is
+ * specified by \p{}, and \p{} implies /u which deselects /d).  The long macro
+ * name is to make sure that you are cautioned about its shared nature.  Only
+ * the non-/d meaning can be in an SSC */
+#define ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP  0x40
 
 /* Shared bit:
  *      Under /d it means the ANYOFD node matches all non-ASCII Latin1
@@ -479,8 +567,7 @@ struct regnode_ssc {
 /* These are the flags that apply to both regular ANYOF nodes and synthetic
  * start class nodes during construction of the SSC.  During finalization of
  * the SSC, other of the flags may get added to it */
-#define ANYOF_COMMON_FLAGS    ( ANYOF_HAS_UTF8_NONBITMAP_MATCHES    \
-                               |ANYOF_LOC_REQ_UTF8)
+#define ANYOF_COMMON_FLAGS      ANYOF_LOC_REQ_UTF8
 
 /* Character classes for node->classflags of ANYOF */
 /* Should be synchronized with a table in regprop() */
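
[Illustration, not part of the commit.]  The regcomp.h comment above
suggests packing ANYOF_LOC_REQ_UTF8, ANYOF_LOC_FOLD and
ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES into a 2-bit field, since only four
combinations of them can occur.  A minimal sketch of that encoding might
look like the following.  Every name in it (loc_state, LOC_STATE_SHIFT,
the helper functions) is hypothetical, and the position of the field
within the flags byte is an arbitrary assumption; the commit does not
implement this trick.

```c
#include <stdbool.h>

/* Hypothetical encoding, following the table in the regcomp.h comment:
 *      00  => ANYOF_LOC_REQ_UTF8
 *      01  => no folding
 *      10  => ANYOF_LOC_FOLD alone
 *      11  => ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES (implies ANYOF_LOC_FOLD) */
enum loc_state {
    LOC_REQ_UTF8       = 0x0,
    LOC_NO_FOLDING     = 0x1,
    LOC_FOLD_ALONE     = 0x2,
    LOC_FOLD_ONLY_UTF8 = 0x3
};

/* Assumed placement of the field: bits 2 and 3 of the flags byte. */
#define LOC_STATE_SHIFT 2
#define LOC_STATE_MASK  (0x3u << LOC_STATE_SHIFT)

/* Store a state in the field, leaving every other flag bit untouched. */
static unsigned set_loc_state(unsigned flags, enum loc_state s)
{
    return (flags & ~LOC_STATE_MASK) | ((unsigned) s << LOC_STATE_SHIFT);
}

/* Extract the field from the flags. */
static enum loc_state get_loc_state(unsigned flags)
{
    return (enum loc_state) ((flags & LOC_STATE_MASK) >> LOC_STATE_SHIFT);
}

/* The three original flags become predicates on the field. */
static bool loc_req_utf8(unsigned flags)
{
    return get_loc_state(flags) == LOC_REQ_UTF8;
}

static bool loc_fold(unsigned flags)
{
    /* The high bit of the field is set in both folding states. */
    return (get_loc_state(flags) & 0x2) != 0;
}

static bool loc_only_utf8_fold_matches(unsigned flags)
{
    return get_loc_state(flags) == LOC_FOLD_ONLY_UTF8;
}
```

One consequence of the table as written is that an all-zero field means
ANYOF_LOC_REQ_UTF8, so a node that wants "no folding" has to store 01
explicitly rather than relying on zero-initialized flags.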
diff --git a/regexec.c b/regexec.c
index 16b9e0a..afe87a5 100644
--- a/regexec.c
+++ b/regexec.c
@@ -8732,11 +8732,27 @@ S_reginclass(pTHX_ regexp * const prog, const regnode * const n, const U8* const
         {
            match = TRUE;       /* Everything above the bitmap matches */
        }
-       else if ((flags & ANYOF_HAS_NONBITMAP_NON_UTF8_MATCHES)
-                 || (utf8_target && (flags & ANYOF_HAS_UTF8_NONBITMAP_MATCHES))
-                  || ((flags & ANYOF_LOC_FOLD)
-                       && IN_UTF8_CTYPE_LOCALE
-                       && ARG(n) != ANYOF_ONLY_HAS_BITMAP))
+            /* Here doesn't match everything above the bitmap.  If there is
+             * some information available beyond the bitmap, we may find a
+             * match in it.  If so, this is most likely because the code point
+             * is outside the bitmap range.  But rarely, it could be because of
+             * some other reason.  If so, various flags are set to indicate
+             * this possibility.  On ANYOFD nodes, there may be matches that
+             * happen only when the target string is UTF-8; or for other node
+             * types, because runtime lookup is needed, regardless of the
+             * UTF-8ness of the target string.  Finally, under /il, there may
+             * be some matches only possible if the locale is a UTF-8 one. */
+       else if (    ARG(n) != ANYOF_ONLY_HAS_BITMAP
+                 && (   c >= NUM_ANYOF_CODE_POINTS
+                     || (   (flags & ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP)
+                         && (   UNLIKELY(OP(n) != ANYOFD)
+                             || (utf8_target && ! isASCII_uni(c)
+#                               if NUM_ANYOF_CODE_POINTS > 256
+                                                                 && c < 256
+#                               endif
+                                )))
+                     || ((   flags & ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES)
+                          && IN_UTF8_CTYPE_LOCALE)))
         {
             SV* only_utf8_locale = NULL;
            SV * const sw = _get_regclass_nonbitmap_data(prog, n, TRUE, 0,
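
[Illustration, not part of the commit.]  The new condition in
S_reginclass() decides whether the data beyond the bitmap has to be
consulted at all.  The standalone sketch below restates that decision
with plain booleans in place of perl's macros.  The struct, the SKETCH_*
constants and the function name are hypothetical stand-ins; it assumes a
256-entry bitmap and an ASCII platform (so "non-ASCII" is simply
c >= 0x80), and it covers only the else-if branch, not the earlier
ANYOF_MATCHES_ALL_ABOVE_BITMAP test.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_BITMAP_CODE_POINTS 256u   /* assumed bitmap size */
#define SKETCH_SHARED_FLAG            0x40u  /* value of the shared bit */
#define SKETCH_ONLY_UTF8_LOC_FOLD     0x08u  /* ANYOF_ONLY_UTF8_LOC_FOLD_MATCHES */

/* Hypothetical stand-in for the parts of a regnode the test looks at. */
struct sketch_node {
    bool     only_has_bitmap;  /* ARG(n) == ANYOF_ONLY_HAS_BITMAP */
    bool     is_anyofd;        /* OP(n) == ANYOFD */
    unsigned flags;            /* node->flags */
};

/* Returns true when the swash/inversion list must be looked at for
 * code point c. */
static bool needs_nonbitmap_lookup(const struct sketch_node *n, uint32_t c,
                                   bool utf8_target, bool utf8_locale)
{
    /* The bitmap tells the whole story: nothing more to consult. */
    if (n->only_has_bitmap)
        return false;

    /* Too large for the bitmap, so only the extra data can decide. */
    if (c >= SKETCH_NUM_BITMAP_CODE_POINTS)
        return true;

    if (n->flags & SKETCH_SHARED_FLAG) {
        /* Not under /d: the bit means a runtime user-defined property,
         * which could match any code point. */
        if (!n->is_anyofd)
            return true;

        /* Under /d: upper Latin1 code points match only when the target
         * string is UTF-8.  The c < 256 test is redundant with a
         * 256-entry bitmap; it mirrors the #if in the real code. */
        if (utf8_target && c >= 0x80 && c < 256)
            return true;
    }

    /* Under /il: folds that are valid only in a UTF-8 locale. */
    if ((n->flags & SKETCH_ONLY_UTF8_LOC_FOLD) && utf8_locale)
        return true;

    return false;
}
```

For instance, with an ANYOFD node carrying the shared bit, U+00E9 needs
the extra lookup only when the target string is UTF-8, while any code
point at or above 256 needs it regardless.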

--
Perl5 Master Repository
