[perl.git] branch blead, updated. v5.21.7-165-g613abc6

Karl Williamson Mon, 29 Dec 2014 12:53:53 -0800

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/613abc6d16e99bd9834fe6afd79beb61a3a4734d?hp=ea5519d61c4e7f31f98e6f49013cbdadbfa26308>


- Log -----------------------------------------------------------------
commit 613abc6d16e99bd9834fe6afd79beb61a3a4734d
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 29 13:15:57 2014 -0700

    Raise warning on multi-byte char in single-byte locale
    
    See http://nntp.perl.org/group/perl.perl5.porters/211909
    
    Something is quite likely wrong with the logic if say in a Greek locale,
    Unicode characters (especially Greek ones) are encountered.  The same
    character will be represented by two different code points.  This
    warning alerts the user to this undesirable state of affairs.

M       perl.h
M       pod/perldelta.pod
M       pod/perldiag.pod
M       pod/perllocale.pod
M       regexec.c
M       t/lib/warnings/regexec
M       t/lib/warnings/utf8
M       t/re/charset.t
M       t/re/pat_advanced.t
M       utf8.c

commit dbf3c4d788344c8d20eb2549c638ced519d3f0e8
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 29 12:57:02 2014 -0700

    perllocale: Nits

M       pod/perllocale.pod

commit d635b7101aac73db76a54016b58991ba7cd8d778
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 29 11:01:59 2014 -0700

    foldEQ_utf8(): Add some internal flags
    
    The comments explain their purpose

M       utf8.c
M       utf8.h

commit 8bdce3944e3c1dd192c971851b33f718084e1942
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 26 19:37:50 2014 -0700

    lib/warnings/utf8: Add missing tests
    
    lcfirst had been overlooked, and we had failed to test that turning off
    warnings actually suppresses them.

M       t/lib/warnings/utf8

commit 1d39b2cd2a278ed0630f07bd7598726910eb6427
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 26 18:31:04 2014 -0700

    Simplify foldEQ_utf8
    
    This moves the uncommon case of handling inputs under non-UTF-8 locales
    out of this function to the functions it calls, which already have the
    logic to handle it.  This simplifies this function, cutting a couple
    branches each time through the loop from the common usage.
    
    The locale handling is slowed down somewhat, but even if that were a
    concern, another simpler function is normally used for locale handling.
    This gets called only when one or both of the comparison strings is
    UTF-8, which should be comparatively rare for non-UTF8 locales.

M       utf8.c

commit 357aaddece5471320c7b8b94099d29e9ee5c74fb
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 26 18:20:14 2014 -0700

    utf8.c: Use OP_DESC instead of passing string.
    
    OP_DESC is simpler and more general.

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c

commit e7b7ac466eda00925c5668867d967e36cdfcb731
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 26 17:47:37 2014 -0700

    utf8.c: Fix potential fold bug
    
    The function _to_uni_fold_flags() supposedly had the ability to do
    folding based on the current locale, if the correct flag is passed.
    However, it didn't actually do that, returning a non-locale fold
    instead.  Fortunately, this is an undocumented capability (actually, the
    whole function is undocumented), and no current calls to it used the
    flag.  This commit causes it to work.

M       utf8.c

commit aa8ebe624a829b26875f96527e6ba67a796bd995
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 26 15:41:33 2014 -0700

    utf8.c: Add some function parameter assertions
    
    Currently these are not violated, but this guards against future
    mistakes.

M       utf8.c

commit 01f55654e76127c670a8806c68378a393e3972f4
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 18 22:21:21 2014 -0700

    regexec.c: Move goto label to avoid redundant work
    
    This causes a goto to skip work that has already been done.

M       regexec.c

commit 8e57b935987c3b2b9702249e3565bd39d3a05f9f
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 26 16:08:28 2014 -0700

    regexec.c: White-space only

M       regexec.c

commit 780fcc9fd03dbbd16715e2b6ecd020f9e50b7cc7
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 18 13:29:51 2014 -0700

    Don't raise 'poorly supported' locale warning unnecessarily
    
    Commit 8c6180a91de91a1194f427fc639694f43a903a78 added a warning message
    for when Perl determines that the program's underlying locale just
    switched into is poorly supported.  At the time it was thought that this
    would be an extremely rare occurrence.  However, a bug in HP-UX -
    B.11.00/64 causes this message to be raised for the "C" locale.  A
    workaround was done that silenced those.  However, before it got fixed,
    this message would occur gobs of times executing the test suite.  It was
    raised even if the script is not locale-aware, so that the underlying
    locale was completely irrelevant.  There is a good prospect that someone
    using an older Asian locale as their default would get this message
    inappropriately, even if they don't use locales, or switch to a
    supported one before using them.
    
    This commit causes the message to be raised only if it actually is
    relevant.  When not in the scope of 'use locale', the message is stored,
    not raised.  Upon the first locale-dependent operation within a bad
    locale, the saved message is raised, and the storage cleared.  I was
    able to do this without adding extra branching to the main-line
    non-locale execution code.  This was done by adding regnodes which get
    jumped to by switch statements, and refactoring some existing C tests so
    they exclude non-locale right off the bat.
    
    These changes would have been necessary for another locale warning that
    I previously agreed to implement, and which is coming a few commits from
    now.
    
    I do not know of any way to add tests in the test suite for this.  It is
    in fact rare for modern locales to have these issues.  The way I tested
    this was to temporarily change the C code so that all locales are viewed
    as defective, and manually note that the warnings came out where
    expected, and only where expected.
    
    I chose not to try to output this warning on any POSIX functions called.
    I believe that all that are affected are deprecated or scheduled to be
    deprecated anyway.  And POSIX is closer to the hardware of the machine.
    
    For convenience, I also don't output the message for some zero-length
    pattern matches.  If something is going to be matched, the message will
    likely very soon be raised anyway.

M       embedvar.h
M       intrpvar.h
M       locale.c
M       perl.c
M       perl.h
M       pod/perldelta.pod
M       pod/perldiag.pod
M       pp.c
M       regexec.c
M       sv.c
M       utf8.c

commit a4525e789871d3846f20d0ea7d2d239c6a21a5a4
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 18 10:42:30 2014 -0700

    Add regex nodes for locale
    
    These will be used in a future commit to distinguish between /l patterns
    vs non-/l.

M       pod/perldebguts.pod
M       regcomp.c
M       regcomp.h
M       regcomp.sym
M       regexec.c
M       regnodes.h

commit e7fd4aa18abbfe0099d4947060c99ca85f42f764
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 18 14:03:09 2014 -0700

    regcomp.c, regexec.c: Vertically align ternary operators
    
    For clarity

M       regcomp.c
M       regexec.c

commit 2f306ab9fcefa58507af5830f60ce18c43bbad18
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 18 13:32:21 2014 -0700

    Nits in comments

M       regcomp.c
M       regcomp.sym
M       regexec.c
M       utf8.c
-----------------------------------------------------------------------

Summary of changes:
 embed.fnc              |   1 -
 embed.h                |   2 +-
 embedvar.h             |   1 +
 intrpvar.h             |   1 +
 locale.c               |  35 ++++--
 perl.c                 |   2 +
 perl.h                 |  46 ++++++++
 pod/perldebguts.pod    |   6 +
 pod/perldelta.pod      |  13 +++
 pod/perldiag.pod       |  34 +++++-
 pod/perllocale.pod     |  30 +++--
 pp.c                   |  23 ++--
 proto.h                |   9 +-
 regcomp.c              |  91 ++++++++++-----
 regcomp.h              |  10 +-
 regcomp.sym            |   6 +
 regexec.c              | 150 ++++++++++++++++++++----
 regnodes.h             | 311 ++++++++++++++++++++++++++-----------------------
 sv.c                   |   3 +
 t/lib/warnings/regexec |  26 +++++
 t/lib/warnings/utf8    |  37 ++++++
 t/re/charset.t         |   2 +
 t/re/pat_advanced.t    |   1 +
 utf8.c                 | 222 ++++++++++++++++++-----------------
 utf8.h                 |   2 +
 25 files changed, 717 insertions(+), 347 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index 52ec3ee..a5b955a 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2462,7 +2462,6 @@ sn        |NV|mulexp10    |NV value|I32 exponent
 
 #if defined(PERL_IN_UTF8_C)
 sRM    |UV     |check_locale_boundary_crossing                             \
-               |NN const char * const func_name                            \
                |NN const U8* const p                                       \
                |const UV result                                            \
                |NN U8* const ustrp                                         \
diff --git a/embed.h b/embed.h
index 49d9ee8..7895e61 100644
--- a/embed.h
+++ b/embed.h
@@ -1744,7 +1744,7 @@
 #define isa_lookup(a,b,c,d)    S_isa_lookup(aTHX_ a,b,c,d)
 #  endif
 #  if defined(PERL_IN_UTF8_C)
-#define check_locale_boundary_crossing(a,b,c,d,e)      
S_check_locale_boundary_crossing(aTHX_ a,b,c,d,e)
+#define check_locale_boundary_crossing(a,b,c,d)        
S_check_locale_boundary_crossing(aTHX_ a,b,c,d)
 #define is_utf8_common(a,b,c,d)        S_is_utf8_common(aTHX_ a,b,c,d)
 #define swash_scan_list_line(a,b,c,d,e,f,g)    S_swash_scan_list_line(aTHX_ 
a,b,c,d,e,f,g)
 #define swatch_get(a,b,c)      S_swatch_get(aTHX_ a,b,c)
diff --git a/embedvar.h b/embedvar.h
index 32a8b9b..da3c331 100644
--- a/embedvar.h
+++ b/embedvar.h
@@ -352,6 +352,7 @@
 #define PL_utf8_xidstart       (vTHX->Iutf8_xidstart)
 #define PL_utf8cache           (vTHX->Iutf8cache)
 #define PL_utf8locale          (vTHX->Iutf8locale)
+#define PL_warn_locale         (vTHX->Iwarn_locale)
 #define PL_warnhook            (vTHX->Iwarnhook)
 #define PL_watchaddr           (vTHX->Iwatchaddr)
 #define PL_watchok             (vTHX->Iwatchok)
diff --git a/intrpvar.h b/intrpvar.h
index 3bb1c9a..eb96283 100644
--- a/intrpvar.h
+++ b/intrpvar.h
@@ -238,6 +238,7 @@ PERLVAR(I, exit_flags,      U8)             /* was exit() 
unexpected, etc. */
 
 PERLVAR(I, utf8locale, bool)           /* utf8 locale detected */
 PERLVAR(I, in_utf8_CTYPE_locale, bool)
+PERLVAR(I, warn_locale, SV *)
 
 PERLVARA(I, colors,6,  char *)         /* values from PERL_RE_COLORS env var */
 
diff --git a/locale.c b/locale.c
index 429fdb7..2577ed2 100644
--- a/locale.c
+++ b/locale.c
@@ -292,6 +292,8 @@ Perl_new_ctype(pTHX_ const char *newctype)
                                                to start */
         unsigned int bad_count = 0;         /* Count of bad characters */
 
+        SvREFCNT_dec(PL_warn_locale);   /* We are about to overwrite this */
+
         for (i = 0; i < 256; i++) {
             if (isUPPER_LC((U8) i))
                 PL_fold_locale[i] = (U8) toLOWER_LC((U8) i);
@@ -360,17 +362,9 @@ Perl_new_ctype(pTHX_ const char *newctype)
 #endif
 
         if (bad_count || multi_byte_locale) {
-
-            /* We have to save 'newctype' because the setlocale() just below
-             * may destroy it.  The next setlocale() further down should
-             * restore it properly so that the intermediate change here is
-             * transparent to this function's caller */
-            const char * const badlocale = savepv(newctype);
-
-            setlocale(LC_CTYPE, "C");
-            Perl_warner(aTHX_ packWARN(WARN_LOCALE),
+            PL_warn_locale = Perl_newSVpvf(aTHX_
                              "Locale '%s' may not work well.%s%s%s\n",
-                             badlocale,
+                             newctype,
                              (multi_byte_locale)
                               ? "  Some characters in it are not recognized by"
                                 " Perl."
@@ -384,7 +378,26 @@ Perl_new_ctype(pTHX_ const char *newctype)
                               ? bad_chars_list
                               : ""
                             );
-            setlocale(LC_CTYPE, badlocale);
+            /* If we are actually in the scope of the locale, output the
+             * message now.  Otherwise we save it to be output at the first
+             * operation using this locale, if that actually happens.  Most
+             * programs don't use locales, so they are immune to bad ones */
+            if (IN_LC(LC_CTYPE)) {
+
+                /* We have to save 'newctype' because the setlocale() just
+                 * below may destroy it.  The next setlocale() further down
+                 * should restore it properly so that the intermediate change
+                 * here is transparent to this function's caller */
+                const char * const badlocale = savepv(newctype);
+
+                setlocale(LC_CTYPE, "C");
+
+                /* The '0' below suppresses a bogus gcc compiler warning */
+                Perl_warner(aTHX_ packWARN(WARN_LOCALE), 
SvPVX(PL_warn_locale), 0);
+                setlocale(LC_CTYPE, badlocale);
+                SvREFCNT_dec_NN(PL_warn_locale);
+                PL_warn_locale = NULL;
+            }
         }
     }
 
diff --git a/perl.c b/perl.c
index be9932d..2ebc4f7 100644
--- a/perl.c
+++ b/perl.c
@@ -1040,6 +1040,7 @@ perl_destruct(pTHXx)
     SvREFCNT_dec(PL_Latin1);
     SvREFCNT_dec(PL_NonL1NonFinalFold);
     SvREFCNT_dec(PL_HasMultiCharFold);
+    SvREFCNT_dec(PL_warn_locale);
     PL_utf8_mark       = NULL;
     PL_utf8_toupper    = NULL;
     PL_utf8_totitle    = NULL;
@@ -1051,6 +1052,7 @@ perl_destruct(pTHXx)
     PL_AboveLatin1       = NULL;
     PL_InBitmap          = NULL;
     PL_HasMultiCharFold  = NULL;
+    PL_warn_locale       = NULL;
     PL_Latin1            = NULL;
     PL_NonL1NonFinalFold = NULL;
     PL_UpperLatin1       = NULL;
diff --git a/perl.h b/perl.h
index a3f63b0..89a7d43 100644
--- a/perl.h
+++ b/perl.h
@@ -5779,6 +5779,48 @@ typedef struct am_table_short AMTS;
 #   define IN_LC(category)  \
                     (IN_LC_COMPILETIME(category) || IN_LC_RUNTIME(category))
 
+#   if defined (PERL_CORE) || defined (PERL_IN_XSUB_RE)
+
+        /* This internal macro should be called from places that operate under
+         * locale rules.  It there is a problem with the current locale that
+         * hasn't been raised yet, it will output a warning this time */
+#       define _CHECK_AND_WARN_PROBLEMATIC_LOCALE                           \
+       STMT_START {                                                        \
+            if (PL_warn_locale) {                                           \
+                /*GCC_DIAG_IGNORE(-Wformat-security);   Didn't work */      \
+                Perl_ck_warner(aTHX_ packWARN(WARN_LOCALE),                 \
+                                     SvPVX(PL_warn_locale),                 \
+                                     0 /* dummy to avoid comp warning */ ); \
+                /* GCC_DIAG_RESTORE; */                                     \
+                SvREFCNT_dec_NN(PL_warn_locale);                            \
+                PL_warn_locale = NULL;                                      \
+            }                                                               \
+        }  STMT_END
+
+
+    /* These two internal macros are called when a warning should be raised,
+     * and will do so if enabled.  The first takes a single code point
+     * argument; the 2nd, is a pointer to the first byte of the UTF-8 encoded
+     * string, and an end position which it won't try to read past */
+#   define _CHECK_AND_OUTPUT_WIDE_LOCALE_CP_MSG(cp)                         \
+      Perl_ck_warner(aTHX_ packWARN(WARN_LOCALE),                           \
+             "Wide character (U+%"UVXf") in %s", (UV) cp, OP_DESC(PL_op));
+
+#  define _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(s, send)                   \
+       STMT_START { /* Check if to warn before doing the conversion work */\
+            if (ckWARN(WARN_LOCALE)) {                                      \
+                UV cp = utf8_to_uvchr_buf((U8 *) s, (U8 *) send, NULL);     \
+                Perl_warner(aTHX_ packWARN(WARN_LOCALE),                    \
+                    "Wide character (U+%"UVXf") in %s",                     \
+                    (cp == 0)                                               \
+                     ? UNICODE_REPLACEMENT                                  \
+                     : (UV) cp,                                             \
+                    OP_DESC(PL_op));                                        \
+            }                                                               \
+        }  STMT_END
+
+#   endif   /* PERL_CORE or PERL_IN_XSUB_RE */
+
 #else   /* No locale usage */
 #   define IN_LOCALE_RUNTIME                0
 #   define IN_SOME_LOCALE_FORM_RUNTIME      0
@@ -5793,6 +5835,10 @@ typedef struct am_table_short AMTS;
 #   define IN_LC_COMPILETIME(category)      0
 #   define IN_LC_RUNTIME(category)          0
 #   define IN_LC(category)                  0
+
+#   define _CHECK_AND_WARN_PROBLEMATIC_LOCALE
+#   define _CHECK_AND_OUTPUT_WIDE_LOCALE_CP_MSG(a)
+#   define _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(a,b)
 #endif
 
 #ifdef USE_LOCALE_NUMERIC
diff --git a/pod/perldebguts.pod b/pod/perldebguts.pod
index 8b90342..57fa1f4 100644
--- a/pod/perldebguts.pod
+++ b/pod/perldebguts.pod
@@ -593,6 +593,7 @@ will be lost.
  CANY            no         Match any one byte.
  ANYOF           sv 1       Match character in (or not in) this class,
                             single char match only
+ ANYOFL          sv 1       Like ANYOF, but /l is in effect
 
  # POSIX Character Classes:
  POSIXD          none       Some [[:class:]] under /d; the FLAGS field
@@ -626,6 +627,7 @@ will be lost.
  # Literals
 
  EXACT           str        Match this string (preceded by length).
+ EXACTL          str        Like EXACT, but /l is in effect.
  EXACTF          str        Match this non-UTF-8 string (not guaranteed
                             to be folded) using /id rules (w/len).
  EXACTFL         str        Match this string (not guaranteed to be
@@ -635,9 +637,13 @@ will be lost.
                             UTF-8) using /iu rules (w/len).
  EXACTFA         str        Match this string (not guaranteed to be
                             folded) using /iaa rules (w/len).
+
  EXACTFU_SS      str        Match this string (folded iff in UTF-8,
                             length in folding may change even if not in
                             UTF-8) using /iu rules (w/len).
+ EXACTFLU8       str        Rare cirucmstances: like EXACTFU, but is
+                            under /l, UTF-8, folded, and everything in
+                            it is above 255.
  EXACTFA_NO_TRIE str        Match this string (which is not trie-able;
                             not guaranteed to be folded) using /iaa
                             rules (w/len).
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 6eecc00..6a830b9 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -209,6 +209,10 @@ XXX L<message|perldiag/"message">
 
 XXX L<message|perldiag/"message">
 
+=item *
+
+L<Wide character (U+%X) in %s|perldiag/"Wide character (U+%X) in %s">
+
 =back
 
 =head2 Changes to Existing Diagnostics
@@ -221,6 +225,15 @@ XXX Changes (i.e. rewording) of diagnostic messages go here
 
 XXX Describe change here
 
+The message
+L<Locale '%s' may not work well.%s|perldiag/"Locale '%s' may not work well.%s">
+is no longer raised unless the problemtatic locale is actually used in
+the Perl program.  Previously it was raised if it merely was the
+underlying locale.  All Perl programs have an underlying locale at all
+times, but something like a C<S<use locale>> is needed for that locale
+to actually have some effect.  This message will not be raised when
+the underlying locale is hidden.
+
 =back
 
 =head1 Utility Changes
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 1c845dd..4979da2 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -2982,16 +2982,16 @@ likely fix this error.
 
 =item Locale '%s' may not work well.%s
 
-(W locale) The named locale that Perl is now trying to use is not fully
-compatible with Perl.  The second C<%s> gives a reason.
+(W locale) You are using the named locale, which is a non-UTF-8 one, and
+which Perl has determined is not fully compatible with Perl.  The second
+C<%s> gives a reason.
 
 By far the most common reason is that the locale has characters in it
 that are represented by more than one byte.  The only such locales that
 Perl can handle are the UTF-8 locales.  Most likely the specified locale
 is a non-UTF-8 one for an East Asian language such as Chinese or
 Japanese.  If the locale is a superset of ASCII, the ASCII portion of it
-may work in Perl.  Read on for problems when it isn't a superset of
-ASCII.
+may work in Perl.
 
 Some essentially obsolete locales that aren't supersets of ASCII, mainly
 those in ISO 646 or other 7-bit locales, such as ASMO 449, can also have
@@ -2999,6 +2999,18 @@ problems, depending on what portions of the ASCII 
character set get
 changed by the locale and are also used by the program.
 The warning message lists the determinable conflicting characters.
 
+Note that not all incompatibilities are found.
+
+If this happens to you, there's not much you can do except switch to use a
+different locale or use L<Encode> to translate from the locale into
+UTF-8; if that's impracticable, you have been warned that some things
+may break.
+
+This message is output once each time a bad locale is switched into
+within the scope of C<S<use locale>>, or on the first possibly-affected
+operation if the C<S<use locale>> inherits a bad one.  It is not raised
+for any operations from the L<POSIX> module.
+
 =item localtime(%f) failed
 
 (W overflow) You called C<localtime> with a number that it could not handle:
@@ -6952,6 +6964,20 @@ warning is to add C<no warnings 'utf8';> but that is 
often closer to
 cheating.  In general, you are supposed to explicitly mark the
 filehandle with an encoding, see L<open> and L<perlfunc/binmode>.
 
+=item Wide character (U+%X) in %s
+
+(W locale) While in a single-byte locale (I<i.e.>, a non-UTF-8
+one), a multi-byte character was encountered.   Perl considers this
+character to be the specified Unicode code point.  Combining non-UTF8
+locales and Unicode is dangerous.  Almost certainly some characters
+will have two different representations.  For example, in the ISO 8859-7
+(Greek) locale, the code point 0xC3 represents a Capital Gamma.  But so
+also does 0x393.  This will make string comparisons unreliable.
+
+You likely need to figure out how this multi-byte character got mixed up
+with your single-byte locale (or perhaps you thought you had a UTF-8
+locale, but Perl disagrees).
+
 =item Within []-length '%c' not allowed
 
 (F) The count in the (un)pack template may be replaced by C<[TEMPLATE]>
diff --git a/pod/perllocale.pod b/pod/perllocale.pod
index d083c09..3b2d79d 100644
--- a/pod/perllocale.pod
+++ b/pod/perllocale.pod
@@ -298,8 +298,8 @@ C<ucfirst()>, and C<lcfirst()>) use C<LC_CTYPE>
 
 =item *
 
-The variables L<$!|perlvar/$ERRNO> (and its synonyms C<$ERRNO> and
-C<$OS_ERROR>) and L<$^E|perlvar/$EXTENDED_OS_ERROR> (and its synonym
+B<The variables L<$!|perlvar/$ERRNO>> (and its synonyms C<$ERRNO> and
+C<$OS_ERROR>) B<and L<$^E|perlvar/$EXTENDED_OS_ERROR>> (and its synonym
 C<$EXTENDED_OS_ERROR>) when used as strings use C<LC_MESSAGES>.
 
 =back
@@ -755,7 +755,7 @@ alphabets, but where do "E<aacute>" and "E<aring>" belong?  
And while
 "color" follows "chocolate" in English, what about in traditional Spanish?
 
 The following collations all make sense and you may meet any of them
-if you "use locale".
+if you C<"use locale">.
 
        A B C D E a b c d e
        A a B b C c D d E e
@@ -792,7 +792,7 @@ C<$equal_in_locale> will be true if the collation locale 
specifies a
 dictionary-like ordering that ignores space characters completely and
 which folds case.
 
-Perl only supports single-byte locales for C<LC_COLLATE>.  This means
+Perl currently only supports single-byte locales for C<LC_COLLATE>.  This means
 that a UTF-8 locale likely will just give you machine-native ordering.
 Use L<Unicode::Collate> for the full implementation of the Unicode
 Collation Algorithm.
@@ -1005,7 +1005,7 @@ results.  Here are a few possibilities:
 
 Regular expression checks for safe file names or mail addresses using
 C<\w> may be spoofed by an C<LC_CTYPE> locale that claims that
-characters such as "E<gt>" and "|" are alphanumeric.
+characters such as C<"E<gt>"> and C<"|"> are alphanumeric.
 
 =item *
 
@@ -1466,9 +1466,12 @@ the characters in the upper half of the Latin-1 range 
(128 - 255)
 properly under C<LC_CTYPE>.  To see if a character is a particular type
 under a locale, Perl uses the functions like C<isalnum()>.  Your C
 library may not work for UTF-8 locales with those functions, instead
-only working under the newer wide library functions like C<iswalnum()>.
-However, they are treated like single-byte locales, and will have the
-restrictions described below.
+only working under the newer wide library functions like C<iswalnum()>,
+which Perl does not use.
+These multi-byte locales are treated like single-byte locales, and will
+have the restrictions described below.  Starting in Perl v5.22 a warning
+message is raised when Perl detects a multi-byte locale that it doesn't
+fully support.
 
 For single-byte locales,
 Perl generally takes the tack to use locale rules on code points that can fit
@@ -1488,7 +1491,7 @@ Unicode, C<\p{Alpha}> will never match it, regardless of 
locale.  A similar
 issue occurs with C<\N{...}>.  Prior to v5.20, It is therefore a bad
 idea to use C<\p{}> or
 C<\N{}> under plain C<use locale>--I<unless> you can guarantee that the
-locale will be a ISO8859-1.  Use POSIX character classes instead.
+locale will be ISO8859-1.  Use POSIX character classes instead.
 
 Another problem with this approach is that operations that cross the
 single byte/multiple byte boundary are not well-defined, and so are
@@ -1516,6 +1519,11 @@ Still another problem is that this approach can lead to 
two code
 points meaning the same character.  Thus in a Greek locale, both U+03A7
 and U+00D7 are GREEK CAPITAL LETTER CHI.
 
+Because of all these problems, starting in v5.22, Perl will raise a
+warning if a multi-byte (hence Unicode) code point is used when a
+single-byte locale is in effect.  (Although it doesn't check for this if
+doing so would unreasonably slow execution down.)
+
 Vendor locales are notoriously buggy, and it is difficult for Perl to test
 its locale-handling code because this interacts with code that Perl has no
 control over; therefore the locale-handling code in Perl may be buggy as
@@ -1541,8 +1549,8 @@ Pre-v5.12, it was somewhat haphazard; in v5.12 it was 
applied fairly
 consistently to regular expression matching except for bracketed
 character classes; in v5.14 it was extended to all regex matches; and in
 v5.16 to the casing operations such as C<\L> and C<uc()>.  For
-collation, in all releases, the system's C<strxfrm()> function is called,
-and whatever it does is what you get.
+collation, in all releases so far, the system's C<strxfrm()> function is
+called, and whatever it does is what you get.
 
 =head1 BUGS
 
diff --git a/pp.c b/pp.c
index 182fa71..08e0999 100644
--- a/pp.c
+++ b/pp.c
@@ -3588,23 +3588,27 @@ PP(pp_ucfirst)
        if (op_type == OP_LCFIRST) {
 
            /* lower case the first letter: no trickiness for any character */
-            *tmpbuf =
 #ifdef USE_LOCALE_CTYPE
-                      (IN_LC_RUNTIME(LC_CTYPE))
-                      ? toLOWER_LC(*s)
-                      :
+            if (IN_LC_RUNTIME(LC_CTYPE)) {
+                _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+                *tmpbuf = toLOWER_LC(*s);
+            }
+            else
 #endif
-                         (IN_UNI_8_BIT)
-                         ? toLOWER_LATIN1(*s)
-                         : toLOWER(*s);
+            {
+                *tmpbuf = (IN_UNI_8_BIT)
+                          ? toLOWER_LATIN1(*s)
+                          : toLOWER(*s);
+            }
        }
-       /* is ucfirst() */
 #ifdef USE_LOCALE_CTYPE
+       /* is ucfirst() */
        else if (IN_LC_RUNTIME(LC_CTYPE)) {
             if (IN_UTF8_CTYPE_LOCALE) {
                 goto do_uni_rules;
             }
 
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
             *tmpbuf = (U8) toUPPER_LC(*s); /* This would be a bug if any
                                               locales have upper and title case
                                               different */
@@ -3909,6 +3913,7 @@ PP(pp_uc)
                 if (IN_UTF8_CTYPE_LOCALE) {
                     goto do_uni_rules;
                 }
+                _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
                for (; s < send; d++, s++)
                     *d = (U8) toUPPER_LC(*s);
            }
@@ -4116,6 +4121,7 @@ PP(pp_lc)
             * whole thing in a tight loop, for speed, */
 #ifdef USE_LOCALE_CTYPE
             if (IN_LC_RUNTIME(LC_CTYPE)) {
+                _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
                for (; s < send; d++, s++)
                    *d = toLOWER_LC(*s);
             }
@@ -4298,6 +4304,7 @@ PP(pp_fc)
             if (IN_UTF8_CTYPE_LOCALE) {
                 goto do_uni_folding;
             }
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
             for (; s < send; d++, s++)
                 *d = (U8) toFOLD_LC(*s);
         }
diff --git a/proto.h b/proto.h
index cf8e93d..ccd768f 100644
--- a/proto.h
+++ b/proto.h
@@ -7914,14 +7914,13 @@ STATIC bool     S_isa_lookup(pTHX_ HV *stash, const 
char * const name, STRLEN len, U
 
 #endif
 #if defined(PERL_IN_UTF8_C)
-STATIC UV      S_check_locale_boundary_crossing(pTHX_ const char * const 
func_name, const U8* const p, const UV result, U8* const ustrp, STRLEN *lenp)
+STATIC UV      S_check_locale_boundary_crossing(pTHX_ const U8* const p, const 
UV result, U8* const ustrp, STRLEN *lenp)
                        __attribute__warn_unused_result__
                        __attribute__nonnull__(pTHX_1)
-                       __attribute__nonnull__(pTHX_2)
-                       __attribute__nonnull__(pTHX_4)
-                       __attribute__nonnull__(pTHX_5);
+                       __attribute__nonnull__(pTHX_3)
+                       __attribute__nonnull__(pTHX_4);
 #define PERL_ARGS_ASSERT_CHECK_LOCALE_BOUNDARY_CROSSING        \
-       assert(func_name); assert(p); assert(ustrp); assert(lenp)
+       assert(p); assert(ustrp); assert(lenp)
 
 PERL_STATIC_INLINE bool        S_is_utf8_common(pTHX_ const U8 *const p, SV 
**swash, const char * const swashname, SV* const invlist)
                        __attribute__warn_unused_result__
diff --git a/regcomp.c b/regcomp.c
index a58080b..905d41b 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -1044,13 +1044,13 @@ S_ssc_init(pTHX_ const RExC_state_t *pRExC_state, 
regnode_ssc *ssc)
     ARG_SET(ssc, ANYOF_ONLY_HAS_BITMAP);
     ssc_anything(ssc);
 
-    /* If any portion of the regex is to operate under locale rules,
-     * initialization includes it.  The reason this isn't done for all regexes
-     * is that the optimizer was written under the assumption that locale was
-     * all-or-nothing.  Given the complexity and lack of documentation in the
-     * optimizer, and that there are inadequate test cases for locale, many
-     * parts of it may not work properly, it is safest to avoid locale unless
-     * necessary. */
+    /* If any portion of the regex is to operate under locale rules that aren't
+     * fully known at compile time, initialization includes it.  The reason
+     * this isn't done for all regexes is that the optimizer was written under
+     * the assumption that locale was all-or-nothing.  Given the complexity and
+     * lack of documentation in the optimizer, and that there are inadequate
+     * test cases for locale, many parts of it may not work properly, it is
+     * safest to avoid locale unless necessary. */
     if (RExC_contains_locale) {
        ANYOF_POSIXL_SETALL(ssc);
     }
@@ -1879,7 +1879,7 @@ S_dump_trie_interim_table(pTHX_ const struct 
_reg_trie_data *trie,
               May be the same as tail.
   tail       : item following the branch sequence
   count      : words in the sequence
-  flags      : currently the OP() type we will be building one of 
/EXACT(|F|FA|FU|FU_SS)/
+  flags      : currently the OP() type we will be building one of 
/EXACT(|F|FA|FU|FU_SS|L|FLU8)/
   depth      : indent depth
 
 Inplace optimizes a sequence of 2 or more Branch-Exact nodes into a TRIE node.
@@ -2143,10 +2143,11 @@ S_make_trie(pTHX_ RExC_state_t *pRExC_state, regnode 
*startbranch,
 #endif
 
     switch (flags) {
-        case EXACT: break;
+        case EXACT: case EXACTL: break;
        case EXACTFA:
         case EXACTFU_SS:
-       case EXACTFU: folder = PL_fold_latin1; break;
+       case EXACTFU:
+       case EXACTFLU8: folder = PL_fold_latin1; break;
        case EXACTF:  folder = PL_fold; break;
         default: Perl_croak( aTHX_ "panic! In trie construction, unknown node 
type %u %s", (unsigned) flags, PL_reg_name[flags] );
     }
@@ -2157,7 +2158,7 @@ S_make_trie(pTHX_ RExC_state_t *pRExC_state, regnode 
*startbranch,
     trie->wordcount = word_count;
     RExC_rxi->data->data[ data_slot ] = (void*)trie;
     trie->charmap = (U16 *) PerlMemShared_calloc( 256, sizeof(U16) );
-    if (flags == EXACT)
+    if (flags == EXACT || flags == EXACTL)
        trie->bitmap = (char *) PerlMemShared_calloc( ANYOF_BITMAP_SIZE, 1 );
     trie->wordinfo = (reg_trie_wordinfo *) PerlMemShared_calloc(
                        trie->wordcount+1, sizeof(reg_trie_wordinfo));
@@ -3201,7 +3202,7 @@ S_construct_ahocorasick_from_trie(pTHX_ RExC_state_t 
*pRExC_state, regnode *sour
         StructCopy(source,op,struct regnode_charclass);
         stclass = (regnode *)op;
     }
-    OP(stclass)+=2; /* covert the TRIE type to its AHO-CORASICK equivalent */
+    OP(stclass)+=2; /* convert the TRIE type to its AHO-CORASICK equivalent */
 
     ARG_SET( stclass, data_slot );
     aho = (reg_ac_data *) PerlMemShared_calloc( 1, sizeof(reg_ac_data) );
@@ -3500,7 +3501,7 @@ S_join_exact(pTHX_ RExC_state_t *pRExC_state, regnode 
*scan,
      * this final joining, sequences could have been split over boundaries, and
      * hence missed).  The sequences only happen in folding, hence for any
      * non-EXACT EXACTish node */
-    if (OP(scan) != EXACT) {
+    if (OP(scan) != EXACT && OP(scan) != EXACTL) {
         U8* s0 = (U8*) STRING(scan);
         U8* s = s0;
         U8* s_end = s0 + STR_LEN(scan);
@@ -4148,14 +4149,24 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode 
**scanp,
                                 EXACTFU         | EXACTFU
                                 EXACTFU_SS      | EXACTFU
                                 EXACTFA         | EXACTFA
+                                EXACTL          | EXACTL
+                                EXACTFLU8       | EXACTFLU8
 
 
                         */
-#define TRIE_TYPE(X) ( ( NOTHING == (X) ) ? NOTHING :   \
-                       ( EXACT == (X) )   ? EXACT :        \
-                       ( EXACTFU == (X) || EXACTFU_SS == (X) ) ? EXACTFU :     
   \
-                       ( EXACTFA == (X) ) ? EXACTFA :        \
-                       0 )
+#define TRIE_TYPE(X) ( ( NOTHING == (X) )                                   \
+                       ? NOTHING                                            \
+                       : ( EXACT == (X) )                                   \
+                         ? EXACT                                            \
+                         : ( EXACTFU == (X) || EXACTFU_SS == (X) )          \
+                           ? EXACTFU                                        \
+                           : ( EXACTFA == (X) )                             \
+                             ? EXACTFA                                      \
+                             : ( EXACTL == (X) )                            \
+                               ? EXACTL                                     \
+                               : ( EXACTFLU8 == (X) )                        \
+                                 ? EXACTFLU8                                 \
+                                 : 0 )
 
                         /* dont use tail as the end marker for this traverse */
                         for ( cur = startbranch ; cur != scan ; cur = regnext( 
cur ) ) {
@@ -4471,7 +4482,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode 
**scanp,
                continue;
            }
        }
-       else if (OP(scan) == EXACT) {
+       else if (OP(scan) == EXACT || OP(scan) == EXACTL) {
            SSize_t l = STR_LEN(scan);
            UV uc;
            if (UTF) {
@@ -4589,7 +4600,10 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode 
**scanp,
            case PLUS:
                if (flags & (SCF_DO_SUBSTR | SCF_DO_STCLASS)) {
                    next = NEXTOPER(scan);
-                   if (OP(next) == EXACT || (flags & SCF_DO_STCLASS)) {
+                   if (OP(next) == EXACT
+                        || OP(next) == EXACTL
+                        || (flags & SCF_DO_STCLASS))
+                    {
                        mincount = 1;
                        maxcount = REG_INFTY;
                        next = regnext(scan);
@@ -5070,6 +5084,7 @@ PerlIO_printf(Perl_debug_log, "LHS=%"UVuf" RHS=%"UVuf"\n",
                    }
                    break;
 
+                case ANYOFL:
                 case ANYOF:
                    if (flags & SCF_DO_STCLASS_AND)
                        ssc_and(pRExC_state, data->start_class,
@@ -6982,7 +6997,7 @@ reStudy:
         DEBUG_PEEP("first:",first,0);
         /* Ignore EXACT as we deal with it later. */
        if (PL_regkind[OP(first)] == EXACT) {
-           if (OP(first) == EXACT)
+           if (OP(first) == EXACT || OP(first) == EXACTL)
                NOOP;   /* Empty, get anchored substr later. */
            else
                ri->regstclass = first;
@@ -7332,7 +7347,7 @@ reStudy:
                  && OP(regnext(first)) == END)
             r->extflags |= RXf_WHITE;
         else if ( r->extflags & RXf_SPLIT
-                  && fop == EXACT
+                  && (fop == EXACT || fop == EXACTL)
                   && STR_LEN(first) == 1
                   && *(STRING(first)) == ' '
                   && OP(regnext(first)) == END )
@@ -11348,7 +11363,9 @@ S_compute_EXACTish(RExC_state_t *pRExC_state)
     PERL_ARGS_ASSERT_COMPUTE_EXACTISH;
 
     if (! FOLD) {
-        return EXACT;
+        return (LOC)
+                ? EXACTL
+                : EXACT;
     }
 
     op = get_regex_charset(RExC_flags);
@@ -11446,7 +11463,9 @@ S_alloc_maybe_populate_EXACT(pTHX_ RExC_state_t 
*pRExC_state,
                                                for those.  */
                     && ! _invlist_contains_cp(PL_utf8_foldable, code_point))
                 {
-                    OP(node) = EXACT;
+                    OP(node) = (LOC)
+                               ? EXACTL
+                               : EXACT;
                 }
             }
             else if (code_point <= MAX_UTF8_TWO_BYTE) {
@@ -12481,7 +12500,7 @@ tryagain:
                     /* Here, are folding and are not UTF-8 encoded; therefore
                      * the character must be in the range 0-255, and is not /l
                      * (Not /l because we already handled these under /l in
-                     * is_PROBLEMATIC_LOCALE_FOLD_cp */
+                     * is_PROBLEMATIC_LOCALE_FOLD_cp) */
                     if (IS_IN_SOME_FOLD_L1(ender)) {
                         maybe_exact = FALSE;
 
@@ -12747,10 +12766,14 @@ tryagain:
                      * differently depending on UTF8ness of the target string
                      * (for /u), or depending on locale for /l */
                     if (maybe_exact) {
-                        OP(ret) = EXACT;
+                        OP(ret) = (LOC)
+                                  ? EXACTL
+                                  : EXACT;
                     }
                     else if (maybe_exactfu) {
-                        OP(ret) = EXACTFU;
+                        OP(ret) = (LOC)
+                                  ? EXACTFLU8
+                                  : EXACTFU;
                     }
                 }
                 alloc_maybe_populate_EXACT(pRExC_state, ret, flagp, len, ender,
@@ -13804,7 +13827,11 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
     DEBUG_PARSE("clas");
 
     /* Assume we are going to generate an ANYOF node. */
-    ret = reganode(pRExC_state, ANYOF, 0);
+    ret = reganode(pRExC_state,
+                   (LOC)
+                    ? ANYOFL
+                    : ANYOF,
+                   0);
 
     if (SIZE_ONLY) {
        RExC_size += ANYOF_SKIP;
@@ -15283,7 +15310,9 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, 
U32 depth,
                 value = start;
 
                 if (! FOLD) {
-                    op = EXACT;
+                    op = (LOC)
+                         ? EXACTL
+                         : EXACT;
                 }
                 else if (LOC) {
 
@@ -15996,10 +16025,12 @@ S_regtail_study(pTHX_ RExC_state_t *pRExC_state, 
regnode *p,
         if ( exact ) {
             switch (OP(scan)) {
                 case EXACT:
+                case EXACTL:
                 case EXACTF:
                 case EXACTFA_NO_TRIE:
                 case EXACTFA:
                 case EXACTFU:
+                case EXACTFLU8:
                 case EXACTFU_SS:
                 case EXACTFL:
                         if( exact == PSEUDO )
@@ -16413,7 +16444,7 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const 
regnode *o, const regmatch_
         SV* bitmap_invlist;  /* Will hold what the bit map contains */
 
 
-       if (flags & ANYOF_LOCALE_FLAGS)
+       if (OP(o) == ANYOFL)
            sv_catpvs(sv, "{loc}");
        if (flags & ANYOF_LOC_FOLD)
            sv_catpvs(sv, "{i}");
diff --git a/regcomp.h b/regcomp.h
index 049ac43..a111893 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -389,7 +389,15 @@ struct regnode_ssc {
  * probably better than that commit anyway.  But it could be reinstated if we
  * need a bit.  The LOC flags are only for /l nodes; the reverted commit was
  * only for /d, so there are no combinatorial issues.  The LOC flag to use is
- * probably the POSIXL one.
+ * probably the POSIXL one.  Now that there is an ANYOFL (locale) node, another
+ * option would be to make all of those include the POSIXL data structure,
+ * which would get rid of needing a separate POSIXL flag.  But it would
+ * increase the size of all such nodes, so it's probably not as atractive as
+ * having an ANYOF_POSIXL node type.  But if we did do it, note that not all 32
+ * bits of that extra space are used, one bit of that could be set aside for
+ * the LOC_FOLD flag, yielding yet another bit.  This would require extra code
+ * for masking, so again not the most attractive solution.
+ *
  * Several flags are not used in synthetic start class (SSC) nodes, so could be
  * shared should new flags be needed for SSCs, like SSC_MATCHES_EMPTY_STRING
  * now. */
diff --git a/regcomp.sym b/regcomp.sym
index 02b278c..c20c5aa 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -58,6 +58,7 @@ REG_ANY     REG_ANY,    no 0 S    ; Match any one character 
(except newline).
 SANY        REG_ANY,    no 0 S    ; Match any one character.
 CANY        REG_ANY,    no 0 S    ; Match any one byte.
 ANYOF       ANYOF,      sv 1 S    ; Match character in (or not in) this class, 
single char match only
+ANYOFL      ANYOF,      sv 1 S    ; Like ANYOF, but /l is in effect
 
 #* POSIX Character Classes:
 # Order of the below is important.  See ordering comment above.
@@ -90,11 +91,16 @@ BRANCH      BRANCH,     node 0 V  ; Match this alternative, 
or the next...
 # NOTE: the relative ordering of these types is important do not change it
 
 EXACT       EXACT,      str       ; Match this string (preceded by length).
+EXACTL      EXACT,      str       ; Like EXACT, but /l is in effect.
 EXACTF      EXACT,      str       ; Match this non-UTF-8 string (not 
guaranteed to be folded) using /id rules (w/len).
 EXACTFL     EXACT,      str       ; Match this string (not guaranteed to be 
folded) using /il rules (w/len).
 EXACTFU     EXACT,      str      ; Match this string (folded iff in UTF-8, 
length in folding doesn't change if not in UTF-8) using /iu rules (w/len).
 EXACTFA     EXACT,      str      ; Match this string (not guaranteed to be 
folded) using /iaa rules (w/len).
+
+# End of important relative ordering.
+
 EXACTFU_SS  EXACT,      str      ; Match this string (folded iff in UTF-8, 
length in folding may change even if not in UTF-8) using /iu rules (w/len).
+EXACTFLU8   EXACT,      str      ; Rare cirucmstances: like EXACTFU, but is 
under /l, UTF-8, folded, and everything in it is above 255.
 EXACTFA_NO_TRIE  EXACT, str      ; Match this string (which is not trie-able; 
not guaranteed to be folded) using /iaa rules (w/len).
 
 #*Do nothing types
diff --git a/regexec.c b/regexec.c
index 1a5eb61..e659f4b 100644
--- a/regexec.c
+++ b/regexec.c
@@ -231,15 +231,15 @@ static const char* const non_utf8_target_but_utf8_required
 
 #if 0 
 /* Currently these are only used when PL_regkind[OP(rn)] == EXACT so
-   we don't need this definition. */
+   we don't need this definition.  XXX These are now out-of-sync*/
 #define IS_TEXT(rn)   ( OP(rn)==EXACT   || OP(rn)==REF   || OP(rn)==NREF   )
 #define IS_TEXTF(rn)  ( OP(rn)==EXACTFU || OP(rn)==EXACTFU_SS || 
OP(rn)==EXACTFA || OP(rn)==EXACTFA_NO_TRIE || OP(rn)==EXACTF || OP(rn)==REFF  
|| OP(rn)==NREFF )
 #define IS_TEXTFL(rn) ( OP(rn)==EXACTFL || OP(rn)==REFFL || OP(rn)==NREFFL )
 
 #else
 /* ... so we use this as its faster. */
-#define IS_TEXT(rn)   ( OP(rn)==EXACT   )
-#define IS_TEXTFU(rn)  ( OP(rn)==EXACTFU || OP(rn)==EXACTFU_SS || OP(rn) == 
EXACTFA || OP(rn) == EXACTFA_NO_TRIE)
+#define IS_TEXT(rn)   ( OP(rn)==EXACT || OP(rn)==EXACTL )
+#define IS_TEXTFU(rn)  ( OP(rn)==EXACTFU || OP(rn)==EXACTFLU8 || 
OP(rn)==EXACTFU_SS || OP(rn) == EXACTFA || OP(rn) == EXACTFA_NO_TRIE)
 #define IS_TEXTF(rn)  ( OP(rn)==EXACTF  )
 #define IS_TEXTFL(rn) ( OP(rn)==EXACTFL )
 
@@ -498,7 +498,7 @@ S_isFOO_utf8_lc(pTHX_ const U8 classnum, const U8* 
character)
      * '_char_class_number'.
      *
      * This just calls isFOO_lc on the code point for the character if it is in
-     * the range 0-255.  Outside that range, all characters avoid Unicode
+     * the range 0-255.  Outside that range, all characters use Unicode
      * rules, ignoring any locale.  So use the Unicode function if this class
      * requires a swash, and use the Unicode macro otherwise. */
 
@@ -512,6 +512,8 @@ S_isFOO_utf8_lc(pTHX_ const U8 classnum, const U8* 
character)
                         TWO_BYTE_UTF8_TO_NATIVE(*character, *(character + 1)));
     }
 
+    _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(character, character + 
UTF8SKIP(character));
+
     if (classnum < _FIRST_NON_SWASH_CC) {
 
         /* Initialize the swash unless done already */
@@ -1433,23 +1435,39 @@ Perl_re_intuit_start(pTHX_
 
 
 #define DECL_TRIE_TYPE(scan) \
-    const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \
-                 trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold } \
-                    trie_type = ((scan->flags == EXACT) \
-                              ? (utf8_target ? trie_utf8 : trie_plain) \
-                              : (scan->flags == EXACTFA) \
-                                ? (utf8_target ? trie_utf8_exactfa_fold : 
trie_latin_utf8_exactfa_fold) \
-                                : (utf8_target ? trie_utf8_fold : 
trie_latin_utf8_fold))
+    const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold,  
     \
+                 trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold,         
     \
+                 trie_utf8l, trie_flu8 }                                       
     \
+                    trie_type = ((scan->flags == EXACT)                        
     \
+                                 ? (utf8_target ? trie_utf8 : trie_plain)      
     \
+                                 : (scan->flags == EXACTL)                     
     \
+                                    ? (utf8_target ? trie_utf8l : trie_plain)  
     \
+                                    : (scan->flags == EXACTFA)                 
     \
+                                      ? (utf8_target                           
     \
+                                         ? trie_utf8_exactfa_fold              
     \
+                                         : trie_latin_utf8_exactfa_fold)       
     \
+                                      : (scan->flags == EXACTFLU8              
     \
+                                         ? trie_flu8                           
     \
+                                         : (utf8_target                        
     \
+                                           ? trie_utf8_fold                    
     \
+                                           :   trie_latin_utf8_fold)))
 
 #define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, 
uvc, charid, foldlen, foldbuf, uniflags) \
 STMT_START {                                                                   
     \
     STRLEN skiplen;                                                            
     \
     U8 flags = FOLD_FLAGS_FULL;                                                
     \
     switch (trie_type) {                                                       
     \
+    case trie_flu8:                                                            
     \
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;                                    
     \
+        if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) {                        
     \
+            _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc));     
     \
+        }                                                                      
     \
+        goto do_trie_utf8_fold;                                                
     \
     case trie_utf8_exactfa_fold:                                               
     \
         flags |= FOLD_FLAGS_NOMIX_ASCII;                                       
     \
-        /* FALLTHROUGH */                                                      
    \
+        /* FALLTHROUGH */                                                      
     \
     case trie_utf8_fold:                                                       
     \
+      do_trie_utf8_fold:                                                       
     \
         if ( foldlen>0 ) {                                                     
     \
             uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, 
uniflags ); \
             foldlen -= len;                                                    
     \
@@ -1465,7 +1483,7 @@ STMT_START {
         break;                                                                 
     \
     case trie_latin_utf8_exactfa_fold:                                         
     \
         flags |= FOLD_FLAGS_NOMIX_ASCII;                                       
     \
-        /* FALLTHROUGH */                                                      
    \
+        /* FALLTHROUGH */                                                      
     \
     case trie_latin_utf8_fold:                                                 
     \
         if ( foldlen>0 ) {                                                     
     \
             uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, 
uniflags ); \
@@ -1480,6 +1498,12 @@ STMT_START {
             uscan = foldbuf + skiplen;                                         
     \
         }                                                                      
     \
         break;                                                                 
     \
+    case trie_utf8l:                                                           
     \
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;                                    
     \
+        if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) {                        
     \
+            _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc));     
     \
+        }                                                                      
     \
+        /* FALLTHROUGH */                                                      
     \
     case trie_utf8:                                                            
     \
         uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags );   
     \
         break;                                                                 
     \
@@ -1739,6 +1763,9 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, 
char *s,
 
     /* We know what class it must start with. */
     switch (OP(c)) {
+    case ANYOFL:
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+        /* FALLTHROUGH */
     case ANYOF:
         if (utf8_target) {
             REXEC_FBC_UTF8_CLASS_SCAN(
@@ -1780,6 +1807,7 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, 
char *s,
         goto do_exactf_non_utf8;
 
     case EXACTFL:
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
         if (is_utf8_pat || utf8_target || IN_UTF8_CTYPE_LOCALE) {
             utf8_fold_flags = FOLDEQ_LOCALE;
             goto do_exactf_utf8;
@@ -1794,6 +1822,15 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, 
char *s,
         }
         goto do_exactf_utf8;
 
+    case EXACTFLU8:
+            if (! utf8_target) {    /* All code points in this node require
+                                       UTF-8 to express.  */
+                break;
+            }
+            utf8_fold_flags =  FOLDEQ_LOCALE | FOLDEQ_S2_ALREADY_FOLDED
+                                             | FOLDEQ_S2_FOLDS_SANE;
+            goto do_exactf_utf8;
+
     case EXACTFU:
         if (is_utf8_pat || utf8_target) {
             utf8_fold_flags = is_utf8_pat ? FOLDEQ_S2_ALREADY_FOLDED : 0;
@@ -1899,9 +1936,11 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, 
char *s,
     }
 
     case BOUNDL:
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
         FBC_BOUND(isWORDCHAR_LC, isWORDCHAR_LC_uvchr, isWORDCHAR_LC_utf8);
         break;
     case NBOUNDL:
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
         FBC_NBOUND(isWORDCHAR_LC, isWORDCHAR_LC_uvchr, isWORDCHAR_LC_utf8);
         break;
     case BOUND:
@@ -1936,6 +1975,7 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, 
char *s,
         /* FALLTHROUGH */
 
     case POSIXL:
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
         REXEC_FBC_CSCAN(to_complement ^ cBOOL(isFOO_utf8_lc(FLAGS(c), (U8 *) 
s)),
                         to_complement ^ cBOOL(isFOO_lc(FLAGS(c), *s)));
         break;
@@ -3648,7 +3688,7 @@ S_setup_EXACTISH_ST_c1_c2(pTHX_ const regnode * const 
text_node, int *c1p,
     U8 *pat = (U8*)STRING(text_node);
     U8 folded[UTF8_MAX_FOLD_CHAR_EXPAND * UTF8_MAXBYTES_CASE + 1] = { '\0' };
 
-    if (OP(text_node) == EXACT) {
+    if (OP(text_node) == EXACT || OP(text_node) == EXACTL) {
 
         /* In an exact node, only one thing can be matched, that first
          * character.  If both the pat and the target are UTF-8, we can just
@@ -4152,6 +4192,19 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
                HV * widecharmap = MUTABLE_HV(rexi->data->data[ ARG( scan ) + 1 
]);
                 U32 state = trie->startstate;
 
+                if (scan->flags == EXACTL || scan->flags == EXACTFLU8) {
+                    _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+                    if (utf8_target
+                        && UTF8_IS_ABOVE_LATIN1(nextchr)
+                        && scan->flags == EXACTL)
+                    {
+                        /* We only output for EXACTL, as we let the folder
+                         * output this message for EXACTFLU8 to avoid
+                         * duplication */
+                        _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(locinput,
+                                                               
reginfo->strend);
+                    }
+                }
                 if (   trie->bitmap
                     && (NEXTCHR_IS_EOS || !TRIE_BITMAP_TEST(trie, nextchr)))
                 {
@@ -4425,6 +4478,19 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
         }
 #undef  ST
 
+       case EXACTL:             /*  /abc/l       */
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+
+            /* Complete checking would involve going through every character
+             * matched by the string to see if any is above latin1.  But the
+             * comparision otherwise might very well be a fast assembly
+             * language routine, and I (khw) don't think slowing things down
+             * just to check for this warning is worth it.  So this just checks
+             * the first character */
+            if (utf8_target && UTF8_IS_ABOVE_LATIN1(*locinput)) {
+                _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(locinput, 
reginfo->strend);
+            }
+            /* FALLTHROUGH */
        case EXACT: {            /*  /abc/        */
            char *s = STRING(scan);
            ln = STR_LEN(scan);
@@ -4511,11 +4577,22 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char 
*startpos, regnode *prog)
            const char * s;
            U32 fold_utf8_flags;
 
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
             folder = foldEQ_locale;
             fold_array = PL_fold_locale;
            fold_utf8_flags = FOLDEQ_LOCALE;
            goto do_exactf;
 
+        case EXACTFLU8:           /*  /abc/il; but all 'abc' are above 255, so
+                                      is effectively /u; hence to match, target
+                                      must be UTF-8. */
+            if (! utf8_target) {
+                sayNO;
+            }
+            fold_utf8_flags =  FOLDEQ_LOCALE | FOLDEQ_S1_ALREADY_FOLDED
+                                             | FOLDEQ_S1_FOLDS_SANE;
+           goto do_exactf;
+
        case EXACTFU_SS:         /*  /\x{df}/iu   */
        case EXACTFU:            /*  /abc/iu      */
            folder = foldEQ_latin1;
@@ -4583,6 +4660,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
         * have to set the FLAGS fields of these */
        case BOUNDL:  /*  /\b/l  */
        case NBOUNDL: /*  /\B/l  */
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+            /* FALLTHROUGH */
        case BOUND:   /*  /\b/   */
        case BOUNDU:  /*  /\b/u  */
        case BOUNDA:  /*  /\b/a  */
@@ -4661,7 +4740,10 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
                    sayNO;
            break;
 
-       case ANYOF:  /*  /[abc]/       */
+       case ANYOFL:  /*  /[abc]/l      */
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+            /* FALLTHROUGH */
+       case ANYOF:  /*   /[abc]/       */
             if (NEXTCHR_IS_EOS)
                 sayNO;
            if (utf8_target) {
@@ -4685,6 +4767,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
             /* FALLTHROUGH */
 
         case POSIXL:    /* \w or [:punct:] etc. under /l */
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
             if (NEXTCHR_IS_EOS)
                 sayNO;
 
@@ -4705,7 +4788,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
                 }
             }
             else { /* Here, must be an above Latin-1 code point */
-                goto utf8_posix_not_eos;
+                _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(locinput, 
reginfo->strend);
+                goto utf8_posix_above_latin1;
             }
 
             /* Here, must be utf8 */
@@ -4764,7 +4848,6 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
             if (NEXTCHR_IS_EOS) {
                 sayNO;
             }
-          utf8_posix_not_eos:
 
             /* Use _generic_isCC() for characters within Latin1.  (Note that
              * UTF8_IS_INVARIANT works even on non-UTF-8 strings, or else
@@ -4788,6 +4871,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
                 locinput += 2;
             }
             else {  /* Handle above Latin-1 code points */
+          utf8_posix_above_latin1:
                 classnum = (_char_class_number) FLAGS(scan);
                 if (classnum < _FIRST_NON_SWASH_CC) {
 
@@ -5061,6 +5145,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
            const U8 *fold_array;
            UV utf8_fold_flags;
 
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
            folder = foldEQ_locale;
            fold_array = PL_fold_locale;
            type = REFFL;
@@ -5105,6 +5190,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, 
regnode *prog)
            goto do_nref_ref_common;
 
        case REFFL:  /*  /\1/il  */
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
            folder = foldEQ_locale;
            fold_array = PL_fold_locale;
            utf8_fold_flags = FOLDEQ_LOCALE;
@@ -7174,6 +7260,12 @@ S_regrepeat(pTHX_ regexp *prog, char **startposp, const 
regnode *p,
             scan = loceol;
         }
        break;
+    case EXACTL:
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+        if (utf8_target && UTF8_IS_ABOVE_LATIN1(*scan)) {
+            _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(scan, loceol);
+        }
+        /* FALLTHROUGH */
     case EXACT:
         assert(STR_LEN(p) == reginfo->is_utf8_pat ? UTF8SKIP(STRING(p)) : 1);
 
@@ -7247,6 +7339,7 @@ S_regrepeat(pTHX_ regexp *prog, char **startposp, const 
regnode *p,
        goto do_exactf;
 
     case EXACTFL:
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
        utf8_flags = FOLDEQ_LOCALE;
        goto do_exactf;
 
@@ -7255,6 +7348,14 @@ S_regrepeat(pTHX_ regexp *prog, char **startposp, const 
regnode *p,
         utf8_flags = 0;
         goto do_exactf;
 
+    case EXACTFLU8:
+        if (! utf8_target) {
+            break;
+        }
+        utf8_flags =  FOLDEQ_LOCALE | FOLDEQ_S2_ALREADY_FOLDED
+                                    | FOLDEQ_S2_FOLDS_SANE;
+        goto do_exactf;
+
     case EXACTFU_SS:
     case EXACTFU:
        utf8_flags = reginfo->is_utf8_pat ? FOLDEQ_S2_ALREADY_FOLDED : 0;
@@ -7318,6 +7419,9 @@ S_regrepeat(pTHX_ regexp *prog, char **startposp, const 
regnode *p,
        }
        break;
     }
+    case ANYOFL:
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+        /* FALLTHROUGH */
     case ANYOF:
        if (utf8_target) {
            while (hardcount < max
@@ -7340,6 +7444,7 @@ S_regrepeat(pTHX_ regexp *prog, char **startposp, const 
regnode *p,
         /* FALLTHROUGH */
 
     case POSIXL:
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
        if (! utf8_target) {
            while (scan < loceol && to_complement ^ cBOOL(isFOO_lc(FLAGS(p),
                                                                    *scan)))
@@ -7559,16 +7664,18 @@ S_regrepeat(pTHX_ regexp *prog, char **startposp, const 
regnode *p,
        }
        break;
 
+    case BOUNDL:
+    case NBOUNDL:
+        _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+        /* FALLTHROUGH */
     case BOUND:
     case BOUNDA:
-    case BOUNDL:
     case BOUNDU:
     case EOS:
     case GPOS:
     case KEEPS:
     case NBOUND:
     case NBOUNDA:
-    case NBOUNDL:
     case NBOUNDU:
     case OPFAIL:
     case SBOL:
@@ -7627,7 +7734,7 @@ Perl_regclass_swash(pTHX_ const regexp *prog, const 
regnode* node, bool doinit,
 /*
  - reginclass - determine if a character falls into a character class
  
-  n is the ANYOF regnode
+  n is the ANYOF-type regnode
   p is the target string
   p_end points to one byte beyond the end of the target string
   utf8_target tells whether p is in UTF-8.
@@ -7661,6 +7768,9 @@ S_reginclass(pTHX_ regexp * const prog, const regnode * 
const n, const U8* const
                 * UTF8_ALLOW_FFFF */
        if (c_len == (STRLEN)-1)
            Perl_croak(aTHX_ "Malformed UTF-8 character (fatal)");
+        if (c > 255 && OP(n) == ANYOFL && ! is_ANYOF_SYNTHETIC(n)) {
+            _CHECK_AND_OUTPUT_WIDE_LOCALE_CP_MSG(c);
+        }
     }
 
     /* If this character is potentially in the bitmap, check it */
diff --git a/regnodes.h b/regnodes.h
index 41662a0..94616a6 100644
--- a/regnodes.h
+++ b/regnodes.h
@@ -6,8 +6,8 @@
 
 /* Regops and State definitions */
 
-#define REGNODE_MAX            90
-#define REGMATCH_STATE_MAX     130
+#define REGNODE_MAX            93
+#define REGMATCH_STATE_MAX     133
 
 #define        END                     0       /* 0000 End of program. */
 #define        SUCCEED                 1       /* 0x01 Return from a 
subroutine, basically. */
@@ -31,77 +31,80 @@
 #define        SANY                    17      /* 0x11 Match any one 
character. */
 #define        CANY                    18      /* 0x12 Match any one byte. */
 #define        ANYOF                   19      /* 0x13 Match character in (or 
not in) this class, single char match only */
-#define        POSIXD                  20      /* 0x14 Some [[:class:]] under 
/d; the FLAGS field gives which one */
-#define        POSIXL                  21      /* 0x15 Some [[:class:]] under 
/l; the FLAGS field gives which one */
-#define        POSIXU                  22      /* 0x16 Some [[:class:]] under 
/u; the FLAGS field gives which one */
-#define        POSIXA                  23      /* 0x17 Some [[:class:]] under 
/a; the FLAGS field gives which one */
-#define        NPOSIXD                 24      /* 0x18 complement of POSIXD, 
[[:^class:]] */
-#define        NPOSIXL                 25      /* 0x19 complement of POSIXL, 
[[:^class:]] */
-#define        NPOSIXU                 26      /* 0x1a complement of POSIXU, 
[[:^class:]] */
-#define        NPOSIXA                 27      /* 0x1b complement of POSIXA, 
[[:^class:]] */
-#define        CLUMP                   28      /* 0x1c Match any extended 
grapheme cluster sequence */
-#define        BRANCH                  29      /* 0x1d Match this alternative, 
or the next... */
-#define        EXACT                   30      /* 0x1e Match this string 
(preceded by length). */
-#define        EXACTF                  31      /* 0x1f Match this non-UTF-8 
string (not guaranteed to be folded) using /id rules (w/len). */
-#define        EXACTFL                 32      /* 0x20 Match this string (not 
guaranteed to be folded) using /il rules (w/len). */
-#define        EXACTFU                 33      /* 0x21 Match this string 
(folded iff in UTF-8, length in folding doesn't change if not in UTF-8) using 
/iu rules (w/len). */
-#define        EXACTFA                 34      /* 0x22 Match this string (not 
guaranteed to be folded) using /iaa rules (w/len). */
-#define        EXACTFU_SS              35      /* 0x23 Match this string 
(folded iff in UTF-8, length in folding may change even if not in UTF-8) using 
/iu rules (w/len). */
-#define        EXACTFA_NO_TRIE         36      /* 0x24 Match this string 
(which is not trie-able; not guaranteed to be folded) using /iaa rules (w/len). 
*/
-#define        NOTHING                 37      /* 0x25 Match empty string. */
-#define        TAIL                    38      /* 0x26 Match empty string. Can 
jump here from outside. */
-#define        STAR                    39      /* 0x27 Match this (simple) 
thing 0 or more times. */
-#define        PLUS                    40      /* 0x28 Match this (simple) 
thing 1 or more times. */
-#define        CURLY                   41      /* 0x29 Match this simple thing 
{n,m} times. */
-#define        CURLYN                  42      /* 0x2a Capture next-after-this 
simple thing */
-#define        CURLYM                  43      /* 0x2b Capture this 
medium-complex thing {n,m} times. */
-#define        CURLYX                  44      /* 0x2c Match this complex 
thing {n,m} times. */
-#define        WHILEM                  45      /* 0x2d Do curly processing and 
see if rest matches. */
-#define        OPEN                    46      /* 0x2e Mark this point in 
input as start of #n. */
-#define        CLOSE                   47      /* 0x2f Analogous to OPEN. */
-#define        REF                     48      /* 0x30 Match some already 
matched string */
-#define        REFF                    49      /* 0x31 Match already matched 
string, folded using native charset rules for non-utf8 */
-#define        REFFL                   50      /* 0x32 Match already matched 
string, folded in loc. */
-#define        REFFU                   51      /* 0x33 Match already matched 
string, folded using unicode rules for non-utf8 */
-#define        REFFA                   52      /* 0x34 Match already matched 
string, folded using unicode rules for non-utf8, no mixing ASCII, non-ASCII */
-#define        NREF                    53      /* 0x35 Match some already 
matched string */
-#define        NREFF                   54      /* 0x36 Match already matched 
string, folded using native charset rules for non-utf8 */
-#define        NREFFL                  55      /* 0x37 Match already matched 
string, folded in loc. */
-#define        NREFFU                  56      /* 0x38 Match already matched 
string, folded using unicode rules for non-utf8 */
-#define        NREFFA                  57      /* 0x39 Match already matched 
string, folded using unicode rules for non-utf8, no mixing ASCII, non-ASCII */
-#define        LONGJMP                 58      /* 0x3a Jump far away. */
-#define        BRANCHJ                 59      /* 0x3b BRANCH with long 
offset. */
-#define        IFMATCH                 60      /* 0x3c Succeeds if the 
following matches. */
-#define        UNLESSM                 61      /* 0x3d Fails if the following 
matches. */
-#define        SUSPEND                 62      /* 0x3e "Independent" sub-RE. */
-#define        IFTHEN                  63      /* 0x3f Switch, should be 
preceded by switcher. */
-#define        GROUPP                  64      /* 0x40 Whether the group 
matched. */
-#define        EVAL                    65      /* 0x41 Execute some Perl code. 
*/
-#define        MINMOD                  66      /* 0x42 Next operator is not 
greedy. */
-#define        LOGICAL                 67      /* 0x43 Next opcode should set 
the flag only. */
-#define        RENUM                   68      /* 0x44 Group with 
independently numbered parens. */
-#define        TRIE                    69      /* 0x45 Match many 
EXACT(F[ALU]?)? at once. flags==type */
-#define        TRIEC                   70      /* 0x46 Same as TRIE, but with 
embedded charclass data */
-#define        AHOCORASICK             71      /* 0x47 Aho Corasick stclass. 
flags==type */
-#define        AHOCORASICKC            72      /* 0x48 Same as AHOCORASICK, 
but with embedded charclass data */
-#define        GOSUB                   73      /* 0x49 recurse to paren arg1 
at (signed) ofs arg2 */
-#define        GOSTART                 74      /* 0x4a recurse to start of 
pattern */
-#define        NGROUPP                 75      /* 0x4b Whether the group 
matched. */
-#define        INSUBP                  76      /* 0x4c Whether we are in a 
specific recurse. */
-#define        DEFINEP                 77      /* 0x4d Never execute directly. 
*/
-#define        ENDLIKE                 78      /* 0x4e Used only for the type 
field of verbs */
-#define        OPFAIL                  79      /* 0x4f Same as (?!) */
-#define        ACCEPT                  80      /* 0x50 Accepts the current 
matched string. */
-#define        VERB                    81      /* 0x51 Used only for the type 
field of verbs */
-#define        PRUNE                   82      /* 0x52 Pattern fails at this 
startpoint if no-backtracking through this */
-#define        MARKPOINT               83      /* 0x53 Push the current 
location for rollback by cut. */
-#define        SKIP                    84      /* 0x54 On failure skip forward 
(to the mark) before retrying */
-#define        COMMIT                  85      /* 0x55 Pattern fails outright 
if backtracking through this */
-#define        CUTGROUP                86      /* 0x56 On failure go to the 
next alternation in the group */
-#define        KEEPS                   87      /* 0x57 $& begins here. */
-#define        LNBREAK                 88      /* 0x58 generic newline pattern 
*/
-#define        OPTIMIZED               89      /* 0x59 Placeholder for dump. */
-#define        PSEUDO                  90      /* 0x5a Pseudo opcode for 
internal use. */
+#define        ANYOFL                  20      /* 0x14 Like ANYOF, but /l is 
in effect */
+#define        POSIXD                  21      /* 0x15 Some [[:class:]] under 
/d; the FLAGS field gives which one */
+#define        POSIXL                  22      /* 0x16 Some [[:class:]] under 
/l; the FLAGS field gives which one */
+#define        POSIXU                  23      /* 0x17 Some [[:class:]] under 
/u; the FLAGS field gives which one */
+#define        POSIXA                  24      /* 0x18 Some [[:class:]] under 
/a; the FLAGS field gives which one */
+#define        NPOSIXD                 25      /* 0x19 complement of POSIXD, 
[[:^class:]] */
+#define        NPOSIXL                 26      /* 0x1a complement of POSIXL, 
[[:^class:]] */
+#define        NPOSIXU                 27      /* 0x1b complement of POSIXU, 
[[:^class:]] */
+#define        NPOSIXA                 28      /* 0x1c complement of POSIXA, 
[[:^class:]] */
+#define        CLUMP                   29      /* 0x1d Match any extended 
grapheme cluster sequence */
+#define        BRANCH                  30      /* 0x1e Match this alternative, 
or the next... */
+#define        EXACT                   31      /* 0x1f Match this string 
(preceded by length). */
+#define        EXACTL                  32      /* 0x20 Like EXACT, but /l is 
in effect. */
+#define        EXACTF                  33      /* 0x21 Match this non-UTF-8 
string (not guaranteed to be folded) using /id rules (w/len). */
+#define        EXACTFL                 34      /* 0x22 Match this string (not 
guaranteed to be folded) using /il rules (w/len). */
+#define        EXACTFU                 35      /* 0x23 Match this string 
(folded iff in UTF-8, length in folding doesn't change if not in UTF-8) using 
/iu rules (w/len). */
+#define        EXACTFA                 36      /* 0x24 Match this string (not 
guaranteed to be folded) using /iaa rules (w/len). */
+#define        EXACTFU_SS              37      /* 0x25 Match this string 
(folded iff in UTF-8, length in folding may change even if not in UTF-8) using 
/iu rules (w/len). */
+#define        EXACTFLU8               38      /* 0x26 Rare cirucmstances: 
like EXACTFU, but is under /l, UTF-8, folded, and everything in it is above 
255. */
+#define        EXACTFA_NO_TRIE         39      /* 0x27 Match this string 
(which is not trie-able; not guaranteed to be folded) using /iaa rules (w/len). 
*/
+#define        NOTHING                 40      /* 0x28 Match empty string. */
+#define        TAIL                    41      /* 0x29 Match empty string. Can 
jump here from outside. */
+#define        STAR                    42      /* 0x2a Match this (simple) 
thing 0 or more times. */
+#define        PLUS                    43      /* 0x2b Match this (simple) 
thing 1 or more times. */
+#define        CURLY                   44      /* 0x2c Match this simple thing 
{n,m} times. */
+#define        CURLYN                  45      /* 0x2d Capture next-after-this 
simple thing */
+#define        CURLYM                  46      /* 0x2e Capture this 
medium-complex thing {n,m} times. */
+#define        CURLYX                  47      /* 0x2f Match this complex 
thing {n,m} times. */
+#define        WHILEM                  48      /* 0x30 Do curly processing and 
see if rest matches. */
+#define        OPEN                    49      /* 0x31 Mark this point in 
input as start of #n. */
+#define        CLOSE                   50      /* 0x32 Analogous to OPEN. */
+#define        REF                     51      /* 0x33 Match some already 
matched string */
+#define        REFF                    52      /* 0x34 Match already matched 
string, folded using native charset rules for non-utf8 */
+#define        REFFL                   53      /* 0x35 Match already matched 
string, folded in loc. */
+#define        REFFU                   54      /* 0x36 Match already matched 
string, folded using unicode rules for non-utf8 */
+#define        REFFA                   55      /* 0x37 Match already matched 
string, folded using unicode rules for non-utf8, no mixing ASCII, non-ASCII */
+#define        NREF                    56      /* 0x38 Match some already 
matched string */
+#define        NREFF                   57      /* 0x39 Match already matched 
string, folded using native charset rules for non-utf8 */
+#define        NREFFL                  58      /* 0x3a Match already matched 
string, folded in loc. */
+#define        NREFFU                  59      /* 0x3b Match already matched 
string, folded using unicode rules for non-utf8 */
+#define        NREFFA                  60      /* 0x3c Match already matched 
string, folded using unicode rules for non-utf8, no mixing ASCII, non-ASCII */
+#define        LONGJMP                 61      /* 0x3d Jump far away. */
+#define        BRANCHJ                 62      /* 0x3e BRANCH with long 
offset. */
+#define        IFMATCH                 63      /* 0x3f Succeeds if the 
following matches. */
+#define        UNLESSM                 64      /* 0x40 Fails if the following 
matches. */
+#define        SUSPEND                 65      /* 0x41 "Independent" sub-RE. */
+#define        IFTHEN                  66      /* 0x42 Switch, should be 
preceded by switcher. */
+#define        GROUPP                  67      /* 0x43 Whether the group 
matched. */
+#define        EVAL                    68      /* 0x44 Execute some Perl code. 
*/
+#define        MINMOD                  69      /* 0x45 Next operator is not 
greedy. */
+#define        LOGICAL                 70      /* 0x46 Next opcode should set 
the flag only. */
+#define        RENUM                   71      /* 0x47 Group with 
independently numbered parens. */
+#define        TRIE                    72      /* 0x48 Match many 
EXACT(F[ALU]?)? at once. flags==type */
+#define        TRIEC                   73      /* 0x49 Same as TRIE, but with 
embedded charclass data */
+#define        AHOCORASICK             74      /* 0x4a Aho Corasick stclass. 
flags==type */
+#define        AHOCORASICKC            75      /* 0x4b Same as AHOCORASICK, 
but with embedded charclass data */
+#define        GOSUB                   76      /* 0x4c recurse to paren arg1 
at (signed) ofs arg2 */
+#define        GOSTART                 77      /* 0x4d recurse to start of 
pattern */
+#define        NGROUPP                 78      /* 0x4e Whether the group 
matched. */
+#define        INSUBP                  79      /* 0x4f Whether we are in a 
specific recurse. */
+#define        DEFINEP                 80      /* 0x50 Never execute directly. 
*/
+#define        ENDLIKE                 81      /* 0x51 Used only for the type 
field of verbs */
+#define        OPFAIL                  82      /* 0x52 Same as (?!) */
+#define        ACCEPT                  83      /* 0x53 Accepts the current 
matched string. */
+#define        VERB                    84      /* 0x54 Used only for the type 
field of verbs */
+#define        PRUNE                   85      /* 0x55 Pattern fails at this 
startpoint if no-backtracking through this */
+#define        MARKPOINT               86      /* 0x56 Push the current 
location for rollback by cut. */
+#define        SKIP                    87      /* 0x57 On failure skip forward 
(to the mark) before retrying */
+#define        COMMIT                  88      /* 0x58 Pattern fails outright 
if backtracking through this */
+#define        CUTGROUP                89      /* 0x59 On failure go to the 
next alternation in the group */
+#define        KEEPS                   90      /* 0x5a $& begins here. */
+#define        LNBREAK                 91      /* 0x5b generic newline pattern 
*/
+#define        OPTIMIZED               92      /* 0x5c Placeholder for dump. */
+#define        PSEUDO                  93      /* 0x5d Pseudo opcode for 
internal use. */
        /* ------------ States ------------- */
 #define        TRIE_next               (REGNODE_MAX + 1)       /* state for 
TRIE */
 #define        TRIE_next_fail          (REGNODE_MAX + 2)       /* state for 
TRIE */
@@ -170,6 +173,7 @@ EXTCONST U8 PL_regkind[] = {
        REG_ANY,        /* SANY                   */
        REG_ANY,        /* CANY                   */
        ANYOF,          /* ANYOF                  */
+       ANYOF,          /* ANYOFL                 */
        POSIXD,         /* POSIXD                 */
        POSIXD,         /* POSIXL                 */
        POSIXD,         /* POSIXU                 */
@@ -181,11 +185,13 @@ EXTCONST U8 PL_regkind[] = {
        CLUMP,          /* CLUMP                  */
        BRANCH,         /* BRANCH                 */
        EXACT,          /* EXACT                  */
+       EXACT,          /* EXACTL                 */
        EXACT,          /* EXACTF                 */
        EXACT,          /* EXACTFL                */
        EXACT,          /* EXACTFU                */
        EXACT,          /* EXACTFA                */
        EXACT,          /* EXACTFU_SS             */
+       EXACT,          /* EXACTFLU8              */
        EXACT,          /* EXACTFA_NO_TRIE        */
        NOTHING,        /* NOTHING                */
        NOTHING,        /* TAIL                   */
@@ -309,6 +315,7 @@ static const U8 regarglen[] = {
        0,                                      /* SANY         */
        0,                                      /* CANY         */
        EXTRA_SIZE(struct regnode_1),           /* ANYOF        */
+       EXTRA_SIZE(struct regnode_1),           /* ANYOFL       */
        0,                                      /* POSIXD       */
        0,                                      /* POSIXL       */
        0,                                      /* POSIXU       */
@@ -320,11 +327,13 @@ static const U8 regarglen[] = {
        0,                                      /* CLUMP        */
        0,                                      /* BRANCH       */
        0,                                      /* EXACT        */
+       0,                                      /* EXACTL       */
        0,                                      /* EXACTF       */
        0,                                      /* EXACTFL      */
        0,                                      /* EXACTFU      */
        0,                                      /* EXACTFA      */
        0,                                      /* EXACTFU_SS   */
+       0,                                      /* EXACTFLU8    */
        0,                                      /* EXACTFA_NO_TRIE */
        0,                                      /* NOTHING      */
        0,                                      /* TAIL         */
@@ -405,6 +414,7 @@ static const char reg_off_by_arg[] = {
        0,      /* SANY         */
        0,      /* CANY         */
        0,      /* ANYOF        */
+       0,      /* ANYOFL       */
        0,      /* POSIXD       */
        0,      /* POSIXL       */
        0,      /* POSIXU       */
@@ -416,11 +426,13 @@ static const char reg_off_by_arg[] = {
        0,      /* CLUMP        */
        0,      /* BRANCH       */
        0,      /* EXACT        */
+       0,      /* EXACTL       */
        0,      /* EXACTF       */
        0,      /* EXACTFL      */
        0,      /* EXACTFU      */
        0,      /* EXACTFA      */
        0,      /* EXACTFU_SS   */
+       0,      /* EXACTFLU8    */
        0,      /* EXACTFA_NO_TRIE */
        0,      /* NOTHING      */
        0,      /* TAIL         */
@@ -506,77 +518,80 @@ EXTCONST char * const PL_reg_name[] = {
        "SANY",                         /* 0x11 */
        "CANY",                         /* 0x12 */
        "ANYOF",                        /* 0x13 */
-       "POSIXD",                       /* 0x14 */
-       "POSIXL",                       /* 0x15 */
-       "POSIXU",                       /* 0x16 */
-       "POSIXA",                       /* 0x17 */
-       "NPOSIXD",                      /* 0x18 */
-       "NPOSIXL",                      /* 0x19 */
-       "NPOSIXU",                      /* 0x1a */
-       "NPOSIXA",                      /* 0x1b */
-       "CLUMP",                        /* 0x1c */
-       "BRANCH",                       /* 0x1d */
-       "EXACT",                        /* 0x1e */
-       "EXACTF",                       /* 0x1f */
-       "EXACTFL",                      /* 0x20 */
-       "EXACTFU",                      /* 0x21 */
-       "EXACTFA",                      /* 0x22 */
-       "EXACTFU_SS",                   /* 0x23 */
-       "EXACTFA_NO_TRIE",              /* 0x24 */
-       "NOTHING",                      /* 0x25 */
-       "TAIL",                         /* 0x26 */
-       "STAR",                         /* 0x27 */
-       "PLUS",                         /* 0x28 */
-       "CURLY",                        /* 0x29 */
-       "CURLYN",                       /* 0x2a */
-       "CURLYM",                       /* 0x2b */
-       "CURLYX",                       /* 0x2c */
-       "WHILEM",                       /* 0x2d */
-       "OPEN",                         /* 0x2e */
-       "CLOSE",                        /* 0x2f */
-       "REF",                          /* 0x30 */
-       "REFF",                         /* 0x31 */
-       "REFFL",                        /* 0x32 */
-       "REFFU",                        /* 0x33 */
-       "REFFA",                        /* 0x34 */
-       "NREF",                         /* 0x35 */
-       "NREFF",                        /* 0x36 */
-       "NREFFL",                       /* 0x37 */
-       "NREFFU",                       /* 0x38 */
-       "NREFFA",                       /* 0x39 */
-       "LONGJMP",                      /* 0x3a */
-       "BRANCHJ",                      /* 0x3b */
-       "IFMATCH",                      /* 0x3c */
-       "UNLESSM",                      /* 0x3d */
-       "SUSPEND",                      /* 0x3e */
-       "IFTHEN",                       /* 0x3f */
-       "GROUPP",                       /* 0x40 */
-       "EVAL",                         /* 0x41 */
-       "MINMOD",                       /* 0x42 */
-       "LOGICAL",                      /* 0x43 */
-       "RENUM",                        /* 0x44 */
-       "TRIE",                         /* 0x45 */
-       "TRIEC",                        /* 0x46 */
-       "AHOCORASICK",                  /* 0x47 */
-       "AHOCORASICKC",                 /* 0x48 */
-       "GOSUB",                        /* 0x49 */
-       "GOSTART",                      /* 0x4a */
-       "NGROUPP",                      /* 0x4b */
-       "INSUBP",                       /* 0x4c */
-       "DEFINEP",                      /* 0x4d */
-       "ENDLIKE",                      /* 0x4e */
-       "OPFAIL",                       /* 0x4f */
-       "ACCEPT",                       /* 0x50 */
-       "VERB",                         /* 0x51 */
-       "PRUNE",                        /* 0x52 */
-       "MARKPOINT",                    /* 0x53 */
-       "SKIP",                         /* 0x54 */
-       "COMMIT",                       /* 0x55 */
-       "CUTGROUP",                     /* 0x56 */
-       "KEEPS",                        /* 0x57 */
-       "LNBREAK",                      /* 0x58 */
-       "OPTIMIZED",                    /* 0x59 */
-       "PSEUDO",                       /* 0x5a */
+       "ANYOFL",                       /* 0x14 */
+       "POSIXD",                       /* 0x15 */
+       "POSIXL",                       /* 0x16 */
+       "POSIXU",                       /* 0x17 */
+       "POSIXA",                       /* 0x18 */
+       "NPOSIXD",                      /* 0x19 */
+       "NPOSIXL",                      /* 0x1a */
+       "NPOSIXU",                      /* 0x1b */
+       "NPOSIXA",                      /* 0x1c */
+       "CLUMP",                        /* 0x1d */
+       "BRANCH",                       /* 0x1e */
+       "EXACT",                        /* 0x1f */
+       "EXACTL",                       /* 0x20 */
+       "EXACTF",                       /* 0x21 */
+       "EXACTFL",                      /* 0x22 */
+       "EXACTFU",                      /* 0x23 */
+       "EXACTFA",                      /* 0x24 */
+       "EXACTFU_SS",                   /* 0x25 */
+       "EXACTFLU8",                    /* 0x26 */
+       "EXACTFA_NO_TRIE",              /* 0x27 */
+       "NOTHING",                      /* 0x28 */
+       "TAIL",                         /* 0x29 */
+       "STAR",                         /* 0x2a */
+       "PLUS",                         /* 0x2b */
+       "CURLY",                        /* 0x2c */
+       "CURLYN",                       /* 0x2d */
+       "CURLYM",                       /* 0x2e */
+       "CURLYX",                       /* 0x2f */
+       "WHILEM",                       /* 0x30 */
+       "OPEN",                         /* 0x31 */
+       "CLOSE",                        /* 0x32 */
+       "REF",                          /* 0x33 */
+       "REFF",                         /* 0x34 */
+       "REFFL",                        /* 0x35 */
+       "REFFU",                        /* 0x36 */
+       "REFFA",                        /* 0x37 */
+       "NREF",                         /* 0x38 */
+       "NREFF",                        /* 0x39 */
+       "NREFFL",                       /* 0x3a */
+       "NREFFU",                       /* 0x3b */
+       "NREFFA",                       /* 0x3c */
+       "LONGJMP",                      /* 0x3d */
+       "BRANCHJ",                      /* 0x3e */
+       "IFMATCH",                      /* 0x3f */
+       "UNLESSM",                      /* 0x40 */
+       "SUSPEND",                      /* 0x41 */
+       "IFTHEN",                       /* 0x42 */
+       "GROUPP",                       /* 0x43 */
+       "EVAL",                         /* 0x44 */
+       "MINMOD",                       /* 0x45 */
+       "LOGICAL",                      /* 0x46 */
+       "RENUM",                        /* 0x47 */
+       "TRIE",                         /* 0x48 */
+       "TRIEC",                        /* 0x49 */
+       "AHOCORASICK",                  /* 0x4a */
+       "AHOCORASICKC",                 /* 0x4b */
+       "GOSUB",                        /* 0x4c */
+       "GOSTART",                      /* 0x4d */
+       "NGROUPP",                      /* 0x4e */
+       "INSUBP",                       /* 0x4f */
+       "DEFINEP",                      /* 0x50 */
+       "ENDLIKE",                      /* 0x51 */
+       "OPFAIL",                       /* 0x52 */
+       "ACCEPT",                       /* 0x53 */
+       "VERB",                         /* 0x54 */
+       "PRUNE",                        /* 0x55 */
+       "MARKPOINT",                    /* 0x56 */
+       "SKIP",                         /* 0x57 */
+       "COMMIT",                       /* 0x58 */
+       "CUTGROUP",                     /* 0x59 */
+       "KEEPS",                        /* 0x5a */
+       "LNBREAK",                      /* 0x5b */
+       "OPTIMIZED",                    /* 0x5c */
+       "PSEUDO",                       /* 0x5d */
        /* ------------ States ------------- */
        "TRIE_next",                    /* REGNODE_MAX +0x01 */
        "TRIE_next_fail",               /* REGNODE_MAX +0x02 */
@@ -711,7 +726,7 @@ EXTCONST U8 PL_varies[] __attribute__deprecated__ = {
 EXTCONST U8 PL_varies_bitmask[];
 #else
 EXTCONST U8 PL_varies_bitmask[] = {
-    0x00, 0x00, 0x00, 0x30, 0x80, 0x3F, 0xFF, 0xCB, 0x00, 0x00, 0x00, 0x00
+    0x00, 0x00, 0x00, 0x60, 0x00, 0xFC, 0xF9, 0x5F, 0x06, 0x00, 0x00, 0x00
 };
 #endif /* DOINIT */
 
@@ -723,8 +738,8 @@ EXTCONST U8 PL_varies_bitmask[] = {
 EXTCONST U8 PL_simple[] __attribute__deprecated__;
 #else
 EXTCONST U8 PL_simple[] __attribute__deprecated__ = {
-    REG_ANY, SANY, CANY, ANYOF, POSIXD, POSIXL, POSIXU, POSIXA, NPOSIXD,
-    NPOSIXL, NPOSIXU, NPOSIXA,
+    REG_ANY, SANY, CANY, ANYOF, ANYOFL, POSIXD, POSIXL, POSIXU, POSIXA,
+    NPOSIXD, NPOSIXL, NPOSIXU, NPOSIXA,
     0
 };
 #endif /* DOINIT */
@@ -733,7 +748,7 @@ EXTCONST U8 PL_simple[] __attribute__deprecated__ = {
 EXTCONST U8 PL_simple_bitmask[];
 #else
 EXTCONST U8 PL_simple_bitmask[] = {
-    0x00, 0x00, 0xFF, 0x0F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+    0x00, 0x00, 0xFF, 0x1F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
 };
 #endif /* DOINIT */
 
diff --git a/sv.c b/sv.c
index 1f9ea87..94740d3 100644
--- a/sv.c
+++ b/sv.c
@@ -14588,6 +14588,9 @@ perl_clone_using(PerlInterpreter *proto_perl, UV flags,
     /* Unicode features (see perlrun/-C) */
     PL_unicode         = proto_perl->Iunicode;
 
+    /* Should we warn if uses locale? */
+    PL_warn_locale      = proto_perl->Iwarn_locale;
+
     /* Pre-5.8 signals control */
     PL_signals         = proto_perl->Isignals;
 
diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec
index 73696df..0c6a16a 100644
--- a/t/lib/warnings/regexec
+++ b/t/lib/warnings/regexec
@@ -117,3 +117,29 @@ $_ = 'a' x (2**15+1);
 #
 EXPECT
 
+########
+# NAME Wide character in non-UTF-8 locale
+eval { require POSIX; POSIX->import("locale_h") };
+if ($@) {
+    print("SKIPPED\n# no POSIX\n"),exit;
+}
+use warnings 'locale';
+use locale;
+setlocale(&POSIX::LC_CTYPE, "C");
+"\x{100}" =~ /\x{100}|\x{101}/il;
+"\x{100}" =~ /\x{100}|\x{101}/l;
+"\x{100}" =~ /\w/l;
+"\x{100}" =~ /\x{100}+/l;
+"\x{100}" =~ /[\x{100}\x{102}]/l;
+no warnings 'locale';
+EXPECT
+Wide character (U+100) in pattern match (m//) at - line 8.
+Wide character (U+100) in pattern match (m//) at - line 8.
+Wide character (U+100) in pattern match (m//) at - line 9.
+Wide character (U+100) in pattern match (m//) at - line 9.
+Wide character (U+100) in pattern match (m//) at - line 9.
+Wide character (U+100) in pattern match (m//) at - line 10.
+Wide character (U+100) in pattern match (m//) at - line 10.
+Wide character (U+100) in pattern match (m//) at - line 11.
+Wide character (U+100) in pattern match (m//) at - line 12.
+Wide character (U+100) in pattern match (m//) at - line 12.
diff --git a/t/lib/warnings/utf8 b/t/lib/warnings/utf8
index 3690ce1..75f3f25 100644
--- a/t/lib/warnings/utf8
+++ b/t/lib/warnings/utf8
@@ -597,9 +597,46 @@ $a = fc("\x{1E9E}");
 $a = fc("\x{FB05}");
 $a = uc("\x{FB00}");
 $a = ucfirst("\x{149}");
+$a = lcfirst("\x{178}");
+no warnings 'locale';
+$a = lc("\x{178}");
+$a = fc("\x{1E9E}");
+$a = fc("\x{FB05}");
+$a = uc("\x{FB00}");
+$a = ucfirst("\x{149}");
+$a = lcfirst("\x{178}");
 EXPECT
 Can't do lc("\x{178}") on non-UTF-8 locale; resolved to "\x{178}". at - line 
10.
 Can't do fc("\x{1E9E}") on non-UTF-8 locale; resolved to "\x{17F}\x{17F}". at 
- line 11.
 Can't do fc("\x{FB05}") on non-UTF-8 locale; resolved to "\x{FB06}". at - line 
12.
 Can't do uc("\x{FB00}") on non-UTF-8 locale; resolved to "\x{FB00}". at - line 
13.
 Can't do ucfirst("\x{149}") on non-UTF-8 locale; resolved to "\x{149}". at - 
line 14.
+Can't do lcfirst("\x{178}") on non-UTF-8 locale; resolved to "\x{178}". at - 
line 15.
+########
+# NAME Wide character in non-UTF-8 locale
+eval { require POSIX; POSIX->import("locale_h") };
+if ($@) {
+    print("SKIPPED\n# no POSIX\n"),exit;
+}
+use warnings 'locale';
+use feature 'fc';
+use locale;
+setlocale(&POSIX::LC_CTYPE, "C");
+my $a;
+$a = lc("\x{100}");
+$a = lcfirst("\x{101}");
+$a = fc("\x{102}");
+$a = uc("\x{103}");
+$a = ucfirst("\x{104}");
+no warnings 'locale';
+$a = lc("\x{100}");
+$a = lcfirst("\x{101}");
+$a = fc("\x{102}");
+$a = uc("\x{103}");
+$a = ucfirst("\x{104}");
+EXPECT
+Wide character (U+100) in lc at - line 10.
+Wide character (U+101) in lcfirst at - line 11.
+Wide character (U+102) in fc at - line 12.
+Wide character (U+103) in uc at - line 13.
+Wide character (U+104) in ucfirst at - line 14.
diff --git a/t/re/charset.t b/t/re/charset.t
index 4d0d99c..e061916 100644
--- a/t/re/charset.t
+++ b/t/re/charset.t
@@ -9,6 +9,8 @@ BEGIN {
 
 use strict;
 use warnings;
+no warnings 'locale';   # Some /l tests use above-latin1 chars to make sure
+                        # they work, even though they warn.
 use Config;
 
 plan('no_plan');
diff --git a/t/re/pat_advanced.t b/t/re/pat_advanced.t
index c210e2e..19d6fbc 100644
--- a/t/re/pat_advanced.t
+++ b/t/re/pat_advanced.t
@@ -866,6 +866,7 @@ sub run_tests {
         ok "\x{100}X" =~ /$re/, "S_cl_and ANYOF_UNICODE & ANYOF_INVERTED";
         my $loc_re = qq /(?l:^([^X]*)X)/;
         utf8::upgrade ($loc_re);
+        no warnings 'locale';
         ok "\x{100}X" =~ /$loc_re/, "locale, S_cl_and ANYOF_UNICODE & 
ANYOF_INVERTED";
     }
 
diff --git a/utf8.c b/utf8.c
index f328372..8551e11 100644
--- a/utf8.c
+++ b/utf8.c
@@ -1600,22 +1600,23 @@ Perl__to_uni_fold_flags(pTHX_ UV c, U8* p, STRLEN 
*lenp, U8 flags)
 
     PERL_ARGS_ASSERT__TO_UNI_FOLD_FLAGS;
 
-    /* Tread a UTF-8 locale as not being in locale at all */
-    if (IN_UTF8_CTYPE_LOCALE) {
-        flags &= ~FOLD_FLAGS_LOCALE;
+    if (flags & FOLD_FLAGS_LOCALE) {
+        /* Treat a UTF-8 locale as not being in locale at all */
+        if (IN_UTF8_CTYPE_LOCALE) {
+            flags &= ~FOLD_FLAGS_LOCALE;
+        }
+        else {
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+            goto needs_full_generality;
+        }
     }
 
     if (c < 256) {
-       UV result = _to_fold_latin1((U8) c, p, lenp,
+        return _to_fold_latin1((U8) c, p, lenp,
                            flags & (FOLD_FLAGS_FULL | FOLD_FLAGS_NOMIX_ASCII));
-       /* It is illegal for the fold to cross the 255/256 boundary under
-        * locale; in this case return the original */
-       return (result > 256 && flags & FOLD_FLAGS_LOCALE)
-              ? c
-              : result;
     }
 
-    /* If no special needs, just use the macro */
+    /* Here, above 255.  If no special needs, just use the macro */
     if ( ! (flags & (FOLD_FLAGS_LOCALE|FOLD_FLAGS_NOMIX_ASCII))) {
        uvchr_to_utf8(p, c);
        return CALL_FOLD_CASE(p, p, lenp, flags & FOLD_FLAGS_FULL);
@@ -1623,6 +1624,8 @@ Perl__to_uni_fold_flags(pTHX_ UV c, U8* p, STRLEN *lenp, 
U8 flags)
     else {  /* Otherwise, _to_utf8_fold_flags has the intelligence to deal with
               the special flags. */
        U8 utf8_c[UTF8_MAXBYTES + 1];
+
+      needs_full_generality:
        uvchr_to_utf8(utf8_c, c);
        return _to_utf8_fold_flags(utf8_c, p, lenp, flags);
     }
@@ -1876,7 +1879,7 @@ Perl_to_utf8_case(pTHX_ const U8 *p, U8* ustrp, STRLEN 
*lenp,
 }
 
 STATIC UV
-S_check_locale_boundary_crossing(pTHX_ const char * const func_name, const U8* 
const p, const UV result, U8* const ustrp, STRLEN *lenp)
+S_check_locale_boundary_crossing(pTHX_ const U8* const p, const UV result, U8* 
const ustrp, STRLEN *lenp)
 {
     /* This is called when changing the case of a utf8-encoded character above
      * the Latin1 range, and the operation is in a non-UTF-8 locale.  If the
@@ -1911,7 +1914,8 @@ S_check_locale_boundary_crossing(pTHX_ const char * const 
func_name, const U8* c
            s += UTF8SKIP(s);
        }
 
-       /* Here, no characters crossed, result is ok as-is */
+        /* Here, no characters crossed, result is ok as-is, but we warn. */
+        _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(p, p + UTF8SKIP(p));
        return result;
     }
 
@@ -1924,7 +1928,7 @@ bad_crossing:
     Perl_ck_warner(aTHX_ packWARN(WARN_LOCALE),
                            "Can't do %s(\"\\x{%"UVXf"}\") on non-UTF-8 locale; 
"
                            "resolved to \"\\x{%"UVXf"}\".",
-                           func_name,
+                           OP_DESC(PL_op),
                            original,
                            original);
     Copy(p, ustrp, *lenp, char);
@@ -1949,8 +1953,14 @@ Perl__to_utf8_upper_flags(pTHX_ const U8 *p, U8* ustrp, 
STRLEN *lenp, bool flags
 
     PERL_ARGS_ASSERT__TO_UTF8_UPPER_FLAGS;
 
-    if (flags && IN_UTF8_CTYPE_LOCALE) {
-        flags = FALSE;
+    if (flags) {
+        /* Treat a UTF-8 locale as not being in locale at all */
+        if (IN_UTF8_CTYPE_LOCALE) {
+            flags = FALSE;
+        }
+        else {
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+        }
     }
 
     if (UTF8_IS_INVARIANT(*p)) {
@@ -1975,7 +1985,7 @@ Perl__to_utf8_upper_flags(pTHX_ const U8 *p, U8* ustrp, 
STRLEN *lenp, bool flags
        result = CALL_UPPER_CASE(p, ustrp, lenp);
 
        if (flags) {
-           result = check_locale_boundary_crossing("uc", p, result, ustrp, 
lenp);
+           result = check_locale_boundary_crossing(p, result, ustrp, lenp);
        }
        return result;
     }
@@ -2014,8 +2024,14 @@ Perl__to_utf8_title_flags(pTHX_ const U8 *p, U8* ustrp, 
STRLEN *lenp, bool flags
 
     PERL_ARGS_ASSERT__TO_UTF8_TITLE_FLAGS;
 
-    if (flags && IN_UTF8_CTYPE_LOCALE) {
-        flags = FALSE;
+    if (flags) {
+        /* Treat a UTF-8 locale as not being in locale at all */
+        if (IN_UTF8_CTYPE_LOCALE) {
+            flags = FALSE;
+        }
+        else {
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+        }
     }
 
     if (UTF8_IS_INVARIANT(*p)) {
@@ -2040,7 +2056,7 @@ Perl__to_utf8_title_flags(pTHX_ const U8 *p, U8* ustrp, 
STRLEN *lenp, bool flags
        result = CALL_TITLE_CASE(p, ustrp, lenp);
 
        if (flags) {
-           result = check_locale_boundary_crossing("ucfirst", p, result, 
ustrp, lenp);
+           result = check_locale_boundary_crossing(p, result, ustrp, lenp);
        }
        return result;
     }
@@ -2078,8 +2094,14 @@ Perl__to_utf8_lower_flags(pTHX_ const U8 *p, U8* ustrp, 
STRLEN *lenp, bool flags
 
     PERL_ARGS_ASSERT__TO_UTF8_LOWER_FLAGS;
 
-    if (flags && IN_UTF8_CTYPE_LOCALE) {
-        flags = FALSE;
+    if (flags) {
+        /* Treat a UTF-8 locale as not being in locale at all */
+        if (IN_UTF8_CTYPE_LOCALE) {
+            flags = FALSE;
+        }
+        else {
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+        }
     }
 
     if (UTF8_IS_INVARIANT(*p)) {
@@ -2104,7 +2126,7 @@ Perl__to_utf8_lower_flags(pTHX_ const U8 *p, U8* ustrp, 
STRLEN *lenp, bool flags
        result = CALL_LOWER_CASE(p, ustrp, lenp);
 
        if (flags) {
-           result = check_locale_boundary_crossing("lc", p, result, ustrp, 
lenp);
+           result = check_locale_boundary_crossing(p, result, ustrp, lenp);
        }
 
        return result;
@@ -2153,8 +2175,14 @@ Perl__to_utf8_fold_flags(pTHX_ const U8 *p, U8* ustrp, 
STRLEN *lenp, U8 flags)
 
     assert(p != ustrp); /* Otherwise overwrites */
 
-    if (flags & FOLD_FLAGS_LOCALE && IN_UTF8_CTYPE_LOCALE) {
-        flags &= ~FOLD_FLAGS_LOCALE;
+    if (flags & FOLD_FLAGS_LOCALE) {
+        /* Treat a UTF-8 locale as not being in locale at all */
+        if (IN_UTF8_CTYPE_LOCALE) {
+            flags &= ~FOLD_FLAGS_LOCALE;
+        }
+        else {
+            _CHECK_AND_WARN_PROBLEMATIC_LOCALE;
+        }
     }
 
     if (UTF8_IS_INVARIANT(*p)) {
@@ -2208,7 +2236,7 @@ Perl__to_utf8_fold_flags(pTHX_ const U8 *p, U8* ustrp, 
STRLEN *lenp, U8 flags)
                               "resolved to \"\\x{FB06}\".");
                 goto return_ligature_st;
             }
-           return check_locale_boundary_crossing("fc", p, result, ustrp, lenp);
+           return check_locale_boundary_crossing(p, result, ustrp, lenp);
        }
        else if (! (flags & FOLD_FLAGS_NOMIX_ASCII)) {
            return result;
@@ -3911,8 +3939,18 @@ L<http://www.unicode.org/unicode/reports/tr21/> (Case 
Mappings).
  *  FOLDEQ_LOCALE          is set iff the rules from the current underlying
  *                         locale are to be used.
  *  FOLDEQ_S1_ALREADY_FOLDED  s1 has already been folded before calling this
- *                            routine.  This allows that step to be skipped.
+ *                          routine.  This allows that step to be skipped.
+ *                          Currently, this requires s1 to be encoded as UTF-8
+ *                          (u1 must be true), which is asserted for.
+ *  FOLDEQ_S1_FOLDS_SANE    With either NOMIX_ASCII or LOCALE, no folds may
+ *                          cross certain boundaries.  Hence, the caller should
+ *                          let this function do the folding instead of
+ *                          pre-folding.  This code contains an assertion to
+ *                          that effect.  However, if the caller knows what
+ *                          it's doing, it can pass this flag to indicate that,
+ *                          and the assertion is skipped.
  *  FOLDEQ_S2_ALREADY_FOLDED  Similarly.
+ *  FOLDEQ_S2_FOLDS_SANE
  */
 I32
 Perl_foldEQ_utf8_flags(pTHX_ const char *s1, char **pe1, UV l1, bool u1, const 
char *s2, char **pe2, UV l2, bool u2, U32 flags)
@@ -3928,11 +3966,15 @@ Perl_foldEQ_utf8_flags(pTHX_ const char *s1, char 
**pe1, UV l1, bool u1, const c
     STRLEN n1 = 0, n2 = 0;              /* Number of bytes in current char */
     U8 foldbuf1[UTF8_MAXBYTES_CASE+1];
     U8 foldbuf2[UTF8_MAXBYTES_CASE+1];
+    U8 flags_for_folder = FOLD_FLAGS_FULL;
 
     PERL_ARGS_ASSERT_FOLDEQ_UTF8_FLAGS;
 
     assert( ! ((flags & (FOLDEQ_UTF8_NOMIX_ASCII | FOLDEQ_LOCALE))
-           && (flags & (FOLDEQ_S1_ALREADY_FOLDED | 
FOLDEQ_S2_ALREADY_FOLDED))));
+               && (((flags & FOLDEQ_S1_ALREADY_FOLDED)
+                     && !(flags & FOLDEQ_S1_FOLDS_SANE))
+                   || ((flags & FOLDEQ_S2_ALREADY_FOLDED)
+                       && !(flags & FOLDEQ_S2_FOLDS_SANE)))));
     /* The algorithm is to trial the folds without regard to the flags on
      * the first line of the above assert(), and then see if the result
      * violates them.  This means that the inputs can't be pre-folded to a
@@ -3944,8 +3986,13 @@ Perl_foldEQ_utf8_flags(pTHX_ const char *s1, char **pe1, 
UV l1, bool u1, const c
      * and /iaa matches are most likely to involve code points 0-255, and this
      * function only under rare conditions gets called for 0-255. */
 
-    if (IN_UTF8_CTYPE_LOCALE) {
-        flags &= ~FOLDEQ_LOCALE;
+    if (flags & FOLDEQ_LOCALE) {
+        if (IN_UTF8_CTYPE_LOCALE) {
+            flags &= ~FOLDEQ_LOCALE;
+        }
+        else {
+            flags_for_folder |= FOLD_FLAGS_LOCALE;
+        }
     }
 
     if (pe1) {
@@ -3997,98 +4044,59 @@ Perl_foldEQ_utf8_flags(pTHX_ const char *s1, char 
**pe1, UV l1, bool u1, const c
     while (p1 < e1 && p2 < e2) {
 
         /* If at the beginning of a new character in s1, get its fold to use
-        * and the length of the fold.  (exception: locale rules just get the
-        * character to a single byte) */
+        * and the length of the fold. */
         if (n1 == 0) {
            if (flags & FOLDEQ_S1_ALREADY_FOLDED) {
                f1 = (U8 *) p1;
+                assert(u1);
                n1 = UTF8SKIP(f1);
            }
            else {
-               /* If in locale matching, we use two sets of rules, depending
-                * on if the code point is above or below 255.  Here, we test
-                * for and handle locale rules */
-               if ((flags & FOLDEQ_LOCALE)
-                   && (! u1 || ! UTF8_IS_ABOVE_LATIN1(*p1)))
-               {
-                   /* There is no mixing of code points above and below 255. */
-                   if (u2 && UTF8_IS_ABOVE_LATIN1(*p2)) {
-                       return 0;
-                   }
-
-                   /* We handle locale rules by converting, if necessary, the
-                    * code point to a single byte. */
-                   if (! u1 || UTF8_IS_INVARIANT(*p1)) {
-                       *foldbuf1 = *p1;
-                   }
-                   else {
**** PATCH TRUNCATED AT 2000 LINES -- 120 NOT SHOWN ****

--
Perl5 Master Repository

[perl.git] branch blead, updated. v5.21.7-165-g613abc6

Reply via email to