[perl.git] branch blead, updated. v5.25.9-138-g7e2f38b21a

Karl Williamson Mon, 13 Feb 2017 20:24:52 -0800

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/7e2f38b21adec3a814f436b1c0c7e79796f5dcc5?hp=4623556176aef2c3bff081e1456fdf7f5c0695c1>


- Log -----------------------------------------------------------------
commit 7e2f38b21adec3a814f436b1c0c7e79796f5dcc5
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 19:57:53 2017 -0700

    Make _byte_dump_string() usable in all of core
    
    I found myself needing this function for development debugging, which
    formerly was only usable from utf8.c.  This enhances it to allow a
    second format type, and makes it core-accessible.

M       embed.fnc
M       embed.h
M       locale.c
M       proto.h
M       utf8.c

commit 6b58f9be8c82394d776908727ff4e9d582f87f12
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 19:22:59 2017 -0700

    toke.c: Make sure things are initialized
    
    Commit 3dd4eaeb8ac39e08179145b86aedda36584a3509 fixed a bug wherein the
    tr/// operator parsing code could be looking at uninitialized data.
    This happens only because we try to carry on when we find errors, so as
    to find as many errors as possible in a single run, as a convenience to
    the person debugging the script being compiled.  And we failed to
    initialize stuff upon getting an error; stuff that was later looked at
    by tr///.
    
    That commit fixed the ticket by making sure the things mentioned there
    got initialized upon error, but didn't handle the various other places
    in the loop where the same thing could happen.
    
    At the time, I thought it would be easier to instead change the tr///
    handling code to know that its inputs were problematic, and to avoid
    looking at them in that case.  This is easily done, and would
    automatically catch all the cases in the loop, now and any added in the
    future.
    
    But then I thought, maybe tr/// isn't the only operator that could be
    thrown off by this.  It is the most obvious one, to someone who knows
    how it goes about getting compiled; but there may be other operators
    that I don't know how they get compiled and have the same or a similar
    problem.  The better solution then would be to extend
    3dd4eaeb8ac39e08179145b86aedda36584a3509 to make sure everything gets
    initialized when there is an error.  That is what this current commit
    does.
    
    The previous few commits have refactored things so as to minimize the
    number of places that need to be handled here, down to three.    I kinda
    doubt that new constructs will be added, at this stage in the language
    development, that would require the same initialization handling.  But,
    if they were, hopefully those doing it would follow the existing
    paradigm that this commit and 3dd4eaeb8ac39e08179145b86aedda36584a3509
    establish.
    
    Another way to handle this would have been to, instead of doing an
    initialize-and-'continue', to instead jump to a common label at the
    bottom of the loop which does the initialization.  I think it doesn't
    matter much which, so left it as this.

M       toke.c

commit e90756be0bd7bbe5f7193c0e9e856c56ae09c085
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 19:18:38 2017 -0700

    toke.c: Quit now if error at end of input
    
    In these two cases, we know we are at the end of the input, and that we
    have an error.  There is no need to try to patch things up so we can
    continue to parse looking for other errors; there's nothing left to
    parse.  So skip having to deal with patching up.

M       toke.c

commit 4c3e84f3090c9cc68d352a2c80a95c91d25a8fc7
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 19:12:31 2017 -0700

    toke.c: Un-special case something
    
    By refactoring slightly, we make this code in a switch statement
    have the same entrance and exit invariants as the other cases, so they
    all can be handled uniformly at the end of the switch.

M       toke.c

commit f065e1e68bf6a5541c8ceba8c9fcc6e18f51a32b
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 19:01:46 2017 -0700

    Don't try to compile a pattern known to be in error
    
    Regular expression patterns are parsed by the lexer/toker, and then
    compiled by the regex compiler.  It is foolish to try to compile one
    that the parser has rejected as syntactically bad; assumptions may be
    violated and segfaults ensue.  This commit abandons all parsing
    immediately if a pattern had errors in it.  A better solution would be
    to flag this pattern as not to be compiled, and continue parsing other
    things so as to find the most errors in a single attempt, but I don't
    think it's worth the extra effort.
    
    Making this change caused some misleading error messages in the test
    suite to be replaced by better ones.

M       t/re/re_tests
M       toke.c

commit d43328d502ac91c4d98e218d0721cd5f3bcd3950
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 18:49:52 2017 -0700

    toke.c: Add internal function to abort parsing
    
    This is to be called to abort the parsing early, before the required
    number of errors have been found.  It is used when continuing the parse
    would be either fruitless or we could be looking at garbage.

M       embed.fnc
M       embed.h
M       proto.h
M       toke.c

commit ae389f219fdcdf1c117790fefa36a302c0d16edc
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 18:46:30 2017 -0700

    toke.c: White-space only
    
    Indent after the previous commit enclosed this code in a new block.

M       toke.c

commit 68da2a652769fdc14ed8787783c334789f2614d1
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 18:27:02 2017 -0700

    Relax internal function API
    
    This changes  yyerror_pvn so that its first parameter can be NULL.  This
    indicates no message is to be output, but that parsing is to be
    abandoned immediately, without waiting for more errors to build up.

M       embed.fnc
M       proto.h
M       toke.c

commit c77da5ff879f4bc56ececd521148ca08ab8ce837
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 13:18:38 2017 -0700

    Extract code into a function
    
    This creates a function in toke.c to output the compilation aborted
    message, changing perl.c to call that function.  This is in preparation
    for this to be called from a 2nd place

M       embed.fnc
M       embed.h
M       perl.c
M       pod/perldiag.pod
M       proto.h
M       toke.c

commit f88c646639aaa9f3f90a801fa3aa5d1ce27fd3c0
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 13:03:32 2017 -0700

    toke.c: Rmv no longer necessary UTF-8 checks
    
    The previous commit tightened up the checking for well-formed UTF8ness,
    so that the ones removed here were redundant.
    
    The test during a string eval may also no longer be necessary, but since
    there are many ways to create that string, I'm not confidant enough to
    remove it.

M       pod/perldiag.pod
M       toke.c

commit 7f7dbdc9c79b3476c92af87ef8d1f9178f566fc5
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 20:40:57 2017 -0700

    Add test for [perl #130675]

M       t/lib/croak/toke_l1

commit 0b75cb8099a8002299b9fedfe415323b9e21a5e0
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 20:37:47 2017 -0700

    t/lib/croak/toke_l1: Cut down test
    
    The previous commits hardening toke.c against malformed UTF-8 input have
    allowed this test case to be cut down substantially

M       t/lib/croak/toke_l1

commit efa571ab2dff0efcc97f14f98ed11f9296cd27a7
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 12:35:18 2017 -0700

    toke.c: Fix bugs where UTF-8 is turned on in mid chunk
    
    Previous commits have tightened up the checking of UTF-8 for
    well-formedness in the input program or string eval.  This is done in
    lex_next_chunk and lex_start.  But it doesn't handle the case of
    
        use utf8; foo
    
    because 'foo' is checked while UTF-8 is still off.  This solves that
    problem by noticing when utf8 is turned on, and then rechecking at the
    next opportunity.
    
    See thread beginning at
    http://nntp.perl.org/group/perl.perl5.porters/242916
    
    This fixes [perl #130675].  A test will be added in a future commit
    
    This catches some errors earlier than they used to be and aborts. so
    some tests in the suite had to be split into multiple parts.

M       embed.fnc
M       embed.h
M       lib/utf8.t
M       mg.c
M       parser.h
M       proto.h
M       sv.c
M       t/lib/croak/toke
M       t/lib/croak/toke_l1
M       t/uni/parser.t
M       toke.c

commit 5d382c252dbb26302398c7fe38021a0f3c06b809
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 12:16:45 2017 -0700

    mg.c: PL_hints is unsigned
    
    Therefore it's dangerous to presume things fit into an IV.

M       mg.c

commit 0aab20f283bf5c6088780bee353c6a125f50088b
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 12:02:54 2017 -0700

    toke.c: Add branch prediction
    
    The input is far more likely to be well-formed than not.

M       toke.c

commit ef3ff34de0f1e12e7961a83cbf63dcbb91808ba2
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 11:54:06 2017 -0700

    toke.c: Fix comments describing S_tokeq
    
    The comments about what this function does were incorrect.

M       toke.c

commit 07337b9591822033e937370211f01ca63f3243f1
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 11:51:59 2017 -0700

    toke.c: Slight refactor.
    
    This moves an automatic variable to closer to the only place it is used;
    it also adds branch prediction.  It is likely that the input will be
    well-formed.

M       toke.c

commit 218304f90362a79b96bb58e1ef63c0f213451079
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 11:29:35 2017 -0700

    toke.c: White space, comments, braces
    
    I am adding the braces because in one of the areas, the lack of braces
    had led to a blead failure.

M       toke.c

commit 7cc9fe9f24f654e4295a82ca84e289a5160cf10c
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 11:31:53 2017 -0700

    toke.c: Don't compare same bytes twice
    
    Before starting this memEQ, we know that the first bytes are the same,
    so might as well start the compare with the 2nd bytes.

M       toke.c

commit a6909bd2c28e2bd347ad942359c890e1b8ea6d4a
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 13 11:15:07 2017 -0700

    toke.c: Move declaration
    
    This automatic variable doesn't need such a large scope.

M       toke.c
-----------------------------------------------------------------------

Summary of changes:
 embed.fnc           |  10 +-
 embed.h             |   5 +-
 lib/utf8.t          |   2 +-
 locale.c            |   8 +-
 mg.c                |  12 +-
 parser.h            |   2 +
 perl.c              |   7 +-
 pod/perldiag.pod    |  10 +-
 proto.h             |  13 +-
 sv.c                |   1 +
 t/lib/croak/toke    |  18 ++-
 t/lib/croak/toke_l1 | Bin 3193 -> 1127 bytes
 t/re/re_tests       |   7 +-
 t/uni/parser.t      |   2 +-
 toke.c              | 368 +++++++++++++++++++++++++++++++---------------------
 utf8.c              |  41 +++---
 16 files changed, 306 insertions(+), 200 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index d548f5d5e5..0f63ed0c0f 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -859,6 +859,7 @@ pP  |I32    |keyword        |NN const char *name|I32 
len|bool all_keywords
 s      |void   |inplace_aassign        |NN OP* o
 #endif
 Ap     |void   |leave_scope    |I32 base
+p      |void   |notify_parser_that_changed_to_utf8
 : Public lexer API
 AMpd   |void   |lex_start      |NULLOK SV* line|NULLOK PerlIO *rsfp|U32 flags
 AMpd   |bool   |lex_bufutf8
@@ -1714,6 +1715,10 @@ ApdD     |UV     |to_utf8_case   |NN const U8 *p         
                        \
                                |NN SV **swashp                                 
\
                                |NN const char *normal|                         
\
                                NULLOK const char *special
+ApM    |char * |_byte_dump_string                                      \
+                               |NN const U8 * s                        \
+                               |const STRLEN len                       \
+                               |const bool format
 #if defined(PERL_IN_UTF8_C)
 inR    |bool   |does_utf8_overflow|NN const U8 * const s|NN const U8 * e
 inR    |bool   |is_utf8_overlong_given_start_byte_ok|NN const U8 * const 
s|const STRLEN len
@@ -1723,7 +1728,6 @@ sMR       |char * |unexpected_non_continuation_text       
                \
                |STRLEN print_len                                       \
                |const STRLEN non_cont_byte_pos                         \
                |const STRLEN expect_len
-sM     |char * |_byte_dump_string|NN const U8 * s|const STRLEN len
 s      |void   |warn_on_first_deprecated_use                               \
                                |NN const char * const name                 \
                                |NN const char * const alternative          \
@@ -1876,8 +1880,10 @@ inR      |bool   |should_warn_nl|NN const char *pv
 p      |void   |write_to_stderr|NN SV* msv
 : Used in op.c
 p      |int    |yyerror        |NN const char *const s
+p      |void   |yyquit
+p      |void   |abort_execution|NN const char * const msg|NN const char * 
const name
 p      |int    |yyerror_pv     |NN const char *const s|U32 flags
-p      |int    |yyerror_pvn    |NN const char *const s|STRLEN len|U32 flags
+p      |int    |yyerror_pvn    |NULLOK const char *const s|STRLEN len|U32 flags
 : Used in perly.y, and by Data::Alias
 EXp    |int    |yylex
 p      |void   |yyunlex
diff --git a/embed.h b/embed.h
index 2233a35e80..5b9c46c578 100644
--- a/embed.h
+++ b/embed.h
@@ -27,6 +27,7 @@
 /* Hide global symbols */
 
 #define Gv_AMupdate(a,b)       Perl_Gv_AMupdate(aTHX_ a,b)
+#define _byte_dump_string(a,b,c)       Perl__byte_dump_string(aTHX_ a,b,c)
 #define _force_out_malformed_utf8_message(a,b,c,d)     
Perl__force_out_malformed_utf8_message(aTHX_ a,b,c,d)
 #define _is_in_locale_category(a,b)    Perl__is_in_locale_category(aTHX_ a,b)
 #define _is_uni_FOO(a,b)       Perl__is_uni_FOO(aTHX_ a,b)
@@ -1164,6 +1165,7 @@
 #ifdef PERL_CORE
 #define Slab_Alloc(a)          Perl_Slab_Alloc(aTHX_ a)
 #define Slab_Free(a)           Perl_Slab_Free(aTHX_ a)
+#define abort_execution(a,b)   Perl_abort_execution(aTHX_ a,b)
 #define alloc_LOGOP(a,b,c)     Perl_alloc_LOGOP(aTHX_ a,b,c)
 #define allocmy(a,b,c)         Perl_allocmy(aTHX_ a,b,c)
 #define amagic_is_enabled(a)   Perl_amagic_is_enabled(aTHX_ a)
@@ -1351,6 +1353,7 @@
 #define newXS_len_flags(a,b,c,d,e,f,g) Perl_newXS_len_flags(aTHX_ 
a,b,c,d,e,f,g)
 #define nextargv(a,b)          Perl_nextargv(aTHX_ a,b)
 #define noperl_die             Perl_noperl_die
+#define notify_parser_that_changed_to_utf8()   
Perl_notify_parser_that_changed_to_utf8(aTHX)
 #define oopsAV(a)              Perl_oopsAV(aTHX_ a)
 #define oopsHV(a)              Perl_oopsHV(aTHX_ a)
 #define op_unscope(a)          Perl_op_unscope(aTHX_ a)
@@ -1414,6 +1417,7 @@
 #define yyerror_pv(a,b)                Perl_yyerror_pv(aTHX_ a,b)
 #define yyerror_pvn(a,b,c)     Perl_yyerror_pvn(aTHX_ a,b,c)
 #define yyparse(a)             Perl_yyparse(aTHX_ a)
+#define yyquit()               Perl_yyquit(aTHX)
 #define yyunlex()              Perl_yyunlex(aTHX)
 #  if !(defined(DEBUGGING))
 #    if !defined(NV_PRESERVES_UV)
@@ -1836,7 +1840,6 @@
 #define isa_lookup(a,b,c,d)    S_isa_lookup(aTHX_ a,b,c,d)
 #  endif
 #  if defined(PERL_IN_UTF8_C)
-#define _byte_dump_string(a,b) S__byte_dump_string(aTHX_ a,b)
 #define _to_utf8_case(a,b,c,d,e,f,g)   S__to_utf8_case(aTHX_ a,b,c,d,e,f,g)
 #define check_and_deprecate(a,b,c,d,e,f)       S_check_and_deprecate(aTHX_ 
a,b,c,d,e,f)
 #define check_locale_boundary_crossing(a,b,c,d)        
S_check_locale_boundary_crossing(aTHX_ a,b,c,d)
diff --git a/lib/utf8.t b/lib/utf8.t
index e5f9547a8a..d35110baee 100644
--- a/lib/utf8.t
+++ b/lib/utf8.t
@@ -168,7 +168,7 @@ no utf8; # Ironic, no?
     use utf8; %a = ("$malformed" =>"sterling");
     print 'start'; printf '%x,', ord \$_ foreach keys %a; print "end\n";
 BANG
-             qr/^Malformed UTF-8 character: .*? \(too short; \d bytes? 
available, need \d\).*start\d+,end$/sm
+             qr/^Malformed UTF-8 character: .*? \(unexpected non-continuation 
byte/
             ],
             );
     foreach (@tests) {
diff --git a/locale.c b/locale.c
index 01962ea235..1ba802f1ae 100644
--- a/locale.c
+++ b/locale.c
@@ -1908,14 +1908,12 @@ Perl__mem_collxfrm(pTHX_ const char *input_string,
 
 #ifdef DEBUGGING
     if (DEBUG_Lv_TEST || debug_initialization) {
-        Size_t i;
 
         print_collxfrm_input_and_return(s, s + len, xlen, utf8);
         PerlIO_printf(Perl_debug_log, "Its xfrm is:");
-        for (i = COLLXFRM_HDR_LEN; i < *xlen + COLLXFRM_HDR_LEN; i++) {
-            PerlIO_printf(Perl_debug_log, " %02x", (U8) xbuf[i]);
-        }
-        PerlIO_printf(Perl_debug_log, "\n");
+        PerlIO_printf(Perl_debug_log, "%s\n",
+                      _byte_dump_string((U8 *) xbuf + COLLXFRM_HDR_LEN,
+                       *xlen, 1));
     }
 #endif
 
diff --git a/mg.c b/mg.c
index 172127c865..6e648d8c2e 100644
--- a/mg.c
+++ b/mg.c
@@ -944,7 +944,7 @@ Perl_magic_get(pTHX_ SV *sv, MAGIC *mg)
        }
        break;
     case '\010':               /* ^H */
-       sv_setiv(sv, (IV)PL_hints);
+       sv_setuv(sv, PL_hints);
        break;
     case '\011':               /* ^I */ /* NOT \t in EBCDIC */
        sv_setpv(sv, PL_inplace); /* Will undefine sv if PL_inplace is NULL */
@@ -2737,7 +2737,15 @@ Perl_magic_set(pTHX_ SV *sv, MAGIC *mg)
        PL_maxsysfd = SvIV(sv);
        break;
     case '\010':       /* ^H */
-       PL_hints = SvIV(sv);
+        {
+            U32 save_hints = PL_hints;
+            PL_hints = SvUV(sv);
+
+            /* If wasn't UTF-8, and now is, notify the parser */
+            if ((PL_hints & HINT_UTF8) && ! (save_hints & HINT_UTF8)) {
+                notify_parser_that_changed_to_utf8();
+            }
+        }
        break;
     case '\011':       /* ^I */ /* NOT \t in EBCDIC */
        Safefree(PL_inplace);
diff --git a/parser.h b/parser.h
index ad148c26eb..4187e0a93d 100644
--- a/parser.h
+++ b/parser.h
@@ -115,6 +115,8 @@ typedef struct yy_parser {
     IV          sig_optelems;   /* number of optional signature elems seen */
     char        sig_slurpy;     /* the sigil of the slurpy var (or null) */
 
+    bool        recheck_utf8_validity;
+
     PERL_BITFIELD16    in_pod:1;      /* lexer is within a =pod section */
     PERL_BITFIELD16    filtered:1;    /* source filters in evalbytes */
     PERL_BITFIELD16    saw_infix_sigil:1; /* saw & or * or % operator */
diff --git a/perl.c b/perl.c
index 09eb2f4f62..98bf356652 100644
--- a/perl.c
+++ b/perl.c
@@ -2374,12 +2374,7 @@ S_parse_body(pTHX_ char **env, XSINIT_t xsinit)
 
     SETERRNO(0,SS_NORMAL);
     if (yyparse(GRAMPROG) || PL_parser->error_count) {
-       if (PL_minus_c)
-           Perl_croak(aTHX_ "%s had compilation errors.\n", PL_origfilename);
-       else {
-           Perl_croak(aTHX_ "Execution of %s aborted due to compilation 
errors.\n",
-                      PL_origfilename);
-       }
+        abort_execution("", PL_origfilename);
     }
     CopLINE_set(PL_curcop, 0);
     SET_CURSTASH(PL_defstash);
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index c2408f047d..99da61ec1f 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -2193,7 +2193,7 @@ variable and glob that.
 (F) The C<exec> function is not implemented on some systems, e.g., Symbian
 OS.  See L<perlport>.
 
-=item Execution of %s aborted due to compilation errors.
+=item %sExecution of %s aborted due to compilation errors.
 
 (F) The final summary message when a Perl compilation fails.
 
@@ -3414,14 +3414,6 @@ message.
 
 See also L<Encode/"Handling Malformed Data">.
 
-=item Malformed UTF-8 character immediately after '%s'
-
-(F) You said C<use utf8>, but the program file doesn't comply with UTF-8
-encoding rules.  The message prints out the properly encoded characters
-just before the first bad one.  If C<utf8> warnings are enabled, a
-warning is generated that gives more details about the type of
-malformation.
-
 =item Malformed UTF-8 returned by \N{%s} immediately after '%s'
 
 (F) The charnames handler returned malformed UTF-8.
diff --git a/proto.h b/proto.h
index 7ec784981a..c61980e585 100644
--- a/proto.h
+++ b/proto.h
@@ -41,6 +41,9 @@ PERL_CALLCONV void*   Perl_Slab_Alloc(pTHX_ size_t sz)
 PERL_CALLCONV void     Perl_Slab_Free(pTHX_ void *op);
 #define PERL_ARGS_ASSERT_SLAB_FREE     \
        assert(op)
+PERL_CALLCONV char *   Perl__byte_dump_string(pTHX_ const U8 * s, const STRLEN 
len, const bool format);
+#define PERL_ARGS_ASSERT__BYTE_DUMP_STRING     \
+       assert(s)
 PERL_CALLCONV void     Perl__force_out_malformed_utf8_message(pTHX_ const U8 
*const p, const U8 * const e, const U32 flags, const bool die_here);
 #define PERL_ARGS_ASSERT__FORCE_OUT_MALFORMED_UTF8_MESSAGE     \
        assert(p); assert(e)
@@ -115,6 +118,9 @@ PERL_CALLCONV UV    Perl__to_utf8_upper_flags(pTHX_ const 
U8 *p, const U8 *e, U8* u
 #define PERL_ARGS_ASSERT__TO_UTF8_UPPER_FLAGS  \
        assert(p); assert(ustrp); assert(file)
 PERL_CALLCONV void     Perl__warn_problematic_locale(void);
+PERL_CALLCONV void     Perl_abort_execution(pTHX_ const char * const msg, 
const char * const name);
+#define PERL_ARGS_ASSERT_ABORT_EXECUTION       \
+       assert(msg); assert(name)
 PERL_CALLCONV LOGOP*   Perl_alloc_LOGOP(pTHX_ I32 type, OP *first, OP *other);
 PERL_CALLCONV PADOFFSET        Perl_allocmy(pTHX_ const char *const name, 
const STRLEN len, const U32 flags);
 #define PERL_ARGS_ASSERT_ALLOCMY       \
@@ -2331,6 +2337,7 @@ PERL_CALLCONV_NO_RET void Perl_noperl_die(const char* 
pat, ...)
        assert(pat)
 
 PERL_CALLCONV int      Perl_nothreadhook(pTHX);
+PERL_CALLCONV void     Perl_notify_parser_that_changed_to_utf8(pTHX);
 PERL_CALLCONV OP*      Perl_oopsAV(pTHX_ OP* o)
                        __attribute__warn_unused_result__;
 #define PERL_ARGS_ASSERT_OOPSAV        \
@@ -3695,10 +3702,9 @@ PERL_CALLCONV int        Perl_yyerror_pv(pTHX_ const 
char *const s, U32 flags);
 #define PERL_ARGS_ASSERT_YYERROR_PV    \
        assert(s)
 PERL_CALLCONV int      Perl_yyerror_pvn(pTHX_ const char *const s, STRLEN len, 
U32 flags);
-#define PERL_ARGS_ASSERT_YYERROR_PVN   \
-       assert(s)
 PERL_CALLCONV int      Perl_yylex(pTHX);
 PERL_CALLCONV int      Perl_yyparse(pTHX_ int gramtype);
+PERL_CALLCONV void     Perl_yyquit(pTHX);
 PERL_CALLCONV void     Perl_yyunlex(pTHX);
 #if !(defined(DEBUGGING))
 #  if !defined(NV_PRESERVES_UV)
@@ -5621,9 +5627,6 @@ STATIC bool       S_isa_lookup(pTHX_ HV *stash, const 
char * const name, STRLEN len, U
        assert(stash); assert(name)
 #endif
 #if defined(PERL_IN_UTF8_C)
-STATIC char *  S__byte_dump_string(pTHX_ const U8 * s, const STRLEN len);
-#define PERL_ARGS_ASSERT__BYTE_DUMP_STRING     \
-       assert(s)
 STATIC UV      S__to_utf8_case(pTHX_ const UV uv1, const U8 *p, U8* ustrp, 
STRLEN *lenp, SV **swashp, const char *normal, const char *special);
 #define PERL_ARGS_ASSERT__TO_UTF8_CASE \
        assert(p); assert(ustrp); assert(swashp); assert(normal)
diff --git a/sv.c b/sv.c
index 472d69c4e5..e0c327a350 100644
--- a/sv.c
+++ b/sv.c
@@ -13204,6 +13204,7 @@ Perl_parser_dup(pTHX_ const yy_parser *const proto, 
CLONE_PARAMS *const param)
     parser->sig_elems  = proto->sig_elems;
     parser->sig_optelems= proto->sig_optelems;
     parser->sig_slurpy  = proto->sig_slurpy;
+    parser->recheck_utf8_validity = proto->recheck_utf8_validity;
     parser->linestr    = sv_dup_inc(proto->linestr, param);
 
     {
diff --git a/t/lib/croak/toke b/t/lib/croak/toke
index f1817b3a7e..40354955e5 100644
--- a/t/lib/croak/toke
+++ b/t/lib/croak/toke
@@ -180,9 +180,16 @@ Execution of - aborted due to compilation errors.
 # NAME Regexp constant overloading when *^H is undefined
 use overload;
 BEGIN { overload::constant qr => sub {}; undef *^H }
-/a/, m'a'
+/a/
 EXPECT
 Constant(qq) unknown at - line 3, within pattern
+Execution of - aborted due to compilation errors.
+########
+# NAME Regexp constant overloading when *^H is undefined
+use overload;
+BEGIN { overload::constant qr => sub {}; undef *^H }
+m'a'
+EXPECT
 Constant(q) unknown at - line 3, within pattern
 Execution of - aborted due to compilation errors.
 ########
@@ -232,9 +239,16 @@ Execution of - aborted due to compilation errors.
 # NAME Regexp constant overloading returning undef
 use overload;
 BEGIN { overload::constant qr => sub {} }
-/a/, m'a'
+/a/
 EXPECT
 Constant(qq): Call to &{$^H{qr}} did not return a defined value at - line 3, 
within pattern
+Execution of - aborted due to compilation errors.
+########
+# NAME Regexp constant overloading returning undef
+use overload;
+BEGIN { overload::constant qr => sub {} }
+m'a'
+EXPECT
 Constant(q): Call to &{$^H{qr}} did not return a defined value at - line 3, 
within pattern
 Execution of - aborted due to compilation errors.
 ########
diff --git a/t/lib/croak/toke_l1 b/t/lib/croak/toke_l1
index bb85b025d5..6d656beb7c 100644
Binary files a/t/lib/croak/toke_l1 and b/t/lib/croak/toke_l1 differ
diff --git a/t/re/re_tests b/t/re/re_tests
index f210202657..410fceadac 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1464,13 +1464,10 @@ abc\N   abc\n   n
 # Verify get errors.  For these, we need // or else puts it in single quotes,
 # and bypasses the lexer.
 /\N{U+}/       -       c       -       Invalid hexadecimal number
-# Below currently gives a misleading message
-/[\N{U+}]/     -       Sc      -       Unmatched
-/[\N{U+}]/     -       sc      -       Syntax error in (?[...])
+/[\N{U+}]/     -       c       -       Invalid hexadecimal number
 /abc\N{def/    -       c       -       Missing right brace
 /\N{U+4AG3}/   -       c       -       Invalid hexadecimal number
-/[\N{U+4AG3}]/ -       Sc      -       Unmatched
-/[\N{U+4AG3}]/ -       sc      -       Syntax error in (?[...])
+/[\N{U+4AG3}]/ -       c       -       Invalid hexadecimal number
 
 # And verify that in single quotes which bypasses the lexer, the regex compiler
 # figures it out.
diff --git a/t/uni/parser.t b/t/uni/parser.t
index 624fdd04df..2c68fb0473 100644
--- a/t/uni/parser.t
+++ b/t/uni/parser.t
@@ -197,7 +197,7 @@ like( $@, qr/Bad name after ï¼¦ï½ï½'/, 'Bad name after 
ï¼¦ï½ï½\'' );
                            ? "\x{74}\x{41}"
                            : "\x{c0}\x{a0}";
     CORE::evalbytes "use charnames ':full'; use utf8; my \$x = 
\"\\N{abc$malformed_to_be}\"";
-    like( $@, qr/Malformed UTF-8 character immediately after '\\N\{abc' at .* 
within string/, 'Malformed UTF-8 input to \N{}');
+    like( $@, qr/Malformed UTF-8 character \(fatal\) at /, 'Malformed UTF-8 
input to \N{}');
 }
 
 # RT# 124216: Perl_sv_clear: Assertion
diff --git a/toke.c b/toke.c
index d529f03b72..b9ea8486c1 100644
--- a/toke.c
+++ b/toke.c
@@ -701,7 +701,6 @@ Perl_lex_start(pTHX_ SV *line, PerlIO *rsfp, U32 flags)
 {
     const char *s = NULL;
     yy_parser *parser, *oparser;
-    const U8* first_bad_char_loc;
 
     if (flags && flags & ~LEX_START_FLAGS)
        Perl_croak(aTHX_ "Lexing code internal error (%s)", "lex_start");
@@ -728,6 +727,7 @@ Perl_lex_start(pTHX_ SV *line, PerlIO *rsfp, U32 flags)
     parser->lex_state = LEX_NORMAL;
     parser->expect = XSTATE;
     parser->rsfp = rsfp;
+    parser->recheck_utf8_validity = FALSE;
     parser->rsfp_filters =
       !(flags & LEX_START_SAME_FILTER) || !oparser
         ? NULL
@@ -744,11 +744,14 @@ Perl_lex_start(pTHX_ SV *line, PerlIO *rsfp, U32 flags)
 
     if (line) {
        STRLEN len;
+        const U8* first_bad_char_loc;
+
        s = SvPV_const(line, len);
 
-        if (SvUTF8(line) && ! is_utf8_string_loc((U8 *) s,
-                                                 SvCUR(line),
-                                                 &first_bad_char_loc))
+        if (   SvUTF8(line)
+            && UNLIKELY(! is_utf8_string_loc((U8 *) s,
+                                             SvCUR(line),
+                                             &first_bad_char_loc)))
         {
             _force_out_malformed_utf8_message(first_bad_char_loc,
                                               (U8 *) s + SvCUR(line),
@@ -765,6 +768,7 @@ Perl_lex_start(pTHX_ SV *line, PerlIO *rsfp, U32 flags)
     } else {
        parser->linestr = newSVpvn("\n;", rsfp ? 1 : 2);
     }
+
     parser->oldoldbufptr =
        parser->oldbufptr =
        parser->bufptr =
@@ -1053,12 +1057,7 @@ Perl_lex_stuff_pvn(pTHX_ const char *pv, STRLEN len, U32 
flags)
                } else if (UTF8_IS_NEXT_CHAR_DOWNGRADEABLE(p, e)) {
                    p++;
                    highhalf++;
-                } else if (! UTF8_IS_INVARIANT(c)) {
-                    _force_out_malformed_utf8_message((U8 *) p, (U8 *) e,
-                                                      0,
-                                                      1 /* 1 means die */ );
-                    NOT_REACHED; /* NOTREACHED */
-               }
+                } else assert(UTF8_IS_INVARIANT(c));
            }
            if (!highhalf)
                goto plain_copy;
@@ -1272,6 +1271,24 @@ Perl_lex_discard_to(pTHX_ char *ptr)
        PL_parser->last_lop -= discard_len;
 }
 
+void
+Perl_notify_parser_that_changed_to_utf8(pTHX)
+{
+    /* Called when $^H is changed to indicate that HINT_UTF8 has changed from
+     * off to on.  At compile time, this has the effect of entering a 'use
+     * utf8' section.  This means that any input was not previously checked for
+     * UTF-8 (because it was off), but now we do need to check it, or our
+     * assumptions about the input being sane could be wrong, and we could
+     * segfault.  This routine just sets a flag so that the next time we look
+     * at the input we do the well-formed UTF-8 check.  If we aren't in the
+     * proper phase, there may not be a parser object, but if there is, setting
+     * the flag is harmless */
+
+    if (PL_parser) {
+        PL_parser->recheck_utf8_validity = TRUE;
+    }
+}
+
 /*
 =for apidoc Amx|bool|lex_next_chunk|U32 flags
 
@@ -1307,7 +1324,6 @@ Perl_lex_next_chunk(pTHX_ U32 flags)
     STRLEN linestart_pos, last_uni_pos, last_lop_pos;
     bool got_some_for_debugger = 0;
     bool got_some;
-    const U8* first_bad_char_loc;
 
     if (flags & ~(LEX_KEEP_PREVIOUS|LEX_FAKE_EOF|LEX_NO_TERM))
        Perl_croak(aTHX_ "Lexing code internal error (%s)", "lex_next_chunk");
@@ -1374,15 +1390,19 @@ Perl_lex_next_chunk(pTHX_ U32 flags)
     PL_parser->bufend = buf + new_bufend_pos;
     PL_parser->bufptr = buf + bufptr_pos;
 
-    if (UTF && ! is_utf8_string_loc((U8 *) PL_parser->bufptr,
-                                    PL_parser->bufend - PL_parser->bufptr,
-                                    &first_bad_char_loc))
-    {
-        _force_out_malformed_utf8_message(first_bad_char_loc,
-                                          (U8 *) PL_parser->bufend,
-                                          0,
-                                          1 /* 1 means die */ );
-        NOT_REACHED; /* NOTREACHED */
+    if (UTF) {
+        const U8* first_bad_char_loc;
+        if (UNLIKELY(! is_utf8_string_loc(
+                            (U8 *) PL_parser->bufptr,
+                                   PL_parser->bufend - PL_parser->bufptr,
+                                   &first_bad_char_loc)))
+        {
+            _force_out_malformed_utf8_message(first_bad_char_loc,
+                                              (U8 *) PL_parser->bufend,
+                                              0,
+                                              1 /* 1 means die */ );
+            NOT_REACHED; /* NOTREACHED */
+        }
     }
 
     PL_parser->oldbufptr = buf + oldbufptr_pos;
@@ -2263,10 +2283,9 @@ S_force_strict_version(pTHX_ char *s)
 
 /*
  * S_tokeq
- * Tokenize a quoted string passed in as an SV.  It finds the next
- * chunk, up to end of string or a backslash.  It may make a new
- * SV containing that chunk (if HINT_NEW_STRING is on).  It also
- * turns \\ into \.
+ * Turns any \\ into \ in a quoted string passed in in 'sv', returning 'sv',
+ * modified as necessary.  However, if HINT_NEW_STRING is on, 'sv' is
+ * unchanged, and a new SV containing the modified input is returned.
  */
 
 STATIC SV *
@@ -2571,7 +2590,6 @@ S_get_and_check_backslash_N_name(pTHX_ const char* s, 
const char* const e)
     SV *cv;
     SV *rv;
     HV *stash;
-    const U8* first_bad_char_loc;
     const char* backslash_ptr = s - 3; /* Points to the <\> of \N{... */
 
     PERL_ARGS_ASSERT_GET_AND_CHECK_BACKSLASH_N_NAME;
@@ -2581,21 +2599,6 @@ S_get_and_check_backslash_N_name(pTHX_ const char* s, 
const char* const e)
         return res;
     }
 
-    if (UTF && ! is_utf8_string_loc((U8 *) backslash_ptr,
-                                     e - backslash_ptr,
-                                     &first_bad_char_loc))
-    {
-        _force_out_malformed_utf8_message(first_bad_char_loc,
-                                          (U8 *) PL_parser->bufend,
-                                          0,
-                                          0 /* 0 means don't die */ );
-        yyerror_pv(Perl_form(aTHX_
-            "Malformed UTF-8 character immediately after '%.*s'",
-            (int) (first_bad_char_loc - (U8 *) backslash_ptr), backslash_ptr),
-                   SVf_UTF8);
-       return NULL;
-    }
-
     res = new_constant( NULL, 0, "charnames", res, NULL, backslash_ptr,
                         /* include the <}> */
                         e - backslash_ptr + 1);
@@ -2720,7 +2723,9 @@ S_get_and_check_backslash_N_name(pTHX_ const char* s, 
const char* const e)
         const U8* first_bad_char_loc;
         STRLEN len;
         const char* const str = SvPV_const(res, len);
-        if (! is_utf8_string_loc((U8 *) str, len, &first_bad_char_loc)) {
+        if (UNLIKELY(! is_utf8_string_loc((U8 *) str, len,
+                                          &first_bad_char_loc)))
+        {
             _force_out_malformed_utf8_message(first_bad_char_loc,
                                               (U8 *) PL_parser->bufend,
                                               0,
@@ -3024,7 +3029,6 @@ S_scan_const(pTHX_ char *start)
                 bool convert_unicode;
                 IV real_range_max = 0;
 #endif
-
                 /* Get the code point values of the range ends. */
                 if (has_utf8) {
                     /* We know the utf8 is valid, because we just constructed
@@ -3050,7 +3054,11 @@ S_scan_const(pTHX_ char *start)
                  * that code point is already in the output, twice.  We can
                  * just back up over the second instance and avoid all the rest
                  * of the work.  But if it is a variant character, it's been
-                 * counted twice, so decrement */
+                 * counted twice, so decrement.  (This unlikely scenario is
+                 * special cased, like the one for a range of 2 code points
+                 * below, only because the main-line code below needs a range
+                 * of 3 or more to work without special casing.  Might as well
+                 * get it out of the way now.) */
                 if (UNLIKELY(range_max == range_min)) {
                     d = max_ptr;
                     if (! has_utf8 && ! UVCHR_IS_INVARIANT(range_max)) {
@@ -3633,6 +3641,7 @@ S_scan_const(pTHX_ char *start)
                s++;
                if (*s != '{') {
                    yyerror("Missing braces on \\N{}");
+                    *d++ = '\0';
                    continue;
                }
                s++;
@@ -3644,7 +3653,7 @@ S_scan_const(pTHX_ char *start)
                    } else {
                        yyerror("Missing right brace on \\N{} or unescaped left 
brace after \\N");
                    }
-                   continue;
+                    yyquit(); /* Have exhausted the input. */
                }
 
                /* Here it looks like a named character */
@@ -3663,6 +3672,7 @@ S_scan_const(pTHX_ char *start)
                                 "Invalid hexadecimal number in \\N{U+...}"
                             );
                             s = e + 1;
+                            *d++ = '\0';
                             continue;
                         }
                         while (++s < e) {
@@ -3861,6 +3871,7 @@ S_scan_const(pTHX_ char *start)
                                     " in transliteration operator",
                                         /*  +1 to include the "}" */
                                     (int) (e + 1 - start), start));
+                                *d++ = '\0';
                                 goto end_backslash_N;
                             }
 
@@ -3926,15 +3937,16 @@ S_scan_const(pTHX_ char *start)
            case 'c':
                s++;
                if (s < send) {
-                   *d++ = grok_bslash_c(*s++, 1);
+                   *d++ = grok_bslash_c(*s, 1);
                }
                else {
                    yyerror("Missing control char name in \\c");
+                   yyquit();   /* Are at end of input, no sense continuing */
                }
 #ifdef EBCDIC
                 non_portable_endpoint++;
 #endif
-               continue;
+                break;
 
            /* printf-style backslashes, formfeeds, newlines, etc */
            case 'b':
@@ -4752,6 +4764,20 @@ Perl_yylex(pTHX)
     GV *gv = NULL;
     GV **gvp = NULL;
 
+    if (UNLIKELY(PL_parser->recheck_utf8_validity)) {
+        const U8* first_bad_char_loc;
+        if (UTF && UNLIKELY(! is_utf8_string_loc((U8 *) PL_bufptr,
+                                                        PL_bufend - PL_bufptr,
+                                                        &first_bad_char_loc)))
+        {
+            _force_out_malformed_utf8_message(first_bad_char_loc,
+                                              (U8 *) PL_bufend,
+                                              0,
+                                              1 /* 1 means die */ );
+            NOT_REACHED; /* NOTREACHED */
+        }
+        PL_parser->recheck_utf8_validity = FALSE;
+    }
     DEBUG_T( {
        SV* tmp = newSVpvs("");
        PerlIO_printf(Perl_debug_log, "### %" IVdf ":LEX_%s/X%s %s\n",
@@ -4999,7 +5025,16 @@ Perl_yylex(pTHX)
            s = PL_bufend;
        }
        else {
+            int save_error_count = PL_error_count;
+
            s = scan_const(PL_bufptr);
+
+            /* Quit if this was a pattern and there were errors.  This prevents
+             * us from trying to regex compile a broken pattern, which could
+             * lead to segfaults, etc. */
+            if (PL_lex_inpat && PL_error_count > save_error_count) {
+                yyquit();
+            }
            if (*s == '\\')
                PL_lex_state = LEX_INTERPCASEMOD;
            else
@@ -5108,12 +5143,6 @@ Perl_yylex(pTHX)
     switch (*s) {
     default:
        if (UTF) {
-            if (! isUTF8_CHAR((U8 *) s, (U8 *) PL_bufend)) {
-                _force_out_malformed_utf8_message((U8 *) s, (U8 *) PL_bufend,
-                                                  0,
-                                                  1 /* 1 means die */ );
-                NOT_REACHED; /* NOTREACHED */
-            }
             if (isIDFIRST_utf8_safe(s, PL_bufend)) {
                 goto keylookup;
             }
@@ -10542,10 +10571,14 @@ S_scan_str(pTHX_ char *start, int 
keep_bracketed_quoted, int keep_delims, int re
                }
                /* terminate when run out of buffer (the for() condition), or
                   have found the terminator */
-               else if (*s == term) {
-                   if (termlen == 1)
+               else if (*s == term) {  /* First byte of terminator matches */
+                   if (termlen == 1)   /* If is the only byte, are done */
                        break;
-                   if (s+termlen <= PL_bufend && memEQ(s, (char*)termstr, 
termlen))
+
+                    /* If the remainder of the terminator matches, also are
+                     * done, after checking that is a separate grapheme */
+                    if (   s + termlen <= PL_bufend
+                        && memEQ(s + 1, (char*)termstr + 1, termlen - 1))
                     {
                         if (   check_grapheme
                             && UNLIKELY(! _is_grapheme((U8 *) start,
@@ -10559,8 +10592,10 @@ S_scan_str(pTHX_ char *start, int 
keep_bracketed_quoted, int keep_delims, int re
                        break;
                     }
                }
-               else if (!has_utf8 && !UTF8_IS_INVARIANT((U8)*s) && UTF)
+               else if (!has_utf8 && !UTF8_IS_INVARIANT((U8)*s) && UTF) {
                    has_utf8 = TRUE;
+                }
+
                *to = *s;
            }
        }
@@ -11428,6 +11463,29 @@ S_yywarn(pTHX_ const char *const s, U32 flags)
     return 0;
 }
 
+void
+Perl_abort_execution(pTHX_ const char * const msg, const char * const name)
+{
+    PERL_ARGS_ASSERT_ABORT_EXECUTION;
+
+    if (PL_minus_c)
+        Perl_croak(aTHX_ "%s%s had compilation errors.\n", msg, name);
+    else {
+        Perl_croak(aTHX_
+                "%sExecution of %s aborted due to compilation errors.\n", msg, 
name);
+    }
+    NOT_REACHED; /* NOTREACHED */
+}
+
+void
+Perl_yyquit(pTHX)
+{
+    /* Called, after at least one error has been found, to abort the parse now,
+     * instead of trying to forge ahead */
+
+    yyerror_pvn(NULL, 0, 0);
+}
+
 int
 Perl_yyerror(pTHX_ const char *const s)
 {
@@ -11451,100 +11509,120 @@ Perl_yyerror_pvn(pTHX_ const char *const s, STRLEN 
len, U32 flags)
     SV * const where_sv = newSVpvs_flags("", SVs_TEMP);
     int yychar  = PL_parser->yychar;
 
-    PERL_ARGS_ASSERT_YYERROR_PVN;
-
-    if (!yychar || (yychar == ';' && !PL_rsfp))
-       sv_catpvs(where_sv, "at EOF");
-    else if (   PL_oldoldbufptr
-             && PL_bufptr > PL_oldoldbufptr
-             && PL_bufptr - PL_oldoldbufptr < 200
-             && PL_oldoldbufptr != PL_oldbufptr
-             && PL_oldbufptr != PL_bufptr)
-    {
-       /*
-               Only for NetWare:
-               The code below is removed for NetWare because it abends/crashes 
on NetWare
-               when the script has error such as not having the closing quotes 
like:
-                   if ($var eq "value)
-               Checking of white spaces is anyway done in NetWare code.
-       */
+    /* Output error message 's' with length 'len'.  'flags' are SV flags that
+     * apply.  If the number of errors found is large enough, it abandons
+     * parsing.  If 's' is NULL, there is no message, and it abandons
+     * processing unconditionally */
+
+    if (s != NULL) {
+        if (!yychar || (yychar == ';' && !PL_rsfp))
+            sv_catpvs(where_sv, "at EOF");
+        else if (   PL_oldoldbufptr
+                 && PL_bufptr > PL_oldoldbufptr
+                 && PL_bufptr - PL_oldoldbufptr < 200
+                 && PL_oldoldbufptr != PL_oldbufptr
+                 && PL_oldbufptr != PL_bufptr)
+        {
+            /*
+                    Only for NetWare:
+                    The code below is removed for NetWare because it
+                    abends/crashes on NetWare when the script has error such as
+                    not having the closing quotes like:
+                        if ($var eq "value)
+                    Checking of white spaces is anyway done in NetWare code.
+            */
 #ifndef NETWARE
-       while (isSPACE(*PL_oldoldbufptr))
-           PL_oldoldbufptr++;
+            while (isSPACE(*PL_oldoldbufptr))
+                PL_oldoldbufptr++;
 #endif
-       context = PL_oldoldbufptr;
-       contlen = PL_bufptr - PL_oldoldbufptr;
-    }
-    else if (  PL_oldbufptr
-            && PL_bufptr > PL_oldbufptr
-            && PL_bufptr - PL_oldbufptr < 200
-            && PL_oldbufptr != PL_bufptr) {
-       /*
-               Only for NetWare:
-               The code below is removed for NetWare because it abends/crashes 
on NetWare
-               when the script has error such as not having the closing quotes 
like:
-                   if ($var eq "value)
-               Checking of white spaces is anyway done in NetWare code.
-       */
+            context = PL_oldoldbufptr;
+            contlen = PL_bufptr - PL_oldoldbufptr;
+        }
+        else if (  PL_oldbufptr
+                && PL_bufptr > PL_oldbufptr
+                && PL_bufptr - PL_oldbufptr < 200
+                && PL_oldbufptr != PL_bufptr) {
+            /*
+                    Only for NetWare:
+                    The code below is removed for NetWare because it
+                    abends/crashes on NetWare when the script has error such as
+                    not having the closing quotes like:
+                        if ($var eq "value)
+                    Checking of white spaces is anyway done in NetWare code.
+            */
 #ifndef NETWARE
-       while (isSPACE(*PL_oldbufptr))
-           PL_oldbufptr++;
+            while (isSPACE(*PL_oldbufptr))
+                PL_oldbufptr++;
 #endif
-       context = PL_oldbufptr;
-       contlen = PL_bufptr - PL_oldbufptr;
-    }
-    else if (yychar > 255)
-       sv_catpvs(where_sv, "next token ???");
-    else if (yychar == YYEMPTY) {
-       if (PL_lex_state == LEX_NORMAL)
-           sv_catpvs(where_sv, "at end of line");
-       else if (PL_lex_inpat)
-           sv_catpvs(where_sv, "within pattern");
-       else
-           sv_catpvs(where_sv, "within string");
-    }
-    else {
-       sv_catpvs(where_sv, "next char ");
-       if (yychar < 32)
-           Perl_sv_catpvf(aTHX_ where_sv, "^%c", toCTRL(yychar));
-       else if (isPRINT_LC(yychar)) {
-           const char string = yychar;
-           sv_catpvn(where_sv, &string, 1);
-       }
-       else
-           Perl_sv_catpvf(aTHX_ where_sv, "\\%03o", yychar & 255);
-    }
-    msg = newSVpvn_flags(s, len, (flags & SVf_UTF8) | SVs_TEMP);
-    Perl_sv_catpvf(aTHX_ msg, " at %s line %" IVdf ", ",
-        OutCopFILE(PL_curcop),
-        (IV)(PL_parser->preambling == NOLINE
-               ? CopLINE(PL_curcop)
-               : PL_parser->preambling));
-    if (context)
-       Perl_sv_catpvf(aTHX_ msg, "near \"%" UTF8f "\"\n",
-                            UTF8fARG(UTF, contlen, context));
-    else
-       Perl_sv_catpvf(aTHX_ msg, "%" SVf "\n", SVfARG(where_sv));
-    if (PL_multi_start < PL_multi_end && (U32)(CopLINE(PL_curcop) - 
PL_multi_end) <= 1) {
-        Perl_sv_catpvf(aTHX_ msg,
-        "  (Might be a runaway multi-line %c%c string starting on line %" IVdf 
")\n",
-                (int)PL_multi_open,(int)PL_multi_close,(IV)PL_multi_start);
-        PL_multi_end = 0;
-    }
-    if (PL_in_eval & EVAL_WARNONLY) {
-       PL_in_eval &= ~EVAL_WARNONLY;
-       Perl_ck_warner_d(aTHX_ packWARN(WARN_SYNTAX), "%" SVf, SVfARG(msg));
+            context = PL_oldbufptr;
+            contlen = PL_bufptr - PL_oldbufptr;
+        }
+        else if (yychar > 255)
+            sv_catpvs(where_sv, "next token ???");
+        else if (yychar == YYEMPTY) {
+            if (PL_lex_state == LEX_NORMAL)
+                sv_catpvs(where_sv, "at end of line");
+            else if (PL_lex_inpat)
+                sv_catpvs(where_sv, "within pattern");
+            else
+                sv_catpvs(where_sv, "within string");
+        }
+        else {
+            sv_catpvs(where_sv, "next char ");
+            if (yychar < 32)
+                Perl_sv_catpvf(aTHX_ where_sv, "^%c", toCTRL(yychar));
+            else if (isPRINT_LC(yychar)) {
+                const char string = yychar;
+                sv_catpvn(where_sv, &string, 1);
+            }
+            else
+                Perl_sv_catpvf(aTHX_ where_sv, "\\%03o", yychar & 255);
+        }
+        msg = newSVpvn_flags(s, len, (flags & SVf_UTF8) | SVs_TEMP);
+        Perl_sv_catpvf(aTHX_ msg, " at %s line %" IVdf ", ",
+            OutCopFILE(PL_curcop),
+            (IV)(PL_parser->preambling == NOLINE
+                   ? CopLINE(PL_curcop)
+                   : PL_parser->preambling));
+        if (context)
+            Perl_sv_catpvf(aTHX_ msg, "near \"%" UTF8f "\"\n",
+                                 UTF8fARG(UTF, contlen, context));
+        else
+            Perl_sv_catpvf(aTHX_ msg, "%" SVf "\n", SVfARG(where_sv));
+        if (   PL_multi_start < PL_multi_end
+            && (U32)(CopLINE(PL_curcop) - PL_multi_end) <= 1)
+        {
+            Perl_sv_catpvf(aTHX_ msg,
+            "  (Might be a runaway multi-line %c%c string starting on"
+            " line %" IVdf ")\n",
+                    (int)PL_multi_open,(int)PL_multi_close,(IV)PL_multi_start);
+            PL_multi_end = 0;
+        }
+        if (PL_in_eval & EVAL_WARNONLY) {
+            PL_in_eval &= ~EVAL_WARNONLY;
+            Perl_ck_warner_d(aTHX_ packWARN(WARN_SYNTAX), "%" SVf, 
SVfARG(msg));
+        }
+        else {
+            qerror(msg);
+        }
     }
-    else
-       qerror(msg);
-    if (PL_error_count >= 10) {
-       SV * errsv;
-       if (PL_in_eval && ((errsv = ERRSV), SvCUR(errsv)))
-           Perl_croak(aTHX_ "%" SVf "%s has too many errors.\n",
-                      SVfARG(errsv), OutCopFILE(PL_curcop));
-       else
-           Perl_croak(aTHX_ "%s has too many errors.\n",
-            OutCopFILE(PL_curcop));
+    if (s == NULL || PL_error_count >= 10) {
+        const char * msg = "";
+        const char * const name = OutCopFILE(PL_curcop);
+
+       if (PL_in_eval) {
+            SV * errsv = ERRSV;
+            if (SvCUR(errsv)) {
+                msg = Perl_form(aTHX_ "%" SVf, SVfARG(errsv));
+            }
+        }
+
+        if (s == NULL) {
+            abort_execution(msg, name);
+        }
+        else {
+            Perl_croak(aTHX_ "%s%s has too many errors.\n", msg, name);
+        }
     }
     PL_in_my = 0;
     PL_in_my_stash = NULL;
diff --git a/utf8.c b/utf8.c
index bec68a5883..89c8413f7c 100644
--- a/utf8.c
+++ b/utf8.c
@@ -754,11 +754,15 @@ Perl__is_utf8_char_helper(const U8 * const s, const U8 * 
e, const U32 flags)
     return UTF8SKIP(s);
 }
 
-STATIC char *
-S__byte_dump_string(pTHX_ const U8 * s, const STRLEN len)
+char *
+Perl__byte_dump_string(pTHX_ const U8 * s, const STRLEN len, const bool format)
 {
     /* Returns a mortalized C string that is a displayable copy of the 'len'
-     * bytes starting at 's', each in a \xXY format. */
+     * bytes starting at 's'.  'format' gives how to display each byte.
+     * Currently, there are only two formats, so it is currently a bool:
+     *      0   \xab
+     *      1    ab         (that is a space between two hex digit bytes)
+     */
 
     const STRLEN output_len = 4 * len + 1;  /* 4 bytes per each input, plus a
                                                trailing NUL */
@@ -776,8 +780,13 @@ S__byte_dump_string(pTHX_ const U8 * s, const STRLEN len)
         const unsigned high_nibble = (*s & 0xF0) >> 4;
         const unsigned low_nibble =  (*s & 0x0F);
 
-        *d++ = '\\';
-        *d++ = 'x';
+        if (format) {
+            *d++ = ' ';
+        }
+        else {
+            *d++ = '\\';
+            *d++ = 'x';
+        }
 
         if (high_nibble < 10) {
             *d++ = high_nibble + '0';
@@ -827,7 +836,7 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
     return Perl_form(aTHX_ "%s: %s (unexpected non-continuation byte 0x%02x,"
                            " %s after start byte 0x%02x; need %d bytes, got 
%d)",
                            malformed_text,
-                           _byte_dump_string(s, print_len),
+                           _byte_dump_string(s, print_len, 0),
                            *(s + non_cont_byte_pos),
                            where,
                            *s,
@@ -1401,7 +1410,7 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
                         if (pack_warn) {
                             message = Perl_form(aTHX_ "%s: %s (overflows)",
                                             malformed_text,
-                                            _byte_dump_string(s0, send - s0));
+                                            _byte_dump_string(s0, send - s0, 
0));
                         }
                     }
                 }
@@ -1437,7 +1446,7 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
                                 "%s: %s (unexpected continuation byte 0x%02x,"
                                 " with no preceding start byte)",
                                 malformed_text,
-                                _byte_dump_string(s0, 1), *s0);
+                                _byte_dump_string(s0, 1, 0), *s0);
                     }
                 }
             }
@@ -1452,7 +1461,7 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
                         message = Perl_form(aTHX_
                                 "%s: %s (too short; %d byte%s available, need 
%d)",
                                 malformed_text,
-                                _byte_dump_string(s0, send - s0),
+                                _byte_dump_string(s0, send - s0, 0),
                                 (int)avail_len,
                                 avail_len == 1 ? "" : "s",
                                 (int)expectlen);
@@ -1516,8 +1525,8 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
                                     " should be represented with a"
                                     " different, shorter sequence)",
                                     malformed_text,
-                                    _byte_dump_string(s0, send - s0),
-                                    _byte_dump_string(s0, curlen));
+                                    _byte_dump_string(s0, send - s0, 0),
+                                    _byte_dump_string(s0, curlen, 0));
                         }
                         else {
                             U8 tmpbuf[UTF8_MAXBYTES+1];
@@ -1527,8 +1536,8 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
                                 "%s: %s (overlong; instead use %s to represent"
                                 " U+%0*" UVXf ")",
                                 malformed_text,
-                                _byte_dump_string(s0, send - s0),
-                                _byte_dump_string(tmpbuf, e - tmpbuf),
+                                _byte_dump_string(s0, send - s0, 0),
+                                _byte_dump_string(tmpbuf, e - tmpbuf, 0),
                                 ((uv < 256) ? 2 : 4), /* Field width of 2 for
                                                          small code points */
                                 uv);
@@ -1553,7 +1562,7 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
                             message = Perl_form(aTHX_
                                     "UTF-16 surrogate (any UTF-8 sequence that"
                                     " starts with \"%s\" is for a surrogate)",
-                                    _byte_dump_string(s0, curlen));
+                                    _byte_dump_string(s0, curlen, 0));
                         }
                         else {
                             message = Perl_form(aTHX_
@@ -1583,7 +1592,7 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
                                     "Any UTF-8 sequence that starts with"
                                     " \"%s\" is for a non-Unicode code point,"
                                     " may not be portable",
-                                    _byte_dump_string(s0, curlen));
+                                    _byte_dump_string(s0, curlen, 0));
                         }
                         else {
                             message = Perl_form(aTHX_
@@ -1622,7 +1631,7 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
                                         "Any UTF-8 sequence that starts with"
                                         " \"%s\" is for a non-Unicode code"
                                         " point, and is not portable",
-                                        _byte_dump_string(s0, curlen));
+                                        _byte_dump_string(s0, curlen, 0));
                         }
                         else {
                             message = Perl_form(aTHX_

--
Perl5 Master Repository

[perl.git] branch blead, updated. v5.25.9-138-g7e2f38b21a

Reply via email to