In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/c7f317a9270a52c9028667b8adec18e94f450586?hp=2c0445268a1bb7696e04b8b9b324c3d6880bb18a>
- Log -----------------------------------------------------------------
commit c7f317a9270a52c9028667b8adec18e94f450586
Author: David Mitchell <[email protected]>
Date:   Wed Apr 15 08:47:18 2015 +0100

    assertion failure on interpolated parse err

    RT# 124216

    When parsing the interpolated string "$X", where X is a unicode char
    that is not a legal variable name, failure to restore things properly
    during error recovery led to corrupted state and assertion failures.

    In more detail:

    When parsing a double-quoted string, S_sublex_push() saves most of the
    current parser state. On parse error, the save stack is popped back,
    which restores all that state. However, PL_lex_defer wasn't being
    saved, so if we were in the middle of handling a forced token,
    PL_lex_state gets restored from PL_lex_defer, and suddenly the lexer
    thinks we're back inside an interpolated string again. So
    S_sublex_done() gets called multiple times, too many scopes are
    popped, and things like PL_compcv are freed prematurely.

    Note that in order to reproduce:

    * we must be within a double quoted context;
    * we must be parsing a var (which causes a forced token);
    * the variable name must be illegal, which implies unicode, as
      chr(0..255) are all legal names;
    * the terminating string quote must be the last char of the input
      file, as this code:

          case LEX_INTERPSTART:
              if (PL_bufptr == PL_bufend)
                  return REPORT(sublex_done());

      won't trigger an extra call to sublex_done() otherwise.

    I'm sure this bug affects other cases too, but this was the only way I
    found to reproduce.
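
The failure mode described above can be illustrated with a toy model. The sketch below is not perl's code: the save stack, the helper names (save_i8, leave_scope, state_after_error) and the state constants are hypothetical stand-ins, and only lex_state and lex_defer loosely mirror PL_lex_state and PL_lex_defer. It assumes a forced token stashes the current state in the defer variable and that the lexer later copies it back, as the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lexer states; the values are arbitrary and not perl's. */
enum { LEX_NORMAL = 0, LEX_INTERPNORMAL = 1, LEX_KNOWNEXT = 2 };

static signed char lex_state;  /* stands in for PL_lex_state */
static signed char lex_defer;  /* stands in for PL_lex_defer */

/* Minimal save stack: each entry remembers an address and its old value. */
struct save_entry { signed char *addr; signed char old; };
static struct save_entry save_stack[16];
static size_t save_top;

static void save_i8(signed char *p)
{
    assert(save_top < sizeof save_stack / sizeof save_stack[0]);
    save_stack[save_top].addr = p;
    save_stack[save_top].old  = *p;
    save_top++;
}

/* Pop entries back to 'mark', restoring every saved variable. */
static void leave_scope(size_t mark)
{
    while (save_top > mark) {
        save_top--;
        *save_stack[save_top].addr = save_stack[save_top].old;
    }
}

/* Simulate: enter a string, start a forced token, hit a parse error,
 * then let the lexer finish the forced token. Returns the state the
 * lexer ends up in. 'save_defer' selects whether lex_defer is saved. */
static int state_after_error(int save_defer)
{
    size_t mark;

    lex_state = LEX_NORMAL;
    lex_defer = LEX_NORMAL;
    save_top  = 0;
    mark      = save_top;

    /* Entering the quoted string (the role S_sublex_push() plays). */
    save_i8(&lex_state);
    if (save_defer)
        save_i8(&lex_defer);
    lex_state = LEX_INTERPNORMAL;

    /* Forced token: stash the current state, switch to "known next". */
    lex_defer = lex_state;
    lex_state = LEX_KNOWNEXT;

    /* Parse error: the save stack is popped back. */
    leave_scope(mark);

    /* The pending forced token is consumed; state comes from lex_defer. */
    lex_state = lex_defer;
    return lex_state;
}
```

In this model, state_after_error(0) ends in LEX_INTERPNORMAL (the lexer believes it is still inside the string even though the scope has been popped), while state_after_error(1) ends in LEX_NORMAL.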
-----------------------------------------------------------------------

Summary of changes:
 t/uni/parser.t | 26 +++++++++++++++++++++++++-
 toke.c         |  1 +
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/t/uni/parser.t b/t/uni/parser.t
index 9c39943..3d89249 100644
--- a/t/uni/parser.t
+++ b/t/uni/parser.t
@@ -9,7 +9,7 @@ BEGIN {
     skip_all_without_unicode_tables();
 }
 
-plan (tests => 51);
+plan (tests => 52);
 
 use utf8;
 use open qw( :utf8 :std );
@@ -197,3 +197,27 @@ like( $@, qr/Bad name after Ｆｏｏ'/, 'Bad name after Ｆｏｏ\'' );
     CORE::evalbytes "use charnames ':full'; use utf8; my \$x = \"\\N{abc$malformed_to_be}\"";
     like( $@, qr/Malformed UTF-8 character immediately after '\\N\{abc' at .* within string/, 'Malformed UTF-8 input to \N{}');
 }
+
+# RT# 124216: Perl_sv_clear: Assertion
+# If a parsing error occurred during a forced token within an interpolated
+# context, the stack unwinding failed to restore PL_lex_defer and so after
+# error recovery the state restored after the forced token was processed
+# was the wrong one, resulting in the lexer thinking we're still inside a
+# quoted string and things getting freed multiple times.
+#
+# \xe3\x80\xb0 are the utf8 bytes making up the character \x{3030}.
+# The \x{3030} char isn't a legal var name, and this triggers the error.
+#
+# NB: this only failed if the closing quote of the interpolated string is
+# the last char of the file (i.e. no trailing \n).
+
+{
+    no utf8;
+
+    fresh_perl_is(qq{use utf8; "\$\xe3\x80\xb0"}, <<EOF, { stderr => 1},
+Wide character in print at - line 1.\
+syntax error at - line 1, near "\$\xe3\x80\xb0"
+Execution of - aborted due to compilation errors.
+EOF
+        "RT# 124216");
+}
diff --git a/toke.c b/toke.c
index 2a99f0b..294cb8f 100644
--- a/toke.c
+++ b/toke.c
@@ -2342,6 +2342,7 @@ S_sublex_push(pTHX)
     SAVEI32(PL_lex_casemods);
     SAVEI32(PL_lex_starts);
     SAVEI8(PL_lex_state);
+    SAVEI8(PL_lex_defer);
     SAVESPTR(PL_lex_repl);
     SAVEVPTR(PL_lex_inpat);
     SAVEI16(PL_lex_inwhat);

--
Perl5 Master Repository
