In perl.git, the branch sprout/deerhock has been created
<http://perl5.git.perl.org/perl.git/commitdiff/5d1edaf9fee1dedc9121e3ece01051033dabc78b?hp=0000000000000000000000000000000000000000>
at 5d1edaf9fee1dedc9121e3ece01051033dabc78b (commit)
- Log -----------------------------------------------------------------
commit 5d1edaf9fee1dedc9121e3ece01051033dabc78b
Author: Father Chrysostomos <[email protected]>
Date: Sun Aug 19 02:45:38 2012 -0700
[perl #114040] Parse here-docs correctly in quoted constructs
When parsing code outside a string eval or quoted construct, the lexer
reads one line at a time into PL_linestr.
To parse a here-doc (hereinafter “deer hock”, because I spike
lunarisms), the lexer has to pull extra lines out of the input stream
ahead of the current line, the value of PL_linestr remaining the same.
In a string eval, the entire piece of code being parsed is in
PL_linestr.
To parse a deer hock inside a string eval, the lexer has to fiddle
with the contents of PL_linestr, scanning for newline characters.
Originally, S_scan_heredoc just followed those two approaches.
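As a rough illustration of the two cases above (ordinary source versus a string eval), the following sketch builds the same here-doc both ways and compares the results. The variable names are made up for this example, and it only shows the behaviour visible from Perl code, not what toke.c does internally.

```perl
use strict;
use warnings;

# Case 1: here-doc in ordinary source.  The lexer reads the body from the
# lines of the input stream that follow the line containing <<"END".
my $from_file = <<"END";
alpha
beta
END

# Case 2: here-doc in a string eval.  The whole program, body included,
# is a single string, so the body is found by scanning for newlines.
my $from_eval = eval qq{<<"END";\nalpha\nbeta\nEND\n};
die $@ if $@;

print $from_file eq $from_eval ? "same\n" : "different\n";
```

Both routes should yield the same two-line string; only the place the lexer finds the body differs.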
When the lexer encounters a quoted construct, it looks for the end-
ing delimiter (reading from the input stream if necessary), puts the
entire quoted thing (minus quotes) in PL_linestr, and then starts an
inner lexing scope.
This means that deer hocks would not nest properly outside of a string
eval, because the body of the inner deer hock would be pulled out of
the input stream *after* the outer deer hock.
Larry Wall fixed that in commit fd2d095329 (Jan. 1997), so that this
would work:

    <<foo
    ${\<<bar}
    ber
    bar
    foo

He did so by following the string eval approach (looking for the deer
hock body in PL_linestr) if the deer hock was inside another quoted
construct.
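To make concrete what "work" means for the nested example above, here is a small runnable version that captures the resulting string. The variable names $nested and $visible are only for illustration. The inner body ("ber") is taken from the lines inside the outer body, so those lines should not appear a second time in the outer string.

```perl
use strict;
use warnings;

# The inner here-doc (<<bar) starts inside the body of the outer one (<<foo).
# Its body, "ber", is looked up within the outer body rather than in the
# lines that follow the outer terminator.
my $nested = <<foo;
${\<<bar}
ber
bar
foo

# Show the result with the newlines made visible.
(my $visible = $nested) =~ s/\n/\\n/g;
print "$visible\n";
```

On a perl that nests here-docs as described, the outer string is the inner body followed by the newline that ended the first outer line.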
Later, commit a2c066523a (Mar. 1998) fixed this:

    s/^not /substr(<<EOF, 0, 0)/e;
    Ignored
    EOF

by following the string eval approach only if the deer hock was inside
another non-backtick deer hock, not just any quoted construct.
The problem with the string eval approach inside a substitution is
that it only looks in PL_linestr, which only contains
“substr(<<EOF, 0, 0)” when the lexer is handling the second part of
the s/// operator.
But that unfortunately broke this:

    s/^not /substr(<<EOF, 0, 0)
    Ignored
    EOF
    /e;

and this:

    print <<`EOF`;
    ${\<<EOG}
    echo stuff
    EOG
    EOF

reverting it to the pre-fd2d095329 behaviour, because the outer quoted
construct was treated as one line.
Later on, commit 0244c3a403 (Mar. 1999) fixed this:

    eval 's/.../<<FOO/e
    stuff
    FOO
    ';

which required a new approach not used before. When the replacement
part of the s/// is being parsed, PL_linestr contains “<<FOO”. The
body of the deer hock is not in the input stream (there isn’t one),
but in what was the previous value of PL_linestr before the lexer
encountered s///.
So 0244c3a403 fixed that by recording pointers into the outer string
and using them in S_scan_heredoc. That commit, for some reason, was
written such that it applied only to substitutions, and not to other
quoted constructs.
It also failed to take interpolation into account, and did not record
the outer buffer position, but then tried to use it anyway, resulting
in crashes in both these cases:

    eval 's/${ <<END }//';

    eval 's//${ <<END }//';

It also failed to take multiline s///’s into account, resulting in
neither of these working, because it lost track of the current cursor,
leaving it at 'D' instead of the line break following it:

    eval '
    s//<<END
    /e;
    blah blah blah
    END
    ;1' or die $@;

    eval '
    s//<<END
    blah blah blah
    END
    /e;
    ;1' or die $@;

S_scan_heredoc currently positions the cursor (s) at the last
character of <<END if there is a line break on the same line. There is
an s++ later on to account for that, but the code added by 0244c3a403
bypassed it.
So, in the end, deer hocks could only be nested in other quoted con-
structs if the outer construct was in a string eval and was not s///,
or was a non-backtick deer hock.
This commit hopefully fixes the problems. :-)
The s///-in-eval case is a little tricky. We have to see whether the
deer hock label is on the last line of the s///. If it is, we have
to peek into the outer buffer. Otherwise, we have to treat it like a
string eval.
This commit does not deal with <<END inside the pattern of a multi-
line s///.
M t/comp/parser.t
M toke.c
commit 0a4ac69cf74bcf037b680935fdade9a2d867a586
Author: Father Chrysostomos <[email protected]>
Date: Sat Aug 18 23:54:02 2012 -0700
[perl #70836] Fix err msg for unterminated here-doc in eval

    $ perl -e '<<foo'
    Can't find string terminator "foo" anywhere before EOF at -e line 1.
    $ perl -e 'eval "<<foo"; die $@'
    Can't find string terminator "
    foo" anywhere before EOF at (eval 1) line 1.

An internal implementation detail is leaking out.
When the lexer happens to have a multiline string in its line buffer
(in a string eval or quoted construct), it looks for "\nfoo" instead
of "foo". It was passing that same string to the error-reporting code
(S_missingterm), resulting in that extraneous newline.
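A quick way to check which behaviour a given perl has is to look for the stray newline in $@ after the failing eval. This is only a sketch: the variable name $err is illustrative, and the patterns are based on the message text quoted above.

```perl
use strict;
use warnings;

# Deliberately unterminated here-doc inside a string eval.
eval "<<foo";
my $err = $@;

# The leaked implementation detail shows up as a newline before the
# terminator name, inside the quotes of the error message.
if ($err =~ /string terminator "\nfoo"/) {
    print "newline leaked into the message\n";
}
elsif ($err =~ /string terminator "foo"/) {
    print "message names the terminator as written\n";
}
else {
    print "unexpected message: $err";
}
```

With this commit applied, the second branch is the one expected to be taken.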
M t/lib/croak/toke
M toke.c
-----------------------------------------------------------------------
--
Perl5 Master Repository