Luke Palmer wrote:
> This list is for people interested in building the Perl 6 compiler. Now
> you have your first real task!
> We have to make a formal grammar for Perl 6. Perl 6 is a huge language,
> so the task seems better done incrementally by the community...
> Send patches to this list.
OK, I'll bite. In contrast to Luke's 50-thousand-foot level, I'm
diving down into the goriest of details. At the end of this message
is a rule for whitespace within Perl code, and supporting rules for
comments and pod.
I'm not posting this as a diff, because I have the faint suspicion
that others might have been hacking on this file offline. But I
gather these rules should go in the "TOKENS" section.
[By the way, shouldn't this grammar be called "Perl" rather than
"Perl6::Grammar"? Also, is this file now available in some repository
somewhere?]
I'd like reviewers to pay special attention to the pod stuff. It's
not clear to me what the precise rules are or should be for blank
lines preceding pod commands. I got from S02 the idea that we should
allow standalone =begin/=end sections (and that they should nest).
But does the =end line have to be preceded by a blank line? As far as
I can tell, the =begin line does not. In the interest of symmetry, I
have written the rules to not require a blank line before the closing
=end either, even though this appears to violate the usual rules for
pod.
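For anyone who would rather poke at the =begin/=end behavior than read
the rules, here is a rough Python sketch of what I think they are meant
to do. It is line-based and tracks nesting with an explicit stack, which
is my assumption about the intent and not a transliteration of the rule;
the names BEGIN, END and find_block_end are made up for illustration.
Note that no blank line is required before either =begin or =end.

```python
import re

# "=begin" at start of line, then either a name or nothing but whitespace.
BEGIN = re.compile(r"^=begin(?:[ \t]+(\S+)|[ \t]*$)")
# "=end" at start of line, an optional name, then only trailing whitespace.
END = re.compile(r"^=end(?:[ \t]+(\S+))?[ \t]*$")


def find_block_end(lines, start):
    """Return the index of the line that closes the block opened at lines[start].

    Hypothetical helper: a named "=begin foo" is closed only by "=end foo",
    an unnamed "=begin" only by a bare "=end".
    """
    opener = BEGIN.match(lines[start])
    if opener is None:
        raise ValueError("not a =begin line")
    stack = [opener.group(1)]  # names of the open blocks; None if unnamed
    for i in range(start + 1, len(lines)):
        nested = BEGIN.match(lines[i])
        if nested:
            stack.append(nested.group(1))
            continue
        closer = END.match(lines[i])
        if closer and closer.group(1) == stack[-1]:
            stack.pop()
            if not stack:
                return i
    raise ValueError("Unterminated =begin/=end block")
```

Given the lines "=begin outer", "text", "=begin inner", "more",
"=end inner", "=end outer", "code", calling find_block_end(lines, 0)
returns 5, the index of the "=end outer" line.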
(Another guy called) Luke
====================================
# Whitespace definition for Perl code.
rule ws() {
    # Case 1: Unicode space characters, comments, or POD blocks, or
    # any combination thereof.
    [ \s | «comment» | «pod» ]+
    # Case 2: We're looking at a non-word-constituent or EOF,
    # meaning zero-width counts as whitespace.
    | <before \W> | $
    # Case 3: We must be looking at a word constituent. We match
    # whitespace at BOF or after a non-word-constituent.
    | ^ | <after \W>
}
# Comment definition for Perl code.
rule comment() {
    # A hash ("#"), then everything through the next newline or EOF.
    <'#'> .*? [ \n | $ ]
}
# A POD block, as extended for P6. This is a =begin/=end pair, a =for
# paragraph, or a standard =<anything>/=cut block.
rule pod() {
    # Case 1: a =begin/=end block, in its own rule so it can
    # recurse.
    «pod_begin_end_block»
    # Case 2: a =for paragraph. "=for" at BOL, plus any space
    # character, starts it, and the first blank line (or EOF) ends
    # it.
    | ^^=for \s :: .*? [ \n \h* \n | $ ]
    # Case 3: any arbitrary POD block. Starts with "=" at BOL,
    # followed by a letter, ends with "=cut" at BOL or at EOF.
    | ^^=<+<alpha>> :: .*? [ \n =cut [ \s | $ ] | $ ]
}
# A (recursive) =begin/=end POD block.
rule pod_begin_end_block() {
    # Starts with "=begin" at BOL, followed by an optional name
    # which we save to match with the corresponding "=end".
    ^^=begin [ \h+ $<name> := (\S+) | \h* \n ]
    # Next comes any number of single characters or nested =begin/
    # =end blocks -- but the smallest number that will match...
    [ . | «pod_begin_end_block» ]*?
    # ...an "=end" at BOL followed by the name saved above, or
    # followed by nothing if there wasn't one. If we make it to EOF
    # without finding the "=end" line, we blow up.
    [
        ^^=end [ <( $<name> )> :: \h+ $<name> | <null> ] \h* [ \n | $ ]
    |
        $ <commit> { fail "Unterminated =begin/=end block" }
    ]
}
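
The zero-width cases of the ws rule (cases 2 and 3) are the least obvious
part, so here is a small Python sketch of just those two cases. The
function name is hypothetical, and Python's \W only approximates whatever
Perl 6 ends up meaning by a non-word-constituent. Case 1 (real spaces,
comments, pod) is deliberately left out.

```python
import re


def ws_matches_zero_width(text, pos):
    """Return True if a zero-width "whitespace" match is allowed at pos.

    Hypothetical helper mirroring cases 2 and 3 of the ws rule only.
    """
    at_bof = pos == 0
    at_eof = pos == len(text)
    # Case 2: the next character is a non-word character, or we are at EOF.
    next_is_nonword = not at_eof and re.fullmatch(r"\W", text[pos]) is not None
    # Case 3: we are at BOF, or the previous character is a non-word character.
    prev_is_nonword = not at_bof and re.fullmatch(r"\W", text[pos - 1]) is not None
    return at_eof or next_is_nonword or at_bof or prev_is_nonword
```

The net effect is that the zero-width match is refused only between two
word characters: in "foo bar" it fails at position 1 (between "f" and
"o") but succeeds at positions 0, 3, 4 and 7.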