Re: Let the hacking commence!

Luke Blanshard Sat, 08 Jan 2005 02:56:25 -0800

Luke Palmer wrote:

> This list is for people interested in building the Perl 6 compiler. Now you have your first real task!
> We have to make a formal grammar for Perl 6.  Perl 6 is a huge language,
> so the task seems better done incrementally by the community...
> Send patches to this list.


OK, I'll bite.  In contrast to Luke's 50,000-foot view, I'm diving
down into the goriest of details.  At the end of this message is a
rule for whitespace within Perl code, along with supporting rules for
comments and pod.

I'm not posting this as a diff, because I have the faint suspicion
that others might have been hacking on this file offline.  But I
gather these rules should go in the "TOKENS" section.

[By the way, shouldn't this grammar be called "Perl" rather than
"Perl6::Grammar"?  Also, is this file now available in some repository
somewhere?]

I'd like reviewers to pay special attention to the pod stuff.  It's
not clear to me what the precise rules are or should be for blank
lines preceding pod commands.  I got from S02 the idea that we should
allow standalone =begin/=end sections (and that they should nest).
But does the =end line have to be preceded by a blank line?  As far as
I can tell, the =begin line does not.  In the interest of symmetry, I
have written the rules so that no blank line is required before the
closing =end either, even though this appears to violate the usual
rules for pod.
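The =begin/=end behaviour described above (no blank line required before either line, nesting, and a named block closed only by an =end carrying the same name) can be sketched outside the grammar. What follows is a minimal Python approximation, not part of the proposed grammar. It assumes the source has already been split into lines, and `BEGIN_RE`, `END_RE` and `find_block_end` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical helpers: a line-based approximation of pod_begin_end_block.
# "=begin" may carry an optional name; trailing text after the name is allowed.
BEGIN_RE = re.compile(r"=begin(?:[ \t]+(\S+)|[ \t]*$)")
# "=end" must be followed by exactly the saved name (or nothing), then only
# horizontal whitespace to the end of the line.
END_RE = re.compile(r"=end(?:[ \t]+(\S+))?[ \t]*$")


def find_block_end(lines, start):
    """Return the index of the line that closes the =begin block at lines[start].

    No blank line is required before either =begin or =end, blocks nest, and
    an =end whose name does not match is treated as ordinary block content.
    """
    m = BEGIN_RE.match(lines[start])
    if m is None:
        raise ValueError("not a =begin line: %r" % lines[start])
    name = m.group(1)  # None for an unnamed block
    i = start + 1
    while i < len(lines):
        end = END_RE.match(lines[i])
        if end is not None and end.group(1) == name:
            return i
        if BEGIN_RE.match(lines[i]):
            # Skip a nested block as a unit, so its =end cannot close ours.
            i = find_block_end(lines, i) + 1
        else:
            i += 1
    raise SyntaxError("Unterminated =begin/=end block")


doc = ["=begin foo", "text", "=begin", "inner", "=end", "=end bar", "=end foo", "code"]
print(find_block_end(doc, 0))  # 6
```

In the example, the unnamed inner block is skipped as a unit, and the mismatched "=end bar" line is treated as content, so the outer block closes at "=end foo".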


(Another guy called) Luke


====================================

# Whitespace definition for Perl code.
rule ws() {
      # Case 1: Unicode space characters, comments, or POD blocks, or
      # any combination thereof.
    [ \s | «comment» | «pod» ]+

      # Case 2: We're looking at a non-word-constituent or EOF,
      # meaning zero-width counts as whitespace.
  | <before \W> | $

      # Case 3: We must be looking at a word constituent.  We match
      # zero-width whitespace at BOF or after a non-word-constituent.
  | ^ | <after \W>
}

# Comment definition for Perl code.
rule comment() {
      # A hash ("#"), then everything through the next newline or EOF.
    <'#'> .*? [ \n | $ ]
}

# A POD block, as extended for P6.  This is a =begin/=end pair, a =for
# paragraph, or a standard =<anything>/=cut block.
rule pod() {
      # Case 1: a =begin/=end block, in its own rule so it can
      # recurse.
    «pod_begin_end_block»

      # Case 2: a =for paragraph.  "=for" at BOL, plus any space
      # character, starts it, and the first blank line (or EOF) ends
      # it.
  | ^^=for \s :: .*? [ \n \h* \n | $ ]

      # Case 3: any arbitrary POD block.  Starts with "=" at BOL,
      # followed by a letter, ends with "=cut" at BOL or at EOF.
  | ^^=<+<alpha>> :: .*? [ \n =cut [ \s | $ ] | $ ]
}

# A (recursive) =begin/=end POD block.
rule pod_begin_end_block() {
      # Starts with "=begin" at BOL, followed by an optional name
      # which we save to match with the corresponding "=end".
    ^^=begin [ \h+ $<name> := (\S+) | \h* \n ]

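      # Next comes any number of nested =begin/=end blocks or single
      # characters -- but the smallest number that will match...  The
      # nested block is tried first, so that an inner "=end" is never
      # mistaken for the closing line of this block.
    [ «pod_begin_end_block» | . ]*?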

      # ...an "=end" at BOL followed by the name saved above, or
      # followed by nothing if there wasn't one.  If we make it to EOF
      # without finding the "=end" line, we blow up.
    [
      ^^=end [ <( $<name> )> :: \h+ $<name> | <null> ] \h* [ \n | $ ]
    |
      $ <commit> { fail "Unterminated =begin/=end block" }
    ]
}
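The three cases of the ws rule can be checked with a small Python sketch. It is an approximation under stated assumptions: Python's Unicode-aware word and space classes stand in for Perl 6's, POD blocks are left out of Case 1, and `WS_CHUNK`, `is_word` and `match_ws` are hypothetical names.

```python
import re
from typing import Optional

# Case 1 approximation: one or more runs of whitespace or "#" comments.
# A comment runs through the next newline or to end of input.  POD blocks,
# which the real rule also accepts here, are omitted from this sketch.
WS_CHUNK = re.compile(r"(?:\s+|#[^\n]*(?:\n|\Z))+")


def is_word(ch: str) -> bool:
    """Python's Unicode-aware word class stands in for a Perl word constituent."""
    return re.match(r"\w", ch) is not None


def match_ws(src: str, pos: int) -> Optional[int]:
    """Return where a <ws> match starting at pos ends, or None if it fails."""
    # Case 1: consume real whitespace and/or comments.
    m = WS_CHUNK.match(src, pos)
    if m:
        return m.end()
    # Case 2: a non-word character or EOF follows, so zero width is enough.
    if pos == len(src) or not is_word(src[pos]):
        return pos
    # Case 3: a word character follows; zero width is allowed only at BOF
    # or right after a non-word character.
    if pos == 0 or not is_word(src[pos - 1]):
        return pos
    # Between two word characters, whitespace is mandatory.
    return None


print(match_ws("foo(", 3))       # 3 (zero width before "(")
print(match_ws("foobar", 3))     # None (cannot split an identifier)
print(match_ws("x # hi\ny", 1))  # 7
```

Case 1 is tried first, mirroring the ordered alternation in the rule, so any available whitespace or comment is consumed rather than matched at zero width.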
