[svn:perl6-synopsis] r14454 - doc/trunk/design/syn

larry Tue, 11 Sep 2007 11:55:04 -0700

Author: larry
Date: Tue Sep 11 11:54:28 2007
New Revision: 14454

Modified:
   doc/trunk/design/syn/S05.pod


Log:
Last (we hope) major revision of regex syntax.
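
As a rough illustration of the revised C<:c($p)> and C<:p($p)> modifiers in the diff below, here is a minimal sketch in Python rather than Perl 6. It assumes that Python's integer offsets can stand in for the opaque C<StrPos> values, and the helper names C<continue_from> and C<match_at> are hypothetical, chosen only for this example. It shows only the distinction the text draws: C<:c> scans forward from the given position, while C<:p> must match exactly there.

```python
import re


def continue_from(pattern, text, pos):
    """Rough analogue of m:c($p)/pattern/ (hypothetical helper).

    Scanning starts at pos, but the match may begin later in the string.
    """
    return re.compile(pattern).search(text, pos)


def match_at(pattern, text, pos):
    """Rough analogue of m:p($p)/pattern/ (hypothetical helper).

    The match must begin exactly at pos, or it fails.
    """
    return re.compile(pattern).match(text, pos)


text = "foo = 42"

# Offset 3 is the space after "foo".
scanned = continue_from(r"\d+", text, 3)   # skips ahead to the digits
anchored = match_at(r"\d+", text, 3)       # no digits at offset 3
exact = match_at(r"\d+", text, 6)          # digits begin at offset 6
```

Because the position is passed in explicitly, the string itself carries no hidden "last match" state, which is the point the revised text makes about C<$/.to>.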


Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod        (original)
+++ doc/trunk/design/syn/S05.pod        Tue Sep 11 11:54:28 2007
@@ -14,9 +14,9 @@
    Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
                Larry Wall <[EMAIL PROTECTED]>
    Date: 24 Jun 2002
-   Last Modified: 6 Sep 2007
+   Last Modified: 11 Sep 2007
    Number: 5
-   Version: 64
+   Version: 65
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -36,14 +36,18 @@
 =head1 New match result and capture variables
 
 The underlying match result object is now available as the C<$/>
-variable, which is implicitly lexically scoped.  All access to the
-current (or most recent) match is through this variable, even when
+variable, which is implicitly lexically scoped.  All user access to the
+most recent match is through this variable, even when
 it doesn't look like it.  The individual capture variables (such as C<$0>,
 C<$1>, etc.) are just elements of C<$/>.
 
 By the way, unlike in Perl 5, the numbered capture variables now
 start at C<$0> instead of C<$1>.  See below.
 
+During the execution of a match, the current match state is stored in a
+C<$_> variable lexically scoped to an appropriate portion of the match.
+This is transparent to the user for simple matches.
+
 =head1 Unchanged syntactic features
 
 The following regex features use the same syntax as in Perl 5:
@@ -75,9 +79,11 @@
 While the syntax of C<|> does not change, the default semantics do
 change slightly.  We are attempting to concoct a pleasing mixture
 of declarative and procedural matching so that we can have the
-best of both.  See the section below on "Longest-token matching".
+best of both.  In short, you need not write your own tokener for
+a grammar because Perl will write one for you.  See the section
+below on "Longest-token matching".
 
-=head1 Simplified lexical parsing
+=head1 Simplified lexical parsing of patterns
 
 Unlike traditional regular expressions, Perl 6 does not require
 you to memorize an arbitrary list of metacharacters.  Instead it
@@ -202,58 +208,49 @@
 =item *
 
 The C<:c> (or C<:continue>) modifier causes the pattern to continue
-scanning from the string's current C<.pos>:
+scanning from the specified position (defaulting to C<$/.to>):
 
-     m:c/ pattern /        # start at end of
-                           # previous match on $_
+     m:c($p)/ pattern /     # start scanning at position $p
 
 Note that this does not automatically anchor the pattern to the starting
 location.  (Use C<:p> for that.)  The pattern you supply to C<split>
 has an implicit C<:c> modifier.
 
-The C<:continue> modifier takes an optional argument of type C<StrPos>
-which specifies the point at which to start scanning for a match.
-This should not be used unless you know what you're doing, or just
-happen to like hard-to-debug infinite loops.
+String positions are of type C<StrPos> and should generally be treated
+as opaque.
 
 =item *
 
 The C<:p> (or C<:pos>) modifier causes the pattern to try to match only at
-the string's current C<.pos>:
+the specified string position:
 
-     m:p/ pattern /        # match at end of
-                           # previous match on $_
+     m:pos($p)/ pattern /  # match at position $p
 
-Since this is implicitly anchored to the position, it's suitable for
-building parsers and lexers.  The pattern you supply to a Perl macro's
-C<is parsed> trait has an implicit C<:p> modifier.
+If the argument is omitted, it defaults to C<$/.to>.  (Unlike in
+Perl 5, the string itself has no clue where its last match ended.)
+All subrule matches are implicitly passed their starting position.
+Likewise, the pattern you supply to a Perl macro's C<is parsed>
+trait has an implicit C<:p> modifier.
 
 Note that
 
-     m:c/pattern/
+     m:c($p)/pattern/
 
 is roughly equivalent to
 
-     m:p/.*? <( pattern )> /
-
-Also note that any regex called as a subrule is implicitly anchored to the
-current position anyway.
-
-The C<:pos> modifier takes an optional argument of type C<StrPos>
-which specifies the point at which to attempt a match.  This should not
-be used lightly.  Put it in the category of a "goto".
+     m:p($p)/.*? <( pattern )> /
 
 =item *
 
 The new C<:s> (C<:sigspace>) modifier causes whitespace sequences
 to be considered "significant"; they are replaced by a whitespace
-matching rule, C<< <+ws> >>.  That is,
+matching rule, C<< <.ws> >>.  That is,
 
      m:s/ next cmd =   <condition>/
 
 is the same as:
 
-     m/ <+ws> next <+ws> cmd <+ws> = <+ws> <condition>/
+     m/ <.ws> next <.ws> cmd <.ws> = <.ws> <condition>/
 
 which is effectively the same as:
 
@@ -265,9 +262,9 @@
 
 or equivalently,
 
-     m { (a|\*) <+ws> (b|\+) }
+     m { (a|\*) <.ws> (b|\+) }
 
-C<< <+ws> >> can't decide what to do until it sees the data.
+C<< <.ws> >> can't decide what to do until it sees the data.
 It still does the right thing.  If not, define your own C<< ws >>
 and C<:sigspace> will use that.
 
@@ -275,8 +272,8 @@
 the parser rules automatically handle whitespace policy for you.
 In this context, whitespace often includes comments, depending on
 how the grammar chooses to define its whitespace rule.  Although the
-default C<< <+ws> >> subrule recognizes no comment construct, any
-grammar is free to override the rule.  The C<< <+ws> >> rule is not
+default C<< <.ws> >> subrule recognizes no comment construct, any
+grammar is free to override the rule.  The C<< <.ws> >> rule is not
 intended to mean the same thing everywhere.
 
 It's also possible to pass an argument to C<:sigspace> specifying
@@ -285,7 +282,7 @@
 important to distinguish the significant whitespace in the pattern from
 the "whitespace" being matched, so we'll call the pattern's whitespace
 I<sigspace>, and generally reserve I<whitespace> to indicate whatever
-C<< <+ws> >> matches in the current grammar. The correspondence
+C<< <.ws> >> matches in the current grammar. The correspondence
 between sigspace and whitespace is primarily metaphorical, which is
 why the correspondence is both useful and (potentially) confusing.
 
@@ -336,16 +333,15 @@
 If followed by an C<x>, it means repetition.  Use C<:x(4)> for the
 general form.  So
 
-     s:4x [ (<+ident>) = (\N+) $$] [$0 => $1];
+     s:4x [ (<.ident>) = (\N+) $$] [$0 => $1];
 
 is the same as:
 
-     s:x(4) [ (<+ident>) = (\N+) $$] [$0 => $1];
+     s:x(4) [ (<.ident>) = (\N+) $$] [$0 => $1];
 
 which is almost the same as:
 
-     $_.pos = 0;
-     s:c[ (<+ident>) = (\N+) $$] = "$0 => $1" for 1..4;
+     s:c[ (<.ident>) = (\N+) $$] = "$0 => $1" for 1..4;
 
 except that the string is unchanged unless all four matches are found.
 However, ranges are allowed, so you can say C<:x(1..4)> to change anywhere
@@ -418,6 +414,9 @@
 (especially if it isn't implemented yet, or is never implemented),
 all pieces of C<$/> are considered copy-on-write, if not read-only.
 
+[Conjecture: this should really associate a pattern with a string variable,
+not a (presumably immutable) string value.]
+
 =item *
 
 The new C<:keepall> modifier causes this regex and all invoked subrules
@@ -450,7 +449,7 @@
 and these are equivalent to
 
     $string ~~ m/^ \d+: $/;
-    $string ~~ m/^ <+ws> \d+: <+ws> $/;
+    $string ~~ m/^ <.ws> \d+: <.ws> $/;
 
 =item *
 
@@ -778,7 +777,7 @@
 However, a variable used as the left side of a binding or submatch
 operator is not used for matching.
 
-    $x := <ident>
+    $x = <ident>
     $0 ~~ <ident>
 
 If you do want to match C<$0> again and then use that as the submatch,
@@ -788,7 +787,11 @@
 
 It is non-sensical to bind to something that is not a variable:
 
-    "$0" := <ident>     # ERROR
+    "$0" = <ident>     # ERROR
+
+Variables used in bindings are lexically scoped to the rest of the regex.
+If the match succeeds they are remembered in the C<Match> object's hash,
+with a key corresponding to the variable name without the sigil.
 
 =item *
 
@@ -990,6 +993,15 @@
 
     <foo('bar')>
 
+If the first character after the identifier is an C<=>, then the identifier
+is taken as an alias for what follows.  In particular,
+
+    <foo=bar>
+
+is just shorthand for
+
+    $foo=<bar>
+
 If the first character after the identifier is whitespace, the
 subsequent text (following any whitespace) is passed as a regex, so:
 
@@ -1009,22 +1021,7 @@
 To pass a string with leading whitespace, or to interpolate any values
 into the string, you must use the parenthesized form.
 
-If the first character is a plus or minus, the rest of the assertion
-is parsed as a set of character classes (though the definition of
-character class is intentionally vague, and may include any other rule
-whether it matches characters exclusively or not).
-
-An initial identifier is taken as a character class, so the first
-character after the identifier doesn't matter in this case, and you
-can use whitespace however you like.  Therefore
-
-    <foo+bar-baz>
-
-can be written
-
-    <+ foo + bar - baz>
-
-Likewise an initial left square bracket indicates character class syntax.  (See below.)
+No other characters are allowed after the initial identifier.
 
 Subrule matches are considered declarative to the extent that
 the front of the subrule is itself considered declarative.  If a
@@ -1045,7 +1042,7 @@
                              #   \s* otherwise
 
      / <at($pos)> /          # match only at a particular StrPos
-                             # short for <?{ .pos == $pos }>
+                             # short for <?{ .pos === $pos }>
                              # (considered declarative until $pos changes)
 
 The C<after> assertion implements lookbehind by reversing the syntax
@@ -1059,30 +1056,23 @@
 
 =item *
 
-A leading C<+> causes a named assertion not to capture what it matches (see
+A leading C<.> causes a named assertion not to capture what it matches (see
 L<Subrule captures>. For example:
 
      / <ident>  <ws>  /      # $/<ident> and $/<ws> both captured
-     / <+ident> <ws>  /      # only $/<ws> captured
-     / <+ident> <+ws> /      # nothing captured
+     / <.ident> <ws>  /      # only $/<ws> captured
+     / <.ident> <.ws> /      # nothing captured
 
 The non-capturing behavior may be overridden with a C<:keepall>.
 
-The rest of the assertion is reparsed as if the C<+> (and any following
-whitespace) weren't there, so it is legal (but redundant) to say:
-
-    <+++ws>
-    <+ + +ws>
-
 =item *
 
 A leading C<$> indicates an indirect subrule.  The variable must contain
 either a C<Regex> object, or a string to be compiled as the regex.  The
 string is never matched literally.
 
-By default C<< <$foo> >> is captured into C<< $<foo> >>, but you can
-use the C<< <+$foo> >> form to suppress capture, and you can always say
-C<< $<$foo> := <$foo> >> if you prefer to include the sigil in the key.
+Such an assertion is not captured.  (No assertion with leading punctuation
+is captured by default.)  You may always bind it explicitly, of course.
 
 A subrule is considered declarative to the extent that the front of it
 is declarative, and to the extent that the variable doesn't change.
@@ -1108,9 +1098,7 @@
 That is, a string is forced to be compiled as a subrule instead of being
 matched literally.  (There is no difference for a C<Regex> object.)
 
-By default C<< <@foo> >> is captured into C<< $<foo> >>, but you can
-use the C<< <+@foo> >> form to suppress capture, and you can always say
-C<< $<@foo> := <@foo> >> if you prefer to include the sigil in the key.
+This assertion is not automatically captured.
 
 =item *
 
@@ -1119,9 +1107,7 @@
 to a regex at match time.  (Numeric values may still indicate "false match".
 and a closure may do whatever it likes.)
 
-By default C<< <%foo> >> is captured into C<< $<foo> >>, but you can
-use the C<< <+%foo> >> form to suppress capture, and you can always say
-C<< $<%foo> := <%foo> >> if you prefer to include the sigil in the key.
+This assertion is not automatically captured.
 
 As with bare hash, the longest key matches according to the venerable
 I<longest-token rule>.
@@ -1131,7 +1117,7 @@
 A leading C<{> indicates code that produces a regex to be interpolated
 into the pattern at that point as a subrule:
 
-     / (<+ident>)  <{ %cache{$0} //= get_body_for($0) }> /
+     / (<.ident>)  <{ %cache{$0} //= get_body_for($0) }> /
 
 The closure is guaranteed to be run at the canonical time; it declares
 a sequence point, and is considered to be procedural.
@@ -1169,7 +1155,7 @@
 time you use it unless the string changes.  (Any external lexical
 variable names must be rebound each time though.)  Subrules may not be
 interpolated with unbalanced bracketing.  An interpolated subrule
-keeps its own inner C<$/>, so its parentheses never count toward the
+keeps its own inner match result as a single item, so its parentheses never count toward the
 outer regexes groupings.  (In other words, parenthesis numbering is always
 lexically scoped.)
 
@@ -1201,7 +1187,7 @@
 
      / <[a..z_]>* /
 
-Whitespace is ignored within square brackets and after the initial C<+>.
+Whitespace is ignored within square brackets:
 
      / <[ a..z _ ]>* /
 
@@ -1210,6 +1196,7 @@
 A leading C<-> indicates a complemented character class:
 
      / <-[a..z_]> <-alpha> /
+     / <- [a..z_]> <- alpha> /  # whitespace allowed after -
 
 This is essentially the same as using negative lookahead and dot:
 
@@ -1220,11 +1207,11 @@
 =item *
 
 A leading C<+> may also be supplied to indicate that the following
-character class is to matched in a positive sense
+character class is to be matched in a positive sense.
 
      / <+[a..z_]>* /
      / <+[ a..z _ ]>* /
-     / <+[ a .. z _ ] >* /
+     / <+ [ a .. z _ ] >* /      # whitespace allowed after +
 
 =item *
 
@@ -1233,18 +1220,12 @@
 
      / <[a..z] - [aeiou] + xdigit> /      # consonant or hex digit
 
-If such a combination starts with a named character class, a leading
-C<+> is allowed but not required, provided the next character is a
-character set operation:
-
-     / <+alpha-[Jj]> /              # J-less alpha
-     / <alpha-[Jj]> /               # same thing
-     / <+alpha - [ Jj ]> /          # still the same thing
+A named character class may be used by itself:
 
-However, whitespace is not allowed after the first identifier if it
-immediately follows the left angle.
+    <alpha>
 
-     / <alpha - [Jj]> /             # WRONG, means <alpha(/- [Jj]/)>
+However, in order to combine classes you must prefix a named
+character class with C<+> or C<->.
 
 =item *
 
@@ -1278,8 +1259,8 @@
 were not there.  In addition to forcing zero-width, it also suppresses
 any named capture:
 
-    <alpha>     # match a letter and capture in $<alpha>
-    <+alpha>    # match a letter, don't capture
+    <alpha>     # match a letter and capture to $alpha (eventually $<alpha>)
+    <.alpha>    # match a letter, don't capture
     <?alpha>    # match null before a letter, don't capture
 
 =item *
@@ -1291,7 +1272,7 @@
 
     <~~>       # call myself recursively
     <~~0>      # match according to $0's pattern
-    <~~foo>    # match according to $<foo>'s rule
+    <~~foo>    # match according to $foo's pattern
 
 Note that this rematches the pattern associated with the name, not
 the string matched.  So
@@ -1346,7 +1327,7 @@
 match "C<foo>" backwards.  The use of C<< <(...)> >> affects only the
 meaning of the I<result object> and the positions of the beginning and
 ending of the match.  That is, after the match above, C<$()> contains
-only the digits matched, and C<.pos> is pointing to after the digits.
+only the digits matched, and C<$/.to> is pointing to after the digits.
 Other captures (named or numbered) are unaffected and may be accessed
 through C<$/>.
 
@@ -1356,7 +1337,7 @@
 
 A C<«> or C<<< << >>> token indicates a left word boundary.  A C<»> or
 C<<< >> >>> token indicates a right word boundary.  (As separate tokens,
-these need not be balanced.)  Perl 5's C<\b> is replaced by a C<< <+wb> >>
+these need not be balanced.)  Perl 5's C<\b> is replaced by a C<< <.wb> >>
 "word boundary" assertion, while C<\B> becomes C<< <!wb> >>.  (None of
 these are dependent on the definition of C<< <ws> >>, but only on the C<\w>
 definition of "word" characters.)
@@ -1768,31 +1749,36 @@
 
 =item *
 
-The null pattern is now illegal.
+The empty pattern is now illegal.
 
 =item *
 
 To match whatever the prior successful regex matched, use:
 
-     /<prior>/
+     / <prior> /
 
 =item *
 
-To match the zero-width string, use:
+To match the zero-width string, you must use some explicit
+representation of the null match:
 
-     /<null>/
+    / '' /;
+    / <?> /;
 
 For example:
 
-     split /<+null>/, $string
+     split /''/, $string
+
+splits between characters.  But then, so does this:
 
-splits between characters.
+     split '', $string
 
 =item *
 
-To match a null alternative, use:
+Likewise, to match an empty alternative, use something like:
 
-     /a|b|c|<+null>/
+     /a|b|c|<?>/
+     /a|b|c|''/
 
 This makes it easier to catch errors like this:
 
@@ -1828,7 +1814,8 @@
      $something = "";
      /a|b|c|$something/;
 
-In particular, <?> also matches the null string, and <!> always fails.
+In particular, C<< <?> >> always matches the null string successfully,
+and C<< <!> >> always fails to match anything.
 
 =back
 
@@ -1887,7 +1874,7 @@
 
 =item *
 
-Any atom that is quantified with a minimally match (using the C<?> modifier).
+Any atom that is quantified with a minimal match (using the C<?> modifier).
 
 =item *
 
@@ -1915,9 +1902,15 @@
 are simulated in any of various ways, such as by Thompson NFA, it may
 be possible to know when to fire off the assertions without backchecks.)
 
-Greedy quantifiers and characters classes do not terminate a token pattern.
+Greedy quantifiers and character classes do not terminate a token pattern.
 Zero-width assertions such as word boundaries are also okay.
 
+For a pattern that starts with a positive lookahead assertion,
+the assertion is assumed to be more specific than the subsequent
+pattern, so the lookahead's pattern is treated as the longest token;
+the longest-token matcher will be smart enough to rematch any text
+traversed by the lookahead when (and if) it continues the match.
+
 Oddly enough, the C<token> keyword specifically does not determine
 the scope of a token, except insofar as a token pattern usually
 doesn't do much matching of whitespace.  In contrast, the C<rule>
@@ -1959,9 +1952,11 @@
 
 A match always returns a Match object, which is also available
 as C<$/>, which is a contextual lexical declared in the outer
-subroutine that is calling the regex.  (A closure lexically embedded
-in a regex does not redeclare C<$/>, so C<$/> always refers to the
-current match, not any prior submatch done within the closure).
+subroutine that is calling the regex.  (A regex declares its own
+lexical C<$/> variable, which always refers to the most recent
+submatch within the rule, if any.)  The current match state is
+kept in the regex's C<$_> variable which will eventually get
+processed into the user's C<$/> variable when the match completes.
 
 =item *
 
@@ -1991,9 +1986,9 @@
 In string context it evaluates to the stringified value of its
 I<result object>, which is usually the entire matched string:
 
-     print %hash{ "{$text ~~ /<+ident>/}" };
+     print %hash{ "{$text ~~ /<.ident>/}" };
      # or equivalently:
-     $text ~~ /<+ident>/  &&  print %hash{~$/};
+     $text ~~ /<.ident>/  &&  print %hash{~$/};
 
 But generally you should say C<~$/> if you mean C<~$/>.
 
@@ -2010,11 +2005,11 @@
 
 When used as a scalar, a C<Match> object evaluates to its underlying
 result object.  Usually this is just the entire match string, but
-you can override that by calling C<return> inside a regex:
+you can override that by calling C<reduce> inside a regex:
 
     my $moose = $(m:{
         <antler> <body>
-        { return Moose.new( body => $<body>().attach($<antler>) ) }
+        { reduce Moose.new( body => $body().attach($antler) ) }
         # match succeeds -- ignore the rest of the regex
     });
 
@@ -2037,8 +2032,8 @@
 
 This means that these two work the same:
 
-    / <moose> { return $$<moose> as Moose } /
-    / <moose> { return $<moose>  as Moose } /
+    / <moose> { reduce $moose as Moose } /
+    / <moose> { reduce $$moose as Moose } /
 
 =item *
 
@@ -2120,28 +2115,27 @@
 =item *
 
 This returned object is also automatically assigned to the lexical
-C<$/> variable, unless the match statement is inside another regex. That is:
+C<$/> variable of the current surroundings. That is:
 
      $str ~~ /pattern/;
      say "Matched" if $/;
 
 =item *
 
-Inside a regex, the C<$/> variable holds the current regex's
-incomplete C<Match> object (which can be modified via the internal C<$/>).
-For example:
-
-    $str ~~ / foo                 # Match 'foo'
-               { $/ = 'bar' }     # But pretend we matched 'bar'
-             /;
-    say $/;                       # says 'bar'
-
-This is slightly dangerous, insofar as you might return something that
-does not behave like a C<Match> object to some context that requires
-one.  Fortunately, you normally just want to return a result object instead:
+Inside a regex, the C<$_> variable holds the current regex's incomplete
+C<Match> object, known as a match state.  Generally this should not
+be modified unless you know how to create and propagate match states.
+All regexes actually return match states even when you think they're
+returning something else, because the match states keep track of
+the success and failures of the pattern for you.
+
+Fortunately, when you just want to return a different result object instead
+of the default C<Match> object, you may associate your return value with
+the current match state using the C<reduce> function, which works something
+like a C<return>, but doesn't clobber the match state:
 
     $str ~~ / foo                 # Match 'foo'
-               { return 'bar' }   # But pretend we matched 'bar'
+               { reduce 'bar' }   # But pretend we matched 'bar'
              /;
     say $();                      # says 'bar'
 
@@ -2459,10 +2453,10 @@
 
 For example, this regex contains three subrules:
 
-      # subrule       subrule      subrule
-      #  __^__    _______^______    __^__
-      # |     |  |              |  |     |
-     m/ <ident>  $<spaces>:=(\s*)  <digit>+ /
+      # subrule       subrule     subrule
+      #  __^__    _______^_____    __^__
+      # |     |  |             |  |     |
+     m/ <ident>  $spaces = (\s*)  <digit>+ /
 
 =item *
 
@@ -2503,8 +2497,8 @@
 =item *
 
 Note that it makes no difference whether a subrule is angle-bracketed
-(C<< <ident> >>) or aliased (C<< $<ident> := (<alpha>\w*) >>). The name's
-the thing.
+(C<< <ident> >>) or aliased internally (C<< <ident=name> >>) or aliased
+externally (C<< $ident = (<alpha>\w*) >>). The name's the thing.
 
 
 =back
@@ -2552,7 +2546,7 @@
 then only the I<final> name counts when deciding whether it is or isn't
 repeated. For example:
 
-     if mm/ mv <file> $<dir>:=<file> / {
+     if mm/ mv <file> <dir=file> / {
          $from = $<file>;  # Only one subrule named <file>, so scalar
          $to   = $<dir>;   # The Capture Formerly Known As <file>
      }
@@ -2606,10 +2600,10 @@
 
 If a named scalar alias is applied to a set of I<capturing> parens:
 
-        #          ______/capturing parens\______
-        #         |                              |
-        #         |                              |
-      mm/ $<key>:=( (<[A..E]>) (\d**{3..6}) (X?) ) /;
+        #         ______/capturing parens\______
+        #        |                              |
+        #        |                              |
+      mm/ $key = ( (<[A..E]>) (\d**{3..6}) (X?) ) /;
 
 then the outer capturing parens no longer capture into the array of
 C<$/> as unaliased parens would. Instead the aliased parens capture
@@ -2664,10 +2658,10 @@
 
 If a named scalar alias is applied to a set of I<non-capturing> brackets:
 
-        #          ___/non-capturing brackets\___
-        #         |                              |
-        #         |                              |
-      mm/ $<key>:=[ (<[A..E]>) (\d**{3..6}) (X?) ] /;
+        #         ___/non-capturing brackets\___
+        #        |                              |
+        #        |                              |
+      mm/ $key = [ (<[A..E]>) (\d**{3..6}) (X?) ] /;
 
 then the corresponding C<< $/<key> >> Match object contains only the string
 matched by the non-capturing brackets.
@@ -2717,7 +2711,7 @@
 entry whose key is the name of the alias. And it I<no longer> assigns
 anything to the hash entry whose key is the subrule name. That is:
 
-     if m/ ID\: $<id>:=<ident> / {
+     if m/ ID\: <id=ident> / {
          say "Identified as $/<id>";    # $/<ident> is undefined
      }
 
@@ -2727,7 +2721,7 @@
 object. This is particularly useful for differentiating two or more calls to
 the same subrule in the same scope. For example:
 
-     if mm/ mv <file>+ $<dir>:=<file> / {
+     if mm/ mv <file>+ <dir=file> / {
          @from = @($<file>);
          $to   = $<dir>;
      }
@@ -2742,7 +2736,7 @@
 
 If a numbered alias is used instead of a named alias:
 
-     m/ $1:=(<-[:]>*) \:  $0:=<ident> /
+     m/ $1=(<-[:]>*) \:  $0=<ident> /
 
 the behavior is exactly the same as for a named alias (i.e. the various
 cases described above), except that the resulting C<Match> object is
@@ -2756,9 +2750,9 @@
 alias number (much like enum values increment from the last explicit
 value). That is:
 
-      #  ---$1---    -$2-    ---$6---    -$7-
-      # |        |  |    |  |        |  |    |
-     m/ $1:=(food)  (bard)  $6:=(bazd)  (quxd) /;
+      #  --$1---    -$2-    --$6---    -$7-
+      # |       |  |    |  |       |  |    |
+     m/ $1=(food)  (bard)  $6=(bazd)  (quxd) /;
 
 =item *
 
@@ -2766,8 +2760,8 @@
 Perl5 semantics for consecutive subpattern numbering in alternations:
 
      $tune_up = rx/ (don't) (ray) (me) (for) (solar tea), (d'oh!)
-                  | $6:=(every) (green) (BEM) (devours) (faces)
-                  #             $7      $8    $9        $10
+                  | $6 = (every) (green) (BEM) (devours) (faces)
+                  #              $7      $8    $9        $10
                   /;
 
 =item *
@@ -2794,12 +2788,12 @@
 
 
       # Perl 6 simulating Perl 5...
-      #                  $1
-      #  ________________/\________________
-      # |         $2          $3       $4  |
-      # |      ___/\___   ____/\____   /\  |
-      # |     |        | |          | |  | |
-     m/ $1:=[ (<[A..E]>) (\d**{3..6}) (X?) ] /;
+      #                 $1
+      #  _______________/\________________
+      # |        $2          $3       $4  |
+      # |     ___/\___   ____/\____   /\  |
+      # |    |        | |          | |  | |
+     m/ $1=[ (<[A..E]>) (\d**{3..6}) (X?) ] /;
 
 The non-capturing brackets don't introduce a scope, so the subpatterns within
 them are at regex scope, and hence numbered at the top level. Aliasing the
@@ -2832,7 +2826,7 @@
 In other words, aliasing and quantification are completely orthogonal.
 For example:
 
-     if mm/ mv $0:=<file>+ / {
+     if mm/ mv $0=<file>+ / {
          # <file>+ returns a list of Match objects,
          # so $0 contains an array of Match objects,
          # one for each successful call to <file>
@@ -2841,7 +2835,7 @@
      }
 
 
-     if m/ mv \s+ $<from>:=(\S+ \s+)* / {
+     if m/ mv \s+ $from=(\S+ \s+)* / {
          # Quantified subpattern returns a list of Match objects,
          # so $/<from> contains an array of Match
          # objects, one for each successful match of the subpattern
@@ -2857,7 +2851,7 @@
 brackets (as described in L<Named scalar aliases applied to non-capturing brackets>). For example:
 
-     "coffee fifo fumble" ~~ m/ $<effs>:=[f <-[f]>**{1..2} \s*]+ /;
+     "coffee fifo fumble" ~~ m/ $effs = [f <-[f]>**{1..2} \s*]+ /;
 
      say $<effs>;    # prints "fee fifo fum"
 
@@ -2873,11 +2867,11 @@
 An alias can also be specified using an array as the alias instead of a scalar.
 For example:
 
-     m/ mv \s+ @<from>:=[(\S+) \s+]* <dir> /;
+     m/ mv \s+ @from = [(\S+) \s+]* <dir> /;
 
 =item *
 
-Using the C<< @<alias>:= >> notation instead of a C<< $<alias>:= >>
+Using the C<< @alias= >> notation instead of a C<< $alias= >>
 mandates that the corresponding hash entry or array element I<always>
 receives an array of C<Match> objects, even if the
 construct being aliased would normally return a single C<Match> object.
@@ -2885,11 +2879,11 @@
 structurally different alternations (by enforcing array captures in all
 branches):
 
-     mm/ Mr?s? @<names>:=<ident> W\. @<names>:=<ident>
-        | Mr?s? @<names>:=<ident>
+     mm/ Mr?s? @names=<ident> W\. @names=<ident>
+        | Mr?s? @names=<ident>
         /;
 
-     # Aliasing to @<names> means $/<names> is always
+     # Aliasing to @names means $/<names> is always
      # an Array object, so...
 
      say @($/<names>);
@@ -2899,8 +2893,8 @@
 For convenience and consistency, C<< @<key> >> can also be used outside a
 regex, as a shorthand for C<< @( $/<key> ) >>. That is:
 
-     mm/ Mr?s? @<names>:=<ident> W\. @<names>:=<ident>
-        | Mr?s? @<names>:=<ident>
+     mm/ Mr?s? @names=<ident> W\. @names=<ident>
+        | Mr?s? @names=<ident>
         /;
 
      say @<names>;
@@ -2911,18 +2905,18 @@
 brackets, it captures the substrings matched by each repetition of the
 brackets into separate elements of the corresponding array. That is:
 
-     mm/ mv $<files>:=[ f.. \s* ]* /; # $/<files> assigned a single
-                                      # Match object containing the
-                                      # complete substring matched by
-                                      # the full set of repetitions
-                                      # of the non-capturing brackets
-
-     mm/ mv @<files>:=[ f.. \s* ]* /; # $/<files> assigned an array,
-                                      # each element of which is a
-                                      # Match object containing
-                                      # the substring matched by Nth
-                                      # repetition of the non-
-                                      # capturing bracket match
+     mm/ mv $files=[ f.. \s* ]* /; # $/<files> assigned a single
+                                   # Match object containing the
+                                   # complete substring matched by
+                                   # the full set of repetitions
+                                   # of the non-capturing brackets
+
+     mm/ mv @files=[ f.. \s* ]* /; # $/<files> assigned an array,
+                                   # each element of which is a
+                                   # Match object containing
+                                   # the substring matched by Nth
+                                   # repetition of the non-
+                                   # capturing bracket match
 
 =item *
 
@@ -2933,7 +2927,7 @@
 an array alias on a subpattern flattens and collects all nested
 subpattern captures within the aliased subpattern. For example:
 
-     if mm/ $<pairs>:=( (\w+) \: (\N+) )+ / {
+     if mm/ $pairs=( (\w+) \: (\N+) )+ / {
          # Scalar alias, so $/<pairs> is assigned an array
          # of Match objects, each of which has its own array
          # of two subcaptures...
@@ -2945,7 +2939,7 @@
      }
 
 
-     if mm/ @<pairs>:=( (\w+) \: (\N+) )+ / {
+     if mm/ @pairs=( (\w+) \: (\N+) )+ / {
          # Array alias, so $/<pairs> is assigned an array
          # of Match objects, each of which is flattened out of
          # the two subcaptures within the subpattern
@@ -2965,7 +2959,7 @@
 
      rule pair { (\w+) \: (\N+) \n }
 
-     if mm/ $<pairs>:=<pair>+ / {
+     if mm/ $pairs=<pair>+ / {
          # Scalar alias, so $/<pairs> contains an array of
          # Match objects, each of which is the result of the
          # <pair> subrule call...
@@ -2977,7 +2971,7 @@
      }
 
 
-     if mm/ mv @<pairs>:=<pair>+ / {
+     if mm/ mv @pairs=<pair>+ / {
          # Array alias, so $/<pairs> contains an array of
          # Match objects, all flattened down from the
          # nested arrays inside the Match objects returned
@@ -3004,13 +2998,13 @@
 appropriate element of the regex's match array rather than to a key of
 its match hash. For example:
 
-     if m/ mv  \s+  @0:=((\w+) \s+)+  $1:=((\W+) (\s*))* / {
-         #          |                 |
-         #          |                 |
-         #          |                  \_ Scalar alias, so $1 gets an
-         #          |                     array, with each element
-         #          |                     a Match object containing
-         #          |                     the two nested captures
+     if m/ mv  \s+  @0=((\w+) \s+)+  $1=((\W+) (\s*))* / {
+         #          |                |
+         #          |                |
+         #          |                 \_ Scalar alias, so $1 gets an
+         #          |                    array, with each element
+         #          |                    a Match object containing
+         #          |                    the two nested captures
          #          |
          #           \___ Array alias, so $0 gets a flattened array of
          #                just the (\w+) captures from each repetition
@@ -3040,7 +3034,7 @@
 An alias can also be specified using a hash as the alias variable,
 instead of a scalar or an array. For example:
 
-     m/ mv %<location>:=( (<ident>) \: (\N+) )+ /;
+     m/ mv %location=( (<ident>) \: (\N+) )+ /;
 
 =item *
 
@@ -3062,7 +3056,7 @@
 
      rule one_to_many {  (\w+) \: (\S+) (\S+) (\S+) }
 
-     if mm/ %0:=<one_to_many>+ / {
+     if mm/ %0=<one_to_many>+ / {
          # $/[0] contains a hash, in which each key is provided by
          # the first subcapture within C<one_to_many>, and each
          # value is an array containing the
@@ -3094,11 +3088,11 @@
 
 Instead of using internal aliases like:
 
-     m/ mv  @<files>:=<ident>+  $<dir>:=<ident> /
+     m/ mv  @files=<ident>+  $dir=<ident> /
 
 the name of an ordinary variable can be used as an I<external> alias, like so:
 
-     m/ mv  @files:=<ident>+  $dir:=<ident> /
+     m/ mv  @files=<ident>+  $dir=<ident> /
 
 =item *
 
@@ -3185,10 +3179,10 @@
 the angles is used as part of the key.  Suppose the earlier example
 parsed whitespace:
 
-     / <key> <+ws> '=>' <+ws> <value> { %hash{$<key>} = $<value> } /
+     / <key> <.ws> '=>' <.ws> <value> { %hash{$key} = $value } /
 
-The two instances of C<< <+ws> >> above would store an array of two
-values accessible as C<< @<+ws> >>.  It would also store the literal
+The two instances of C<< <.ws> >> above would store an array of two
+values accessible as C<< @<.ws> >>.  It would also store the literal
 match into C<< $<'=\>'> >>.  Just to make sure nothing is forgotten,
 under C<:keepall> any text or whitespace not otherwise remembered is
 attached as an extra property on the subsequent node. (The name of
@@ -3251,20 +3245,20 @@
      grammar Letter {
          rule text     { <greet> <body> <close> }
 
-         rule greet { [Hi|Hey|Yo] $<to>:=(\S+?) , $$}
+         rule greet { [Hi|Hey|Yo] $to=(\S+?) , $$}
 
          rule body     { <line>+? }   # note: backtracks forwards via +?
 
-         rule close { Later dude, $<from>:=(.+) }
+         rule close { Later dude, $from=(.+) }
 
          # etc.
      }
 
      grammar FormalLetter is Letter {
 
-         rule greet { Dear $<to>:=(\S+?) , $$}
+         rule greet { Dear $to=(\S+?) , $$}
 
-         rule close { Yours sincerely, $<from>:=(.+) }
+         rule close { Yours sincerely, $from=(.+) }
 
      }
