Author: larry
Date: Thu May 11 09:55:36 2006
New Revision: 9197

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Changed :words/:w to :sigspace/:s and invented ss/// and ms// (or maybe mm//).


Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod        (original)
+++ doc/trunk/design/syn/S05.pod        Thu May 11 09:55:36 2006
@@ -14,9 +14,9 @@
    Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
                Larry Wall <[EMAIL PROTECTED]>
    Date: 24 Jun 2002
-   Last Modified: 24 Apr 2006
+   Last Modified: 11 May 2006
    Number: 5
-   Version: 23
+   Version: 24
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> because they haven't been
@@ -151,10 +151,13 @@
 
 =item *
 
-The new C<:w> (C<:words>) modifier causes whitespace sequences to be
-replaced by C<\s*> or C<\s+> subpattern as defined by the C<< <?ws> >> rule.
+The new C<:s> (C<:sigspace>) modifier causes whitespace sequences
+to be considered "significant".  That is, they are replaced by a
+whitespace matching rule, C<< <?ws> >>.
 
-     m:w/ next cmd =   <condition>/
+Anyway,
+
+     m:s/ next cmd =   <condition>/
 
 Same as:
 
@@ -166,17 +169,43 @@
 
 But in the case of
 
-     m:w { (a|\*) (b|\+) }
+     m:s {(a|\*) (b|\+)}
 
 or equivalently,
 
      m { (a|\*) <?ws> (b|\+) }
 
-C<< <?ws> >> can't decide what to do until it sees the data.  It still does
-the right thing.  If not, define your own C<< <?ws> >> and C<:w> will use that.
+C<< <?ws> >> can't decide what to do until it sees the data.
+It still does the right thing.  If not, define your own C<< <?ws> >>
+and C<:sigspace> will use that.
 
-In general you don't need to use C<:w> within grammars because
+In general you don't need to use C<:sigspace> within grammars because
 the parser rules automatically handle whitespace policy for you.
+In this context, whitespace often includes comments, depending on
+how the grammar chooses to define its whitespace rule.  Although the
+default C<< <?ws> >> subrule recognizes no comment construct, any
+grammar is free to override the rule.  The C<< <?ws> >> rule is not
+intended to mean the same thing everywhere.
+
+It's also possible to pass an argument to C<:sigspace> specifying
+a completely different subrule to apply.  This can be any rule; it
+doesn't have to match whitespace.  When discussing this modifier, it is
+important to distinguish the significant whitespace in the pattern from
+the "whitespace" being matched, so we'll call the pattern's whitespace
+I<sigspace>, and generally reserve I<whitespace> to indicate whatever
+C<< <?ws> >> matches in the current grammar. The correspondence
+between sigspace and whitespace is primarily metaphorical, which is
+why the correspondence is both useful and (potentially) confusing.
+
+The C<:s> modifier is considered sufficiently important that
+match variants are defined for it:
+
+    ms/match some words/                       # same as m:sigspace
+    ss/match some words/replace those words/   # same as s:sigspace
+
+Conjecture: This might become sufficiently idiomatic that C<ms//> would
+be better as a "stuttered" C<mm//> instead, much as C<qq//> became idiomatic.
+It would also match C<ss///> that way.
 
 =item *
 
@@ -311,10 +340,10 @@
 
 =item *
 
-The C<:i>, C<:w>, C<:Perl5>, and Unicode-level modifiers can be
+The C<:i>, C<:s>, C<:Perl5>, and Unicode-level modifiers can be
 placed inside the regex (and are lexically scoped):
 
-     m/:w alignment = [:i left|right|cent[er|re]] /
+     m/:s alignment = [:i left|right|cent[er|re]] /
 
 =item *
 
@@ -389,7 +418,7 @@
 =item *
 
 Whitespace is now always metasyntactic, i.e. used only for layout
-and not matched literally (but see the C<:w> modifier described above).
+and not matched literally (but see the C<:sigspace> modifier described above).
 
 =back
 
@@ -604,8 +633,8 @@
      / <before pattern> /    # was /(?=pattern)/
      / <after pattern> /     # was /(?<pattern)/
 
-     / <ws> /                # match whitespace by :w policy
-     / <sp> /                # match a space char
+     / <ws> /                # match whitespace by :s policy
+     / <sp> /                # match the SPACE character (U+0020)
 
      / <at($pos)> /          # match only at a particular StrPos
                             # short for <?{ .pos == $pos }>
@@ -966,8 +995,8 @@
 
 If either form needs modifiers, they go before the opening delimiter:
 
-     $regex = regex :g:w:i { my name is (.*) };
-         $regex = rx:g:w:i / my name is (.*) /;    # same thing
+     $regex = regex :g:s:i { my name is (.*) };
+         $regex = rx:g:s:i / my name is (.*) /;    # same thing
 
 Space is necessary after the final modifier if you use any
 bracketing character for the delimiter.  (Otherwise it would be taken as
@@ -978,7 +1007,7 @@
 You may not use colons for the delimiter.  Space is allowed between
 modifiers:
 
-     $regex = rx :g :w :i / my name is (.*) /;
+     $regex = rx :g :s :i / my name is (.*) /;
 
 =item *
 
@@ -1072,10 +1101,10 @@
 
 The other is the C<rule> declarator, for declaring non-terminal
 productions in a grammar.  Like a C<token>, it also does not backtrack
-by default.  In addition, a C<rule> regex also assumes C<:words>.
+by default.  In addition, a C<rule> regex also assumes C<:sigspace>.
 A C<rule> is really short for:
 
-    regex :ratchet :words { ... }
+    regex :ratchet :sigspace { ... }
 
 =item *
 
@@ -1125,7 +1154,7 @@
 Backtracking over a single colon causes the regex engine not to retry
 the preceding atom:
 
-     m:w/ \( <expr> [ , <expr> ]*: \) /
+     ms/ \( <expr> [ , <expr> ]*: \) /
 
 (i.e. there's no point trying fewer C<< <expr> >> matches, if there's
 no closing parenthesis on the horizon)
@@ -1138,7 +1167,7 @@
 Backtracking over a double colon causes the surrounding group of
 alternations to immediately fail:
 
-     m:w/ [ if :: <expr> <block>
+     ms/ [ if :: <expr> <block>
           | for :: <list> <block>
           | loop :: <loop_controls>? <block>
           ]
@@ -1161,7 +1190,7 @@
          | " [<alpha>|_] \w* "
      }
 
-     m:w/ get <ident>? /
+     ms/ get <ident>? /
 
 (i.e. using an unquoted reserved word as an identifier is not permitted)
 
@@ -1173,7 +1202,7 @@
      regex subname {
          ([<alpha>|_] \w*) <commit> { fail if %reserved{$0} }
      }
-     m:w/ sub <subname>? <block> /
+     ms/ sub <subname>? <block> /
 
 (i.e. using a reserved word as a subroutine name is instantly fatal
 to the I<surrounding> match as well)
@@ -1271,7 +1300,7 @@
 
 As a special case, however, the first null alternative in a match like
 
-     m:w/ [
+     ms/ [
           | if :: <expr> <block>
           | for :: <list> <block>
           | loop :: <loop_controls>? <block>
@@ -1281,7 +1310,7 @@
 is simply ignored.  Only the first alternative is special that way.
 If you write:
 
-     m:w/ [
+     ms/ [
               if :: <expr> <block>              |
               for :: <list> <block>             |
               loop :: <loop_controls>? <block>  |
@@ -1397,24 +1426,24 @@
 When used as an array, a C<Match> object pretends to be an array of all
 its positional captures.  Hence
 
-     ($key, $val) = m:w/ (\S+) => (\S+)/;
+     ($key, $val) = ms/ (\S+) => (\S+)/;
 
 can also be written:
 
-     $result = m:w/ (\S+) => (\S+)/;
+     $result = ms/ (\S+) => (\S+)/;
      ($key, $val) = @$result;
 
 To get a single capture into a string, use a subscript:
 
-     $mystring = "{ m:w/ (\S+) => (\S+)/[0] }";
+     $mystring = "{ ms/ (\S+) => (\S+)/[0] }";
 
 To get all the captures into a string, use a I<zen> slice:
 
-     $mystring = "{ m:w/ (\S+) => (\S+)/[] }";
+     $mystring = "{ ms/ (\S+) => (\S+)/[] }";
 
 Or cast it into an array:
 
-     $mystring = "@( m:w/ (\S+) => (\S+)/ )";
+     $mystring = "@( ms/ (\S+) => (\S+)/ )";
 
 Note that, as a scalar variable, C<$/> doesn't automatically flatten
 in list context.  Use C<@()> as a shorthand for C<@($/)> to flatten
@@ -1518,7 +1547,7 @@
         # |       subpattern  subpattern          |
         # |          __/\__    __/\__             |
         # |         |      |  |      |            |
-     m:w/ (I am the (walrus), ( khoo )**{2} kachoo) /;
+     ms/ (I am the (walrus), ( khoo )**{2} kachoo) /;
 
 
 =item *
@@ -1549,7 +1578,7 @@
         # |         subpat-B  subpat-C            |
         # |          __/\__    __/\__             |
         # |         |      |  |      |            |
-     m:w/ (I am the (walrus), ( khoo )**{2} kachoo) /;
+     ms/ (I am the (walrus), ( khoo )**{2} kachoo) /;
 
 then the C<Match> objects representing the matches made by I<subpat-B>
 and I<subpat-C> would be successively pushed onto the array inside I<subpat-
@@ -1835,7 +1864,7 @@
       # : $/<ident>   :        $/[0]<ident>         : :
       # :   __^__     :           __^__             : :
       # :  |     |    :          |     |            : :
-     m:w/  <ident> \: ( known as <ident> previously ) /
+     ms/  <ident> \: ( known as <ident> previously ) /
 
 
 =back
@@ -1854,7 +1883,7 @@
       #    $<ident>             $0<ident>
       #     __^__                 __^__
       #    |     |               |     |
-     m:w/  <ident> \: ( known as <ident> previously ) /
+     ms/  <ident> \: ( known as <ident> previously ) /
 
 =item *
 
@@ -1883,21 +1912,21 @@
 from a single quantified repetition) append their individual C<Match>
 objects to this array. For example:
 
-     if m:w/ mv <file> <file> / {
+     if ms/ mv <file> <file> / {
          $from = $<file>[0];
          $to   = $<file>[1];
      }
 
 Likewise, with a quantified subrule:
 
-     if m:w/ mv <file>**{2} / {
+     if ms/ mv <file>**{2} / {
          $from = $<file>[0];
          $to   = $<file>[1];
      }
 
 Likewise, with a mixture of both:
 
-     if m:w/ mv <file>+ <file> / {
+     if ms/ mv <file>+ <file> / {
          $to   = pop @{$<file>};
          @from = @{$<file>};
      }
@@ -1908,7 +1937,7 @@
 then only the I<final> name counts when deciding whether it is or isn't
 repeated. For example:
 
-     if m:w/ mv <file> $<dir>:=<file> / {
+     if ms/ mv <file> $<dir>:=<file> / {
          $from = $<file>;  # Only one subrule named <file>, so scalar
          $to   = $<dir>;   # The Capture Formerly Known As <file>
      }
@@ -1918,7 +1947,7 @@
 produce an array of C<Match> objects, since none of them has two or more
 C<< <file> >> subrules in the same lexical scope:
 
-     if m:w/ (keep) <file> | (toss) <file> / {
+     if ms/ (keep) <file> | (toss) <file> / {
          # Each <file> is in a separate alternation, therefore <file>
          # is not repeated in any one scope, hence $<file> is
          # not an Array object...
@@ -1926,7 +1955,7 @@
          $target = $<file>;
      }
 
-     if m:w/ <file> \: (<file>|none) / {
+     if ms/ <file> \: (<file>|none) / {
          # Second <file> nested in subpattern which confers a
          # different scope...
          $actual  = $/<file>;
@@ -1938,7 +1967,7 @@
 On the other hand, unaliased square brackets don't confer a separate
 scope (because they don't have an associated C<Match> object). So:
 
-     if m:w/ <file> \: [<file>|none] / { # Two <file>s in same scope
+     if ms/ <file> \: [<file>|none] / { # Two <file>s in same scope
          $actual  = $/<file>[0];
          $virtual = $/<file>[1] if $/<file>[1];
      }
@@ -1965,7 +1994,7 @@
         #          ______/capturing parens\_____
         #         |                             |
         #         |                             |
-     m:w/ $<key>:=( (<[A..E]>) (\d**{3..6}) (X?) ) /;
+     ms/ $<key>:=( (<[A..E]>) (\d**{3..6}) (X?) ) /;
 
 then the outer capturing parens no longer capture into the array of
 C<$/> (like unaliased parens would). Instead the aliased parens capture
@@ -2023,7 +2052,7 @@
         #          ___/non-capturing brackets\__
         #         |                             |
         #         |                             |
-     m:w/ $<key>:=[ (<[A..E]>) (\d**{3..6}) (X?) ] /;
+     ms/ $<key>:=[ (<[A..E]>) (\d**{3..6}) (X?) ] /;
 
 then the corresponding C<< $/<key> >> object contains only the string
 matched by the non-capturing brackets.
@@ -2083,7 +2112,7 @@
 object. This is particularly useful for differentiating two or more calls to
 the same subrule in the same scope. For example:
 
-     if m:w/ mv <file>+ $<dir>:=<file> / {
+     if ms/ mv <file>+ $<dir>:=<file> / {
          @from = @{$<file>};
          $to   = $<dir>;
      }
@@ -2241,7 +2270,7 @@
 structurally different alternations (by enforcing array captures in all
 branches):
 
-     m:w/ Mr?s? @<names>:=<ident> W\. @<names>:=<ident>
+     ms/ Mr?s? @<names>:=<ident> W\. @<names>:=<ident>
         | Mr?s? @<names>:=<ident>
         /;
 
@@ -2255,7 +2284,7 @@
 For convenience and consistency, C<< @<key> >> can also be used outside a
 regex, as a shorthand for C<< @{ $/<key> } >>. That is:
 
-     m:w/ Mr?s? @<names>:=<ident> W\. @<names>:=<ident>
+     ms/ Mr?s? @<names>:=<ident> W\. @<names>:=<ident>
         | Mr?s? @<names>:=<ident>
         /;
 
@@ -2289,7 +2318,7 @@
 an array alias on a subpattern flattens and collects all nested
 subpattern captures within the aliased subpattern. For example:
 
-     if m:w/ $<pairs>:=( (\w+) \: (\N+) )+ / {
+     if ms/ $<pairs>:=( (\w+) \: (\N+) )+ / {
          # Scalar alias, so $/<pairs> is assigned an array
          # of Match objects, each of which has its own array
          # of two subcaptures...
@@ -2301,7 +2330,7 @@
      }
 
 
-     if m:w/ @<pairs>:=( (\w+) \: (\N+) )+ / {
+     if ms/ @<pairs>:=( (\w+) \: (\N+) )+ / {
          # Array alias, so $/<pairs> is assigned an array
          # of Match objects, each of which is flattened out of
          # the two subcaptures within the subpattern
@@ -2321,7 +2350,7 @@
 
      rule pair { (\w+) \: (\N+) \n }
 
-     if m:w/ $<pairs>:=<pair>+ / {
+     if ms/ $<pairs>:=<pair>+ / {
          # Scalar alias, so $/<pairs> contains an array of
          # Match objects, each of which is the result of the
          # <pair> subrule call...
@@ -2333,7 +2362,7 @@
      }
 
 
-     if m:w/ mv @<pairs>:=<pair>+ / {
+     if ms/ mv @<pairs>:=<pair>+ / {
          # Array alias, so $/<pairs> contains an array of
          # Match objects, all flattened down from the
          # nested arrays inside the Match objects returned
@@ -2418,7 +2447,7 @@
 
      rule one_to_many {  (\w+) \: (\S+) (\S+) (\S+) }
 
-     if m:w/ %0:=<one_to_many>+ / {
+     if ms/ %0:=<one_to_many>+ / {
          # $/[0] contains a hash, in which each key is provided by
          # the first subcapture within C<one_to_many>, and each
          # value is an  array containing the
@@ -2511,14 +2540,14 @@
 
 For example:
 
-     if $text ~~ m:w:g/ (\S+:) <rocks> / {
+     if $text ~~ ms:g/ (\S+:) <rocks> / {
          say 'Full match context is: [$/]';
      }
 
 But the list of individual match objects corresponding to each separate
 match is also available:
 
-     if $text ~~ m:w:g/ (\S+:) <rocks> / {
+     if $text ~~ ms:g/ (\S+:) <rocks> / {
          say "Matched { +@@() } times";    # Note: forced eager here
 
          for @@() -> $m {
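
Not part of the commit: a rough illustration of the :sigspace idea described
in the patch above, written in Python because the Perl 6 syntax under
discussion was still in flux. It rewrites each run of whitespace in a pattern
into a stand-in for the whitespace rule. The names WS and sigspace are
hypothetical, and the stand-in rule is an assumption: it requires at least one
whitespace character only when the text on both sides is a word character,
which is one reading of "\s* or \s+" in the removed wording. The rewrite is
naive and does not protect whitespace inside character classes or escapes.

```python
import re

# Assumed stand-in for the whitespace rule: real whitespace, or a
# zero-width match wherever the two neighbours are not both word characters.
WS = r"(?:\s+|(?<!\w)|(?!\w))"


def sigspace(pattern: str) -> str:
    """Replace every run of whitespace in `pattern` with the WS rule."""
    # A function is used as the replacement so that the backslashes in WS
    # are inserted literally instead of being parsed as escapes.
    return re.sub(r"\s+", lambda _m: WS, pattern)


# The patch's example: the rule cannot decide between "optional" and
# "mandatory" whitespace until it sees the data.
rx = re.compile(sigspace(r"(a|\*) (b|\+)"))

for text in ["a b", "*b", "a+", "ab"]:
    print(repr(text), bool(rx.fullmatch(text)))
```

Only "ab" is rejected here, because it is the one input where two word
characters meet with no whitespace between them. Swapping WS for a different
expression mirrors the patch's point that a grammar may define its own
whitespace rule.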
