Author: larry
Date: Wed Jan 17 14:05:20 2007
New Revision: 13528

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Clarify how C<||> limits longest-token semantics.


Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod        (original)
+++ doc/trunk/design/syn/S05.pod        Wed Jan 17 14:05:20 2007
@@ -14,9 +14,9 @@
    Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
                Larry Wall <[EMAIL PROTECTED]>
    Date: 24 Jun 2002
-   Last Modified: 16 Jan 2007
+   Last Modified: 17 Jan 2007
    Number: 5
-   Version: 43
+   Version: 44
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -73,14 +73,18 @@
 (You may now use C<||> to indicate the old temporal alternation.  That is,
 C<|> and C<||> now work within regex syntax much the same as they
 do outside of regex syntax, where they represent junctional and
-short-circuit OR.)  Every regex in Perl 6 is required to be able to
+short-circuit OR.  This includes the fact that C<|> has tighter
+precedence than C<||>.)  Every regex in Perl 6 is required to be able to
 return its list of initial constant strings (transitively including the
 initial constant strings of any initial subrule called by that regex).
 A logical alternation using C<|> then takes two or more of these lists
 and dispatches to the alternative that advertises the longest matching
 prefix, not necessarily to the alternative that comes first lexically.
 (However, in the case of a tie between alternatives, the earlier
-alternative does take precedence.)
+alternative does take precedence.)  Backtracking into a constant prefix
+(or into a :: that would backtrack over a constant prefix) causes
+the next longest match to be attempted, even if that is specified
+in a different subrule.
 
 Initial constants must take into account case sensitivity (or any other
 canonicalization primitives) and do the right thing even when propagated
@@ -90,6 +94,18 @@
 say, a trie, the trie must continue to have the appropriate semantics
 for the originating rule.
 
+The C<||> form has the old short-circuit semantics, and will not
+attempt to match its right side unless all possibilities (including
+all C<|> possibilities) are exhausted on its left.  The first C<||>
+in a regex makes constant strings on its left available to the
+outer longest-token matcher, but hides any subsequent tests from
+longest-token matching.  Every C<||> establishes a new longest-token
+table.  That is, if you use C<|> on the right side of C<||>, that
+right side establishes a new top level for longest-token processing
+for this subexpression and any called subrules.  The right side's
+longest-token list is invisible to the left of the C<||> or outside
+the regex containing the C<||>.
+
 =head1 Modifiers
 
 =over
@@ -506,11 +522,13 @@
 C<&> and C<&&> forms.  The C<&> form allows the compiler and/or the
 run-time system to decide which parts to evaluate first, and it is
 erroneous to assume either order happens consistently.  The C<&&>
-form short-circuits, and backtracking makes the right argument vary
-faster than the left.
+form guarantees left-to-right order, and backtracking makes the right
+argument vary faster than the left.
 
-The C<&> and C<&&> operators are list associative like C<|> and C<||>,
-but have tighter precedence.
+The C<&> operator is list associative like C<|>, but has slightly
+tighter precedence.  Likewise C<&&> has slightly tighter precedence
+than C<||>.  As with the normal junctional and short-circuit operators,
+C<&> and C<|> are both tighter than C<&&> and C<||>.
 
 =back
 

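Note on the hunks above: the following is a minimal Python sketch of the dispatch order they describe, not how any Perl 6 implementation works. It assumes each alternative is reduced to a hypothetical (label, constant prefix) pair, and it models backtracking with a hypothetical `accept` callback that can reject a candidate. All function names are invented for illustration.

```python
def longest_token_order(alternatives, text, pos=0):
    """Order the alternatives of one `|` group, longest prefix first.

    `alternatives` is a list of (label, constant_prefix) pairs.
    sorted() is stable, so alternatives whose prefixes have equal
    length keep their written order, which models the tie-break rule.
    """
    matching = [(label, prefix) for label, prefix in alternatives
                if text.startswith(prefix, pos)]
    return sorted(matching, key=lambda pair: len(pair[1]), reverse=True)


def candidates(groups, text, pos=0):
    """Yield labels to try for `group1 || group2 || ...`.

    Each group is a list of alternatives joined by `|` and gets its
    own longest-token ordering.  A later group is reached only after
    every candidate of the earlier groups has been yielded, that is,
    rejected by the caller.
    """
    for group in groups:
        for label, _prefix in longest_token_order(group, text, pos):
            yield label


def first_match(groups, text, accept=lambda label: True):
    """Return the first candidate label that `accept` does not reject."""
    for label in candidates(groups, text):
        if accept(label):
            return label
    return None
```

Under these assumptions, `first_match([[("short", "for"), ("long", "foreach")]], "foreach $x")` picks `"long"` even though it is written second, while splitting the same pair across `||`, as in `[[("short", "for")], [("long", "foreach")]]`, picks `"short"`. Rejecting `"long"` through `accept` falls back to the next longest candidate, `"short"`.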