Author: larry
Date: Wed Jan 17 14:05:20 2007
New Revision: 13528

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Clarify how C<||> limits longest-token semantics.

Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Wed Jan 17 14:05:20 2007
@@ -14,9 +14,9 @@
 Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
 Larry Wall <[EMAIL PROTECTED]>
 Date: 24 Jun 2002
-Last Modified: 16 Jan 2007
+Last Modified: 17 Jan 2007
 Number: 5
-Version: 43
+Version: 44
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax. We now try to call them I<regex> rather than "regular
@@ -73,14 +73,18 @@
 (You may now use C<||> to indicate the old temporal alternation. That
 is, C<|> and C<||> now work within regex syntax much the same as they
 do outside of regex syntax, where they represent junctional and
-short-circuit OR.) Every regex in Perl 6 is required to be able to
+short-circuit OR. This includes the fact that C<|> has tighter
+precedence than C<||>.) Every regex in Perl 6 is required to be able to
 return its list of initial constant strings (transitively including
 the initial constant strings of any initial subrule called by that
 regex). A logical alternation using C<|> then takes two or more of these
 lists and dispatches to the alternative that advertises the longest
 matching prefix, not necessarily to the alternative that comes first
 lexically. (However, in the case of a tie between alternatives, the earlier
-alternative does take precedence.)
+alternative does take precedence.) Backtracking into a constant prefix
+(or into a :: that would backtrack over a constant prefix) causes
+the next longest match to be attempted, even if that is specified
+in a different subrule.
 
 Initial constants must take into account case sensitivity (or any other
 canonicalization primitives) and do the right thing even when propagated
@@ -90,6 +94,18 @@
 say, a trie, the trie must continue to have the appropriate
 semantics for the originating rule.
 
+The C<||> form has the old short-circuit semantics, and will not
+attempt to match its right side unless all possibilities (including
+all C<|> possibilities) are exhausted on its left. The first C<||>
+in a regex makes constant strings on its left available to the
+outer longest-token matcher, but hides any subsequent tests from
+longest-token matching. Every C<||> establishes a new longest-token
+table. That is, if you use C<|> on the right side of C<||>, that
+right side establishes a new top level for longest-token processing
+for this subexpression and any called subrules. The right side's
+longest-token list is invisible to the left of the C<||> or outside
+the regex containing the C<||>.
+
 =head1 Modifiers
 
 =over
@@ -506,11 +522,13 @@
 C<&> and C<&&> forms. The C<&> form allows the compiler and/or the
 run-time system to decide which parts to evaluate first, and it is
 erroneous to assume either order happens consistently. The C<&&>
-form short-circuits, and backtracking makes the right argument vary
-faster than the left.
+form guarantees left-to-right order, and backtracking makes the right
+argument vary faster than the left.
 
-The C<&> and C<&&> operators are list associative like C<|> and C<||>,
-but have tighter precedence.
+The C<&> operator is list associative like C<|>, but has slightly
+tighter precedence. Likewise C<&&> has slightly tighter precedence
+than C<||>. As with the normal junctional and short-circuit operators,
+C<&> and C<|> are both tighter than C<&&> and C<||>.
 
 =back
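The difference between C<|> and C<||> described in this patch can be modelled, very roughly, outside Perl 6. The Python sketch below is an illustration under simplifying assumptions, not how any Perl 6 engine implements it: every alternative is reduced to a plain literal (so subrules and their advertised prefixes are flattened into one list), C<|> orders the matching literals longest first with ties kept in lexical order, C<left || right> exhausts every left candidate before consulting the right side's own, separately ordered list, and a failure after the alternation backtracks into the next candidate. The helper names ltm_candidates, seq_candidates and first_match are hypothetical.

```python
from typing import List, Optional, Sequence


def ltm_candidates(alts: Sequence[str], text: str, pos: int = 0) -> List[str]:
    """Rough model of `|`: literal alternatives matching at pos, longest first.

    sorted() is stable (also with reverse=True), so equal-length
    alternatives keep their lexical order: the earlier one wins a tie.
    """
    hits = [a for a in alts if text.startswith(a, pos)]
    return sorted(hits, key=len, reverse=True)


def seq_candidates(left: Sequence[str], right: Sequence[str],
                   text: str, pos: int = 0) -> List[str]:
    """Rough model of `left || right`.

    Every `|` candidate on the left is tried before anything on the right,
    and the right side gets its own, separate longest-first ordering.
    """
    return ltm_candidates(left, text, pos) + ltm_candidates(right, text, pos)


def first_match(candidates: Sequence[str], text: str, tail: str,
                pos: int = 0) -> Optional[str]:
    """Return the first candidate after which `tail` also matches.

    If `tail` fails, this "backtracks" into the next candidate in order,
    which for `|` means the next-longest literal.
    """
    for cand in candidates:
        if text.startswith(tail, pos + len(cand)):
            return cand
    return None


if __name__ == "__main__":
    text = "foreach"
    alts = ["for", "fore", "foreach"]
    # `|` prefers the longest literal even though "for" is written first.
    print(first_match(ltm_candidates(alts, text), text, ""))      # foreach
    # If "each" must follow, backtracking tries "fore" and then "for".
    print(first_match(ltm_candidates(alts, text), text, "each"))  # for
    # With `||`, "for" on the left wins; the right side is only a fallback.
    print(first_match(seq_candidates(["for"], ["foreach", "fore"], text), text, ""))  # for
```

The last call is the point of the patch in miniature: the longer literals on the right of C<||> never compete with the left side for longest-token dispatch.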
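After this patch the four regex conjunction and disjunction operators rank, from tightest to loosest, C<&>, C<|>, C<&&>, C<||>. One quick way to see how a flat run of terms groups under that ordering is to split on the loosest operator present and recurse. The Python sketch below does that on a hypothetical pre-tokenized list, with single-letter atoms standing in for arbitrary regex terms. It ignores grouping brackets and treats all four operators as list associative; the patch says this for C<&> and C<|>, the removed text said it for C<||>, and for C<&&> it is an assumption here.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple]

# Loosest first: splitting on the loosest operator present and recursing
# makes every remaining operator bind more tightly.
LOOSEST_FIRST = ["||", "&&", "|", "&"]


def group(tokens: List[str]) -> Tree:
    """Group a flat token list by the precedence & > | > && > ||.

    Each operator is treated as list associative, so `a | b | c`
    becomes one n-ary node ("|", "a", "b", "c").
    """
    for op in LOOSEST_FIRST:
        if op in tokens:
            parts: List[List[str]] = [[]]
            for tok in tokens:
                if tok == op:
                    parts.append([])  # start collecting the next operand
                else:
                    parts[-1].append(tok)
            return (op, *(group(p) for p in parts))
    if len(tokens) != 1:
        raise ValueError(f"expected a single atom, got {tokens!r}")
    return tokens[0]


if __name__ == "__main__":
    print(group("a | b || c & d && e".split()))
    # ('||', ('|', 'a', 'b'), ('&&', ('&', 'c', 'd'), 'e'))
    print(group("a | b | c & d".split()))
    # ('|', 'a', 'b', ('&', 'c', 'd'))
```

In the first case the C<|> group sits entirely on the left of the C<||>, so under the rules above the right side C<c & d && e> would start its own longest-token table.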