Author: larry
Date: Tue Jan 16 11:09:42 2007
New Revision: 13523

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Tweak | to provide longest-token instead of short-circuit semantics.
Now use || for old short-circuit semantics!

Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Tue Jan 16 11:09:42 2007
@@ -14,9 +14,9 @@
  Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
              Larry Wall <[EMAIL PROTECTED]>
  Date: 24 Jun 2002
- Last Modified: 23 Dec 2006
+ Last Modified: 16 Jan 2007
  Number: 5
- Version: 41
+ Version: 42
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax. We now try to call them I<regex> rather than "regular
@@ -67,6 +67,29 @@
 
 =back
 
+While the syntax of C<|> does not change, the default semantics do
+change slightly. Instead of representing temporal alternation, C<|>
+now represents logical alternation with longest-token semantics.
+(You may now use C<||> to indicate the old temporal alternation. That is,
+C<|> and C<||> now work within regex syntax much the same as they
+do outside of regex syntax, where they represent junctional and
+short-circuit OR.) Every regex in Perl 6 is required to be able to
+return its list of initial constant strings (transitively including the
+initial constant strings of any initial subrule called by that regex).
+A logical alternation using C<|> then takes two or more of these lists
+and dispatches to the alternative that advertises the longest matching
+prefix, not necessarily to the alternative that comes first lexically.
+(However, in the case of a tie between alternatives, the first earlier
+alternative does take precedence.)
+
+Initial constants must take into account case sensitivity (or any other
+canonicalization primitives) and do the right thing even when propagated
+up to rules that don't have the same canonicalization. That is, they
+must continue to represent the set of matches that the lower rule would
+match. If and when the optimizer turns such a list of prefixes into,
+say, a trie, the trie must continue to have the appropriate semantics
+for the originating rule.
+
 =head1 Modifiers
 
 =over
@@ -1319,6 +1342,10 @@
 put an explicit C<!> after the alternation to enable backing into
 another alternative if the first pick fails.
 
+The C<::> also has the effect of hiding any constant string on the right
+from "longest token" processing by C<|>. Only the left side is evaluated
+for initial constancy.
+
 =item *
 
 Backtracking over a triple colon causes the current regex to fail