Author: larry
Date: Tue Jan 16 11:09:42 2007
New Revision: 13523
Modified:
doc/trunk/design/syn/S05.pod
Log:
Tweak | to provide longest-token instead of short-circuit semantics.
Now use || for old short-circuit semantics!
Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Tue Jan 16 11:09:42 2007
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
- Last Modified: 23 Dec 2006
+ Last Modified: 16 Jan 2007
Number: 5
- Version: 41
+ Version: 42
This document summarizes Apocalypse 5, which is about the new regex
syntax. We now try to call them I<regex> rather than "regular
@@ -67,6 +67,29 @@
=back
+While the syntax of C<|> does not change, the default semantics do
+change slightly. Instead of representing temporal alternation, C<|>
+now represents logical alternation with longest-token semantics.
+(You may now use C<||> to indicate the old temporal alternation. That is,
+C<|> and C<||> now work within regex syntax much the same as they
+do outside of regex syntax, where they represent junctional and
+short-circuit OR.) Every regex in Perl 6 is required to be able to
+return its list of initial constant strings (transitively including the
+initial constant strings of any initial subrule called by that regex).
+A logical alternation using C<|> then takes two or more of these lists
+and dispatches to the alternative that advertises the longest matching
+prefix, not necessarily to the alternative that comes first lexically.
+(However, in the case of a tie between alternatives, the lexically
+earlier alternative does take precedence.)
+
+Initial constants must take into account case sensitivity (or any other
+canonicalization primitives) and do the right thing even when propagated
+up to rules that don't have the same canonicalization. That is, they
+must continue to represent the set of matches that the lower rule would
+match. If and when the optimizer turns such a list of prefixes into,
+say, a trie, the trie must continue to have the appropriate semantics
+for the originating rule.
+
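As a rough illustration of the dispatch rule described in the added text, the following Python sketch models each alternative as a label plus the list of initial constant strings it advertises. It is only a toy model under stated assumptions, not how any Perl 6 implementation works: the names `longest_token_choice`, `short_circuit_choice`, and `longest_prefix_len` are hypothetical, and prefix lists are given by hand rather than derived from a regex.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical model: (label, initial constant strings the alternative advertises).
Alternative = Tuple[str, Sequence[str]]


def longest_prefix_len(text: str, prefixes: Sequence[str]) -> int:
    """Length of the longest advertised prefix that `text` starts with, or -1 if none."""
    return max((len(p) for p in prefixes if text.startswith(p)), default=-1)


def longest_token_choice(text: str, alternatives: Sequence[Alternative]) -> Optional[str]:
    """Model of `|`: pick the alternative advertising the longest matching prefix.

    The strict `>` comparison means a tie is won by the lexically earlier alternative.
    """
    best_label, best_len = None, -1
    for label, prefixes in alternatives:
        n = longest_prefix_len(text, prefixes)
        if n > best_len:
            best_label, best_len = label, n
    return best_label


def short_circuit_choice(text: str, alternatives: Sequence[Alternative]) -> Optional[str]:
    """Model of `||`: pick the first alternative, in lexical order, whose prefix matches."""
    for label, prefixes in alternatives:
        if longest_prefix_len(text, prefixes) >= 0:
            return label
    return None


# Illustrative alternatives; the labels and strings are made up for this sketch.
alts = [("short", ["foo"]), ("long", ["foobar"]), ("tie", ["foobar"])]

print(longest_token_choice("foobarbaz", alts))   # longest prefix wins, earlier on a tie
print(short_circuit_choice("foobarbaz", alts))   # first matching alternative wins
```

On the input `foobarbaz`, the `|` model dispatches to `long` (which beats `tie` only because it comes first), while the `||` model stops at `short`. The sketch covers only the choice of alternative; whether the chosen alternative then goes on to match in full is outside what it models.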
=head1 Modifiers
=over
@@ -1319,6 +1342,10 @@
put an explicit C<!> after the alternation to enable backing into
another alternative if the first pick fails.
+The C<::> also has the effect of hiding any constant string on the right
+from "longest token" processing by C<|>. Only the left side is evaluated
+for initial constancy.
+
=item *
Backtracking over a triple colon causes the current regex to fail