Author: larry
Date: Mon Oct 6 18:15:17 2008
New Revision: 14588
Modified:
doc/trunk/design/syn/S05.pod
Log:
Added ~ twiddle macro to make it easier to write bracketing constructs.
Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Mon Oct 6 18:15:17 2008
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
- Last Modified: 7 Jul 2008
+ Last Modified: 6 Oct 2008
Number: 5
- Version: 83
+ Version: 84
This document summarizes Apocalypse 5, which is about the new regex
syntax. We now try to call them I<regex> rather than "regular
@@ -685,6 +685,59 @@
[ <ident> !~~ 'moose' ] || 'squirrel'
+=item *
+
+The C<~> operator is a helper for matching nested subrules with a
+specific terminator as the goal. It appears to be placed between the
+opening and closing bracket, like so:
+
+ '(' ~ ')' <expression>
+
+However, it mostly ignores the left argument, and operates on the next
+two atoms (which may be quantified). Its operation on those next
+two atoms is to "twiddle" them so that they are actually matched in
+reverse order. Hence the expression above, at first blush, is merely
+shorthand for:
+
+ '(' <expression> ')'
+
+But beyond that, when it rewrites the atoms it also inserts the
+apparatus that will set up the inner expression to recognize the
+terminator, and to produce an appropriate error message if the
+inner expression does not terminate on the required closing atom.
+So it really does pay attention to the left bracket as well, and it
+actually rewrites our example to something more like:
+
+ $<OPEN> = '(' <SETGOAL: ')'> <expression> [ $GOAL || <FAILGOAL> ]
+
+Note that you can use this construct to set up expectations for
+a closing construct even when there's no opening bracket:
+
+ <null> ~ ')' \d+
+
+By default the error message uses the name of the current rule as an
+indicator of the abstract goal of the parser at that point. However,
+often this is not terribly informative, especially when rules are named
+according to an internal scheme that will not make sense to the user.
+The C<:dba> ("doing business as") adverb may be used to set up a more
+informative name for what the following code is trying to parse:
+
+ token postfix:sym<[ ]> {
+     :dba<array subscript>
+     '[' ~ ']' <expression>
+ }
+
+Then instead of getting a message like:
+
+ Unable to parse expression in postfix:sym<[ ]>; couldn't find final ']'
+
+you'll get a message like:
+
+ Unable to parse expression in array subscript; couldn't find final ']'
+
+(The C<:dba> adverb may also be used to give names to alternations
+and alternatives, which helps the lexer give better error messages.)
+
=back
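
The rewrite that C<~> performs can be modelled outside Perl 6. What follows is a minimal Python sketch, not how any Perl 6 implementation does it: the names `twiddle`, `literal`, `digits`, and `GoalError` are hypothetical, matchers are assumed to be plain functions from `(source, pos)` to a new position or `None`, and the error text simply copies the message format quoted in the diff. It shows the three behaviours described: the atoms are matched in the order opener, inner, closer; a missing closer raises an error instead of failing quietly; and the `:dba` name replaces the rule name in the message.

```python
class GoalError(Exception):
    """Raised when the closing atom is missing (stands in for <FAILGOAL>)."""


def literal(text):
    """Return a matcher for a fixed string: (source, pos) -> new pos or None."""
    def match(source, pos):
        return pos + len(text) if source.startswith(text, pos) else None
    return match


def digits(source, pos):
    """Match one or more ASCII digits, standing in for the inner expression."""
    end = pos
    while end < len(source) and source[end].isdigit():
        end += 1
    return end if end > pos else None


def twiddle(opener, closer, inner, goal_text, dba):
    """Model of `opener ~ closer inner`: match opener, then inner, then closer."""
    def match(source, pos):
        pos = opener(source, pos)
        if pos is None:
            return None              # no opener: an ordinary failed match
        pos = inner(source, pos)
        if pos is None:
            return None              # inner expression failed: ordinary failure
        end = closer(source, pos)
        if end is None:
            # Inner matched but did not stop at the goal: report it.
            raise GoalError(
                f"Unable to parse expression in {dba}; "
                f"couldn't find final '{goal_text}'"
            )
        return end
    return match


# Counterpart of the array subscript token, with a :dba-style name.
subscript = twiddle(literal("["), literal("]"), digits, "]", "array subscript")

# Counterpart of `<null> ~ ')' \d+`: an empty opener still sets up the goal.
closing_only = twiddle(literal(""), literal(")"), digits, ")", "argument list")
```

Calling `subscript("[42]", 0)` returns the end position of the match, while `subscript("[42", 0)` raises `GoalError` carrying the "array subscript" wording. Raising an exception rather than returning `None` is the design point: once the opener and the inner expression have matched, a missing terminator is treated as a user-facing syntax error rather than a reason to backtrack.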
=head1 Bracket rationalization