Author: larry
Date: Mon Oct  6 18:15:17 2008
New Revision: 14588

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Added ~ twiddle macro to make it easier to write bracketing constructs.


Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod        (original)
+++ doc/trunk/design/syn/S05.pod        Mon Oct  6 18:15:17 2008
@@ -14,9 +14,9 @@
    Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
                Larry Wall <[EMAIL PROTECTED]>
    Date: 24 Jun 2002
-   Last Modified: 7 Jul 2008
+   Last Modified: 6 Oct 2008
    Number: 5
-   Version: 83
+   Version: 84
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -685,6 +685,59 @@
 
     [ <ident> !~~ 'moose' ] || 'squirrel'
 
+=item *
+
+The C<~> operator is a helper for matching nested subrules with a
+specific terminator as the goal.  It appears to be placed between the
+opening and closing bracket, like so:
+
+    '(' ~ ')' <expression>
+
+However, it mostly ignores the left argument, and operates on the next
+two atoms (which may be quantified).  Its operation on those next
+two atoms is to "twiddle" them so that they are actually matched in
+reverse order.  Hence the expression above, at first blush, is merely
+shorthand for:
+
+    '(' <expression> ')'
+
+But beyond that, when it rewrites the atoms it also inserts the
+apparatus that will set up the inner expression to recognize the
+terminator, and to produce an appropriate error message if the
+inner expression does not terminate on the required closing atom.
+So it really does pay attention to the left bracket as well, and it
+actually rewrites our example to something more like:
+
+    $<OPEN> = '(' <SETGOAL: ')'> <expression> [ $GOAL || <FAILGOAL> ]
+
+Note that you can use this construct to set up expectations for
+a closing construct even when there's no opening bracket:
+
+    <null> ~ ')' \d+
+
+By default the error message uses the name of the current rule as an
+indicator of the abstract goal of the parser at that point.  However,
+often this is not terribly informative, especially when rules are named
+according to an internal scheme that will not make sense to the user.
+The C<:dba> ("doing business as") adverb may be used to set up a more informative name for
+what the following code is trying to parse:
+
+    token postfix:sym<[ ]> {
+       :dba<array subscript>
+       '[' ~ ']' <expression>
+    }
+
+Then instead of getting a message like:
+
+    Unable to parse expression in postfix:sym<[ ]>; couldn't find final ']'
+
+you'll get a message like:
+
+    Unable to parse expression in array subscript; couldn't find final ']'
+
+(The C<:dba> adverb may also be used to give names to alternations
+and alternatives, which helps the lexer give better error messages.)
+
 =back
 
 =head1 Bracket rationalization
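
For readers who want to see the control flow of the rewritten form, here is a rough Python sketch of what the expansion of C<~> described in the patch appears to do. It is not how any Perl 6 implementation does it. The names twiddle, digits and ParseFailure are made up for illustration, the "goal" is simply the closing string, and the message text is copied from the example in the patch. As in the rewritten regex, failing to find the opener or the inner expression is an ordinary failure, while a missing terminator after a successful inner match is reported as an error.

```python
import re


class ParseFailure(Exception):
    """Hypothetical stand-in for what <FAILGOAL> would report."""


def digits(text, pos):
    """Toy inner rule: match \\d+ at pos, return the end offset or None."""
    m = re.compile(r"\d+").match(text, pos)
    return m.end() if m else None


def twiddle(opener, closer, inner, dba):
    """Build a parser for  opener ~ closer inner  (illustrative only).

    dba plays the role of the :dba name used in the error message.
    An empty opener corresponds to the <null> ~ ')' form.
    """
    def parse(text, pos=0):
        # '(' : the opener must match first; otherwise fail normally.
        if not text.startswith(opener, pos):
            return None
        pos += len(opener)
        # <expression> : the inner rule is matched before the closer,
        # even though the closer is written first in the source.
        end = inner(text, pos)
        if end is None:
            return None
        # [ $GOAL || <FAILGOAL> ] : the terminator is now required.
        if not text.startswith(closer, end):
            raise ParseFailure(
                f"Unable to parse expression in {dba}; "
                f"couldn't find final '{closer}'"
            )
        return end + len(closer)

    return parse


subscript = twiddle("[", "]", digits, "array subscript")
```

With this sketch, subscript("[42]") returns the offset just past the closing bracket, and subscript("[42") raises ParseFailure carrying the "array subscript" wording instead of an internal rule name.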
