Author: larry
Date: Mon Oct 6 18:15:17 2008
New Revision: 14588

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Added ~ twiddle macro to make it easier to write bracketing constructs.

Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Mon Oct 6 18:15:17 2008
@@ -14,9 +14,9 @@
  Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
  Larry Wall <[EMAIL PROTECTED]>
  Date: 24 Jun 2002
- Last Modified: 7 Jul 2008
+ Last Modified: 6 Oct 2008
  Number: 5
- Version: 83
+ Version: 84
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax. We now try to call them I<regex> rather than "regular
@@ -685,6 +685,59 @@
 
     [ <ident> !~~ 'moose' ] || 'squirrel'
 
+=item *
+
+The C<~> operator is a helper for matching nested subrules with a
+specific terminator as the goal. It appears to be placed between the
+opening and closing bracket, like so:
+
+    '(' ~ ')' <expression>
+
+However, it mostly ignores the left argument, and operates on the next
+two atoms (which may be quantified). Its operation on those next
+two atoms is to "twiddle" them so that they are actually matched in
+reverse order. Hence the expression above, at first blush, is merely
+shorthand for:
+
+    '(' <expression> ')'
+
+But beyond that, when it rewrites the atoms it also inserts the
+apparatus that will set up the inner expression to recognize the
+terminator, and to produce an appropriate error message if the
+inner expression does not terminate on the required closing atom.
+So it really does pay attention to the left bracket as well, and it
+actually rewrites our example to something more like:
+
+    $<OPEN> = '(' <SETGOAL: ')'> <expression> [ $GOAL || <FAILGOAL> ]
+
+Note that you can use this construct to set up expectations for
+a closing construct even when there's no opening bracket:
+
+    <null> ~ ')' \d+
+
+By default the error message uses the name of the current rule as an
+indicator of the abstract goal of the parser at that point. However,
+often this is not terribly informative, especially when rules are named
+according to an internal scheme that will not make sense to the user.
+The C<:dba> ("doing business as") adverb may be used to set up a more informative name for
+what the following code is trying to parse:
+
+    token postfix:sym<[ ]> {
+        :dba<array subscript>
+        '[' ~ ']' <expression>
+    }
+
+Then instead of getting a message like:
+
+    Unable to parse expression in postfix:sym<[ ]>; couldn't find final ']'
+
+you'll get a message like:
+
+    Unable to parse expression in array subscript; couldn't find final ']'
+
+(The C<:dba> adverb may also be used to give names to alternations
+and alternatives, which helps the lexer give better error messages.)
+
 =back
 
 =head1 Bracket rationalization