Author: finanalyst
Date: 2009-03-08 09:43:16 +0100 (Sun, 08 Mar 2009)
New Revision: 25745
Modified: docs/Perl6/Spec/S05-regex.pod
Log:
Added descriptions to standard rules, regrouped rules, added new rule <word> to enable definition of <wb>.
Eliminated deprecated <lt><gt><dot><sp>.
Changed deprecated <null> to <?> and deprecated <fail> to <!>.

Modified: docs/Perl6/Spec/S05-regex.pod
===================================================================
--- docs/Perl6/Spec/S05-regex.pod 2009-03-08 01:00:34 UTC (rev 25744)
+++ docs/Perl6/Spec/S05-regex.pod 2009-03-08 08:43:16 UTC (rev 25745)
@@ -14,9 +14,9 @@
  Maintainer: Patrick Michaud <pmich...@pobox.com> and
  Larry Wall <la...@wall.org>
  Date: 24 Jun 2002
- Last Modified: 4 Mar 2009
+ Last Modified: 8 Mar 2009
  Number: 5
- Version: 88
+ Version: 89
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax. We now try to call them I<regex> rather than "regular
@@ -688,7 +688,7 @@
 =item *
 
 The C<~> operator is a helper for matching nested subrules with a
-specific terminator as the goal. It appears to be placed between the
+specific terminator as the goal. It is designed to be placed between an
 opening and closing bracket, like so:
 
     '(' ~ ')' <expression>
@@ -713,8 +713,10 @@
 Note that you can use this construct to set up expectations for a
 closing construct even when there's no opening bracket:
 
-    <null> ~ ')' \d+
+    <?> ~ ')' \d+
 
+Here <?> returns true on the first null string.
+
 By default the error message uses the name of the current rule as an
 indicator of the abstract goal of the parser at that point. However,
 often this is not terribly informative, especially when rules are named
@@ -1625,62 +1627,88 @@
 
 =item * ident
 
-=item * null
+Match an identifier.
 
-Deprecated, use <?>
+=item * upper
 
-=item * fail
+Match a single uppercase character.
 
-Deprecated, use <!>
+=item * lower
 
-=item * upper
+Match a single lowercase character.
 
-=item * lower
-
 =item * alpha
 
+Match a single alphabetic character.
+
 =item * digit
 
+Match a single digit.
+
 =item * xdigit
 
-=item * space
+Match a single hexadecimal digit.
 
 =item * print
 
+Match a single printable character.
+
 =item * graph
 
-=item * blank
+Match a single "graphical" character.
 
 =item * cntrl
 
+Match a single "control" character. A control character is usually one that doesn't produce output as such but instead controls the terminal somehow: for example newline and backspace are control characters. All characters with ord() less than 32 are usually classified as control characters (assuming ASCII, the ISO Latin character sets, and Unicode), as is the character with the ord() value of 127 (DEL ).
+
 =item * punct
 
+Match a single punctuation character.
+
 =item * alnum
 
-=item * sp
+Match a single alphanumeric character. This is equivalent to <+alpha +digit> .
 
-Deprecated, use ' '
+= item * word
 
-=item * lt
+Match a single word character, viz., alphanumeric plus '_'.
 
-Deprecated, use '<'
+=item * wb
 
-=item * gt
+Returns true if we're at a word boundary. A word boundary is a spot between two characters, one which matches <word>, the other matches <-word>, counting the imaginary characters off the beginning and end of the string as matching a <-word>.
 
-Deprecated, use '>'
+=item * ws
 
-=item * dot
+Match whitespace between tokens.
 
-Deprecated, use '.'
+=item * space
 
-=item * ws
+Match a single whitespace character. Hence C< <ws> > is equivalent to C< <space>+ >.
 
-=item * wb
+=item * blank
 
-=item * before
+Match a single "blank" character. Differs from C< <space> > which include C<tab>'s etc.
 
-=item * after
+=item * before C<pattern>
 
+Perform lookahead -- i.e., check if we're at a position where
+C<pattern> matches. Returns a zero-width Match object on
+success.
+
+=item * after C<pattern>
+
+Perform lookbehind -- i.e., check if the string before the
+current position matches <pattern> (anchored at the end).
+Returns a zero-width Match object on success.
+
+=item * <?>
+
+Match a null string, viz., always returns true
+
+=item * <!>
+
+Inverse of <?>, viz., always returns false.
+
 =back
 
 =head1 Backslash reform
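
The <wb> description added in this revision defines a word boundary purely in terms of <word> and <-word>, with the imaginary characters off either end of the string counted as <-word>. The following is a minimal sketch of that definition in Python rather than Perl 6, offered only as an illustration. The function names are hypothetical, and using `str.isalnum()` to stand in for "alphanumeric" is an assumption; the spec's own notion of <alnum> may differ.

```python
def is_word_char(ch: str) -> bool:
    """Approximate <word>: a single alphanumeric character or '_'.

    Assumption: str.isalnum() stands in for the spec's "alphanumeric".
    """
    return ch == "_" or ch.isalnum()


def at_word_boundary(text: str, pos: int) -> bool:
    """Return True if the spot before index `pos` is a word boundary.

    `pos` ranges from 0 (before the first character) to len(text)
    (after the last one). The imaginary characters off either end of
    the string are treated as non-word characters, as in the <wb> text.
    """
    if not 0 <= pos <= len(text):
        raise ValueError("pos must be between 0 and len(text)")
    before_is_word = pos > 0 and is_word_char(text[pos - 1])
    after_is_word = pos < len(text) and is_word_char(text[pos])
    # A boundary needs exactly one of the two neighbours to be a word char.
    return before_is_word != after_is_word


def word_boundaries(text: str) -> list[int]:
    """List every position in `text` that is a word boundary."""
    return [p for p in range(len(text) + 1) if at_word_boundary(text, p)]
```

Under this reading, an empty string has no word boundary at all, because both imaginary neighbours count as <-word>; a string that starts with a word character has a boundary at position 0.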