Author: larry
Date: Mon Oct 9 15:35:46 2006
New Revision: 12964

Modified:
   doc/trunk/design/syn/S05.pod

Log:
<!alpha> is not the same as <-alpha>, spotted by putter++
Made some of the whitespace rules more explicit.

Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod  (original)
+++ doc/trunk/design/syn/S05.pod  Mon Oct 9 15:35:46 2006
@@ -16,7 +16,7 @@
   Date: 24 Jun 2002
   Last Modified: 9 Oct 2006
   Number: 5
-  Version: 37
+  Version: 38
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax. We now try to call them I<regex> rather than "regular
@@ -691,16 +691,18 @@
 To pass a string with leading whitespace you must use the parenthesized
 form.
 
-If the first character is a plus or minus, the initial identifier is taken
-as a character class, so
+If the first character is a plus or minus, the initial identifier
+is taken as a character class, so the first character after the
+identifier doesn't matter in this case, and you can use whitespace
+however you like. Therefore
 
     <foo+bar-baz>
 
-is equivalent to
+can be written
 
-    <+foo+bar-baz>
+    <+ foo + bar - baz>
 
-(See below.)
+Likewise an initial left square bracket indicates character class syntax. (See below.)
 
 =item *
 
@@ -882,6 +884,10 @@
 
     / <[a..z_]>* /
     / <+[a..z_]>* /
+    / <+[ a..z _ ]>* /
+    / <+ [ a .. z _ ] >* /
+
+Whitespace is ignored within square brackets and after the initial C<+>.
 
 =item *
 
@@ -893,12 +899,14 @@
 
     / <![a..z_]> . <!alpha> . /
 
+Whitespace is ignored after the initial C<->.
+
 =item *
 
 Character classes can be combined (additively or subtractively) within
-a single set of angle brackets. For example:
+a single set of angle brackets. Whitespace is ignored. For example:
 
-    / <[a..z]-[aeiou]+xdigit> / # consonant or hex digit
+    / <[a..z] - [aeiou] + xdigit> / # consonant or hex digit
 
 If such a combination starts with a named character class, a leading
 C<+> is allowed but not required, provided the next character is a
@@ -906,6 +914,12 @@
 
     / <+alpha-[Jj]> / # J-less alpha
     / <alpha-[Jj]> / # same thing
+    / <+ alpha - [ Jj ]> / # still the same thing
+
+However, whitespace is not allowed after the first identifier if it
+immediately follows the left angle.
+
+    / <alpha - [Jj]> / # WRONG, means <alpha(/- [Jj]/)>
 
 =item *
 
@@ -955,7 +969,9 @@
     / <!before _ > / # We aren't before an _
 
 Note that C<< <!alpha> >> is different from C<< <-alpha> >> because the
-latter matches C</./> when it is not an alpha.
+latter matches C</./> when it is not an alpha. Note also that as a
+metacharacter C<!> doesn't change the parsing rules of whatever follows
+(unlike, say, C<+> or C<->).
 
 =back
 
@@ -995,6 +1011,11 @@
 these are dependent on the definition of C<< <ws> >>, but only on the C<\s>
 definition of whitespace.)
 
+item *
+
+A C<< < >> followed by whitespace is illegal. Use C<< \< >> to match a literal
+left angle.
+
 =back
 
 =head1 Backslash reform
@@ -1004,7 +1025,7 @@
 =item *
 
 The C<\p> and C<\P> properties become intrinsic grammar rules such as
-(C<< <alpha> >> and C<< <!alpha> >>). They may be combined using the
+(C<< <alpha> >> and C<< <-alpha> >>). They may be combined using the
 above-mentioned character class notation: C<< <[_]+alpha+digit> >>.
 Regardless of the higher-level character class names, low-level
 Unicode properties are always available with a prefix of C<is>.