Author: larry
Date: Mon Oct 22 15:31:16 2007
New Revision: 14466

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Scanning behavior of method regex forms unrelated to :ratchet notes PerlJam++

Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Mon Oct 22 15:31:16 2007
@@ -14,9 +14,9 @@
    Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
                Larry Wall <[EMAIL PROTECTED]>
    Date: 24 Jun 2002
-   Last Modified: 13 Sep 2007
+   Last Modified: 22 Oct 2007
    Number: 5
-   Version: 66
+   Version: 67

 This document summarizes Apocalypse 5, which is about the new regex
 syntax. We now try to call them I<regex> rather than "regular
@@ -431,23 +431,6 @@
 (Note: for portions of patterns subject to longest-token analysis, a C<:>
 is ignored in any case, since there will be no backtracking necessary.)

-The C<:ratchet> modifier also implies that the anchoring on either
-end is controlled by context. When a ratcheted regex is called as
-a subrule, the front is anchored to the current position (as with
-C<:p>), while the end is not anchored, since the calling context
-will likely wish to continue parsing. However, when a ratcheted
-regex is called directly, it is automatically anchored on both ends.
-(You may override this with an explicit C<:p> or C<:c>.) Thus,
-you can do direct pattern matching using a token or rule:
-
-    $string ~~ token { \d+ }
-    $string ~~ rule { \d+ }
-
-and these are equivalent to
-
-    $string ~~ m/^ \d+: $/;
-    $string ~~ m/^ <.ws> \d+: <.ws> $/;
-
 =item *

 The new C<:panic> modifier causes this regex and all invoked subrules
@@ -1747,7 +1730,7 @@

 =back

-=head1 Named Regexes
+=head1 Regex Routines, Named and Anonymous

 =over

@@ -1782,6 +1765,42 @@

     rule identification { [soft|hard]ware <type> <serial_number> }

+These keyword-declared regexes are officially of type C<Method>,
+which is derived from C<Routine>.
+
+In general, the anchoring of any subrule call is controlled by context.
+When a regex, token, or rule method is called as a subrule, the
+front is anchored to the current position (as with C<:p>), while
+the end is not anchored, since the calling context will likely wish
+to continue parsing. However, when such a method is smartmatched
+directly, it is automatically anchored on both ends to the beginning
+and end of the string. Thus, you can do direct pattern matching
+by using an anonymous regex routine as a standalone pattern:
+
+    $string ~~ regex { \d+ }
+    $string ~~ token { \d+ }
+    $string ~~ rule { \d+ }
+
+and these are equivalent to
+
+    $string ~~ m/^ \d+ $/;
+    $string ~~ m/^ \d+: $/;
+    $string ~~ m/^ <.ws> \d+: <.ws> $/;
+
+The basic rule of thumb is that the keyword-defined methods never
+do implicit C<.*?>-like scanning, while the C<m//> and C<s//>
+quotelike forms do such scanning in the absence of explicit anchoring.
+
+The C<rx//> and C<//> forms can go either way: they scan when used
+directly within a smartmatch or boolean context, but when called
+indirectly as a subrule they do not scan. That is, the object returned
+by C<rx//> behaves like C<m//> when used directly, but like C<regex>
+C<{}> when used as a subrule:
+
+    $pattern = rx/foo/;
+    $string ~~ $pattern; # equivalent to m/foo/;
+    $string ~~ /'[' <$pattern> ']'/ # equivalent to /'[foo]'/
+
 =back

 =head1 Nothing is illegal
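
For readers without a Perl 6 implementation at hand, the three anchoring behaviors described in this change can be roughly approximated with Python's standard `re` module. This is only an analogy, not the S05 semantics: the helper names `scan_match`, `subrule_match`, and `direct_match` are made up for illustration, and neither the non-backtracking `:` of `token` nor the implicit `<.ws>` of `rule` is modeled.

```python
import re

DIGITS = r"\d+"  # stands in for the body of regex { \d+ }


def scan_match(pattern, text):
    """Rough analogue of m/.../: scan forward for a match anywhere."""
    return re.search(pattern, text)


def subrule_match(pattern, text, pos=0):
    """Rough analogue of a subrule call: the front is anchored at the
    current position (pos); the end is left unanchored."""
    return re.compile(pattern).match(text, pos)


def direct_match(pattern, text):
    """Rough analogue of smartmatching a regex/token/rule method directly:
    anchored at both the beginning and the end of the string."""
    return re.fullmatch(pattern, text)


text = "abc 123 def"

# m// style: scanning finds the digits in the middle of the string.
found_by_scan = scan_match(DIGITS, text) is not None

# Direct smartmatch style: the whole string must be digits.
whole_string_ok = direct_match(DIGITS, "123") is not None
whole_string_fails = direct_match(DIGITS, text) is None

# Subrule style: no scanning from position 0, but a match at position 4
# succeeds and leaves the rest of the string for the caller.
no_scan_from_start = subrule_match(DIGITS, text) is None
at_position_4 = subrule_match(DIGITS, text, 4).group(0)

# rx// style: the same pattern scans when used directly, but does not
# scan when embedded in a larger pattern.
inner = "foo"
scans_directly = re.search(inner, "xx foo yy") is not None
embedded = r"\[" + inner + r"\]"
embedded_ok = re.search(embedded, "a[foo]b") is not None
embedded_no_scan = re.search(embedded, "a[ foo]b") is None
```

The point of the sketch is the rule of thumb in the diff: the keyword-declared forms never scan, `m//` does, and an `rx//` object takes on whichever behavior its calling context implies.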