Author: larry
Date: Thu Jan 17 10:22:06 2008
New Revision: 14490
Modified:
doc/trunk/design/syn/S05.pod
Log:
Clarifications suggested by moritz++ and rhr++
Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Thu Jan 17 10:22:06 2008
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
- Last Modified: 10 Jan 2008
+ Last Modified: 17 Jan 2008
Number: 5
- Version: 70
+ Version: 71
This document summarizes Apocalypse 5, which is about the new regex
syntax. We now try to call them I<regex> rather than "regular
@@ -192,7 +192,9 @@
The C<:ii> variant may be used on a substitution to change the
substituted string to the same case pattern as the matched string.
-Case info is carried across on a character by character basis. If the
+
+If the pattern is matched without the C<:sigspace> modifier, case
+info is carried across on a character by character basis. If the
right string is longer than the left one, the case of the final
character is replicated. Titlecase is carried across if possible
regardless of whether the resulting letter is at the beginning of
@@ -200,7 +202,28 @@
corresponding uppercase character is used. (This policy can be
modified within a lexical scope by a language-dependent Unicode
declaration to substitute titlecase according to the orthographic
-rules of the specified language.)
+rules of the specified language.) Characters that carry no case
+information leave their corresponding replacement character unchanged.
+
+If the pattern is matched with C<:sigspace>, then a slightly smarter
+algorithm is used which attempts to determine if there is a uniform
+capitalization policy over each matched word, and applies the same
+policy to each replacement word. If there doesn't seem to be a uniform
+policy on the left, the policy for each word is carried over word by
+word, with the last pattern word replicated if necessary. If a word
+does not appear to have a recognizable policy, the replacement word
+is translated character for character as in the non-sigspace case.
+Recognized policies include:
+
+ lc()
+ uc()
+ ucfirst(lc())
+ lcfirst(uc())
+ capitalize()
+
+In any case, only the officially matched string part of the pattern
+match counts, so any sort of lookahead or contextual matching is not
+included in the analysis.
=item *
@@ -220,6 +243,7 @@
the right string is longer than the left one, the remaining characters
are substituted without any modification. (Note that NFD/NFC distinctions
are usually immaterial, since Perl encapsulates that in grapheme mode.)
+Under C<:sigspace> the preceding rules are applied word by word.
=item *
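
Reader's note on the non-sigspace C<:ii> rule added above: the following is a minimal Python sketch of the character-by-character case carry, not the specified implementation. The function name carry_case is hypothetical, and detecting titlecase via the Unicode general category Lt is an assumption made for illustration. The word-by-word C<:sigspace> policies are not modeled here.

```python
import unicodedata


def carry_case(matched: str, replacement: str) -> str:
    """Copy the case pattern of `matched` onto `replacement`, one character at a time.

    Hypothetical helper: a sketch of the non-sigspace rule only.
    """
    if not matched:
        # Nothing to copy case from; leave the replacement alone.
        return replacement

    out = []
    last = len(matched) - 1
    for i, ch in enumerate(replacement):
        # If the replacement is longer than the match, the case of the
        # final matched character is replicated.
        src = matched[min(i, last)]
        if unicodedata.category(src) == "Lt":
            # Titlecase is carried across if possible; str.title() falls
            # back to the uppercase form when no titlecase form exists.
            out.append(ch.title())
        elif src.isupper():
            out.append(ch.upper())
        elif src.islower():
            out.append(ch.lower())
        else:
            # Characters that carry no case information leave the
            # corresponding replacement character unchanged.
            out.append(ch)
    return "".join(out)


print(carry_case("HELLO", "world"))  # WORLD
print(carry_case("Hello", "world"))  # World
print(carry_case("Ab", "xyzw"))      # Xyzw
print(carry_case("a-B", "XYZ"))      # xYZ
```

Note that Python's str.upper() can expand a single character into several (for example the German sharp s), so the result may be longer than the replacement string. The spec text above does not say how such cases should be handled.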