In perl.git, the branch blead has been updated <https://perl5.git.perl.org/perl.git/commitdiff/7835a09a181366ad4d4188409a4c0e3a6236fcf5?hp=ac6d2595875ea2813009c120fd54eb70c9ed2b0a>
- Log -----------------------------------------------------------------
commit 7835a09a181366ad4d4188409a4c0e3a6236fcf5
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 10:02:13 2019 -0700

    perlrecharclass: Note many fewer xdigits than digts

    This adds a note explaining why there are only two sets of hex digits

commit 4f5c9941bb6f93a967e4cc3ef19c9d39351f0ad3
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 09:33:56 2019 -0700

    perlrecharclass: Rmv obsolete RFC

    The deleted text asked for comments on a proposal that never went anywhere.

commit 8a0ab3a428cea2944929ca5ef777479a404de162
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 09:30:29 2019 -0700

    perlrecharclass: Clarify

    See http://blogs.perl.org/users/tom_wyant/2019/01/untrusted-numeric-input.html

-----------------------------------------------------------------------

Summary of changes:
 pod/perlrecharclass.pod | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index fb9dc432b0..e07638844b 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -786,21 +786,21 @@ is valid and matches '0', '1', any alphabetic character, and the percent sign.
 
 Perl recognizes the following POSIX character classes:
 
- alpha  Any alphabetical character ("[A-Za-z]").
- alnum  Any alphanumeric character ("[A-Za-z0-9]").
+ alpha  Any alphabetical character (e.g., [A-Za-z]).
+ alnum  Any alphanumeric character (e.g., [A-Za-z0-9]).
  ascii  Any character in the ASCII character set.
  blank  A GNU extension, equal to a space or a horizontal tab ("\t").
  cntrl  Any control character. See Note [2] below.
- digit  Any decimal digit ("[0-9]"), equivalent to "\d".
+ digit  Any decimal digit (e.g., [0-9]), equivalent to "\d".
  graph  Any printable character, excluding a space. See Note [3] below.
- lower  Any lowercase character ("[a-z]").
+ lower  Any lowercase character (e.g., [a-z]).
  print  Any printable character, including a space. See Note [4] below.
  punct  Any graphical character excluding "word" characters. Note [5].
  space  Any whitespace character. "\s" including the vertical tab
         ("\cK").
- upper  Any uppercase character ("[A-Z]").
- word   A Perl extension ("[A-Za-z0-9_]"), equivalent to "\w".
- xdigit Any hexadecimal digit ("[0-9a-fA-F]").
+ upper  Any uppercase character (e.g., [A-Z]).
+ word   A Perl extension (e.g., [A-Za-z0-9_]), equivalent to "\w".
+ xdigit Any hexadecimal digit (e.g., [0-9a-fA-F]). Note [7].
 
 Like the L<Unicode properties|/Unicode Properties>, most of the POSIX
 properties match the same regardless of whether case-insensitive (C</i>)
@@ -841,7 +841,7 @@ equivalent.
   space      \p{PosixSpace}       \p{XPosixSpace}            [6]
   upper      \p{PosixUpper}       \p{XPosixUpper}
   word       \p{PosixWord}        \p{XPosixWord}    \w
-  xdigit     \p{PosixXDigit}      \p{XPosixXDigit}
+  xdigit     \p{PosixXDigit}      \p{XPosixXDigit}           [7]
 
 =over 4
 
@@ -896,6 +896,16 @@ v5.18. In earlier versions, these differ only in that in non-locale
 matching, C<\p{XPerlSpace}> did not match the vertical tab, C<\cK>.
 Same for the two ASCII-only range forms.
 
+=item [7]
+
+Unlike C<[[:digit:]]> which matches digits in many writing systems, such
+as Thai and Devanagari, there are currently only two sets of hexadecimal
+digits, and it is unlikely that more will be added. This is because you
+not only need the ten digits, but also the six C<[A-F]> (and C<[a-f]>)
+to correspond. That means only the Latin script is suitable for these,
+and Unicode has only two sets of these, the familiar ASCII set, and the
+fullwidth forms starting at U+FF10 (FULLWIDTH DIGIT ZERO).
+
 =back
 
 There are various other synonyms that can be used besides the names
@@ -969,12 +979,6 @@ The POSIX class matches the same as the ASCII range counterpart.
 Which rules apply are determined as described in
 L<perlre/Which character set modifier is in effect?>.
 
-It is proposed to change this behavior in a future release of Perl so that
-whether or not Unicode rules are in effect would not change the
-behavior: Outside of locale, the POSIX classes
-would behave like their ASCII-range counterparts. If you wish to
-comment on this proposal, send email to C<[email protected]>.
-
 =head4 Negation of POSIX character classes
 X<character class, negation>
 
-- 
Perl5 Master Repository
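
The behavior described in the new Note [7] can be checked with a short script. This is only an illustrative sketch, not part of the commit: the helper names is_digit and is_xdigit and the sample list are assumptions made for the example, and the /u modifier (Perl 5.14 or later) is used so that Unicode rules are explicitly in effect.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from perlrecharclass.pod): each returns 1 if the
# single character passed in matches the POSIX class under Unicode rules.
sub is_digit  { return $_[0] =~ /\A[[:digit:]]\z/u  ? 1 : 0 }
sub is_xdigit { return $_[0] =~ /\A[[:xdigit:]]\z/u ? 1 : 0 }

# Sample characters: ASCII, two non-Latin digit sets, and fullwidth forms.
my @samples = (
    [ 'DIGIT SEVEN',                      "\x{0037}" ],
    [ 'LATIN SMALL LETTER F',             "\x{0066}" ],
    [ 'THAI DIGIT FIVE',                  "\x{0E55}" ],
    [ 'DEVANAGARI DIGIT THREE',           "\x{0969}" ],
    [ 'FULLWIDTH DIGIT ZERO',             "\x{FF10}" ],
    [ 'FULLWIDTH LATIN CAPITAL LETTER A', "\x{FF21}" ],
    [ 'FULLWIDTH LATIN SMALL LETTER F',   "\x{FF46}" ],
);

# Print only code points and names, so no wide characters reach STDOUT.
for my $sample (@samples) {
    my ($name, $char) = @$sample;
    printf "U+%04X %-34s digit=%d xdigit=%d\n",
        ord($char), $name, is_digit($char), is_xdigit($char);
}
```

The Thai and Devanagari digits should report digit=1 but xdigit=0, while the ASCII and fullwidth characters in the hexadecimal range report xdigit=1, which is the asymmetry the note explains.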
