In perl.git, the branch blead has been updated

<https://perl5.git.perl.org/perl.git/commitdiff/7835a09a181366ad4d4188409a4c0e3a6236fcf5?hp=ac6d2595875ea2813009c120fd54eb70c9ed2b0a>

- Log -----------------------------------------------------------------
commit 7835a09a181366ad4d4188409a4c0e3a6236fcf5
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 10:02:13 2019 -0700

    perlrecharclass: Note many fewer xdigits than digits
    
    This adds a note explaining why there are only two sets of hex digits

commit 4f5c9941bb6f93a967e4cc3ef19c9d39351f0ad3
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 09:33:56 2019 -0700

    perlrecharclass: Rmv obsolete RFC
    
    The deleted text asked for comments on a proposal that never went
    anywhere.

commit 8a0ab3a428cea2944929ca5ef777479a404de162
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 09:30:29 2019 -0700

    perlrecharclass: Clarify
    
    See http://blogs.perl.org/users/tom_wyant/2019/01/untrusted-numeric-input.html

-----------------------------------------------------------------------

Summary of changes:
 pod/perlrecharclass.pod | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index fb9dc432b0..e07638844b 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -786,21 +786,21 @@ is valid and matches '0', '1', any alphabetic character, and the percent sign.
 
 Perl recognizes the following POSIX character classes:
 
- alpha  Any alphabetical character ("[A-Za-z]").
- alnum  Any alphanumeric character ("[A-Za-z0-9]").
+ alpha  Any alphabetical character (e.g., [A-Za-z]).
+ alnum  Any alphanumeric character (e.g., [A-Za-z0-9]).
  ascii  Any character in the ASCII character set.
  blank  A GNU extension, equal to a space or a horizontal tab ("\t").
  cntrl  Any control character.  See Note [2] below.
- digit  Any decimal digit ("[0-9]"), equivalent to "\d".
+ digit  Any decimal digit (e.g., [0-9]), equivalent to "\d".
  graph  Any printable character, excluding a space.  See Note [3] below.
- lower  Any lowercase character ("[a-z]").
+ lower  Any lowercase character (e.g., [a-z]).
  print  Any printable character, including a space.  See Note [4] below.
  punct  Any graphical character excluding "word" characters.  Note [5].
  space  Any whitespace character. "\s" including the vertical tab
         ("\cK").
- upper  Any uppercase character ("[A-Z]").
- word   A Perl extension ("[A-Za-z0-9_]"), equivalent to "\w".
- xdigit Any hexadecimal digit ("[0-9a-fA-F]").
+ upper  Any uppercase character (e.g., [A-Z]).
+ word   A Perl extension (e.g., [A-Za-z0-9_]), equivalent to "\w".
+ xdigit Any hexadecimal digit (e.g., [0-9a-fA-F]).  Note [7].
 
 Like the L<Unicode properties|/Unicode Properties>, most of the POSIX
 properties match the same regardless of whether case-insensitive (C</i>)
@@ -841,7 +841,7 @@ equivalent.
    space      \p{PosixSpace}       \p{XPosixSpace}          [6]
    upper      \p{PosixUpper}       \p{XPosixUpper}
    word       \p{PosixWord}        \p{XPosixWord}   \w
-   xdigit     \p{PosixXDigit}      \p{XPosixXDigit}
+   xdigit     \p{PosixXDigit}      \p{XPosixXDigit}         [7]
 
 =over 4
 
@@ -896,6 +896,16 @@ v5.18.  In earlier versions, these differ only in that in non-locale
 matching, C<\p{XPerlSpace}> did not match the vertical tab, C<\cK>.
 Same for the two ASCII-only range forms.
 
+=item [7]
+
+Unlike C<[[:digit:]]>, which matches digits in many writing systems, such
+as Thai and Devanagari, there are currently only two sets of hexadecimal
+digits, and it is unlikely that more will be added.  This is because a
+script needs not only the ten digits but also letters corresponding to the
+six C<[A-F]> (and C<[a-f]>).  Only the Latin script is suitable, and
+Unicode has only two such sets: the familiar ASCII set and the fullwidth
+forms starting at U+FF10 (FULLWIDTH DIGIT ZERO).
+
 =back
 
 There are various other synonyms that can be used besides the names
@@ -969,12 +979,6 @@ The POSIX class matches the same as the ASCII range counterpart.
 Which rules apply are determined as described in
 L<perlre/Which character set modifier is in effect?>.
 
-It is proposed to change this behavior in a future release of Perl so that
-whether or not Unicode rules are in effect would not change the
-behavior:  Outside of locale, the POSIX classes
-would behave like their ASCII-range counterparts.  If you wish to
-comment on this proposal, send email to C<[email protected]>.
-
 =head4 Negation of POSIX character classes
 X<character class, negation>
 
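Note [7] can be checked against a running perl. The script below is a minimal sketch and is not part of the commit. It assumes a perl that supports the \p{XPosixDigit} and \p{XPosixXDigit} forms from the equivalence table in perlrecharclass. The sample code points and the %is and $xdigits names are illustrative choices made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample code points chosen to illustrate note [7]: an ASCII digit and
# letter, two non-Latin digits, and the fullwidth Latin forms.
my %name = (
    0x0030 => 'DIGIT ZERO',
    0x0041 => 'LATIN CAPITAL LETTER A',
    0x0E51 => 'THAI DIGIT ONE',
    0x0967 => 'DEVANAGARI DIGIT ONE',
    0xFF10 => 'FULLWIDTH DIGIT ZERO',
    0xFF21 => 'FULLWIDTH LATIN CAPITAL LETTER A',
);

# $is{$cp} = [ matches \p{XPosixDigit}, matches \p{XPosixXDigit} ]
# (the Unicode-rules forms of [[:digit:]] and [[:xdigit:]])
my %is;
for my $cp (sort { $a <=> $b } keys %name) {
    my $c = chr $cp;
    $is{$cp} = [ ($c =~ /\p{XPosixDigit}/  ? 1 : 0),
                 ($c =~ /\p{XPosixXDigit}/ ? 1 : 0) ];
    printf "U+%04X %-33s digit=%d xdigit=%d\n",
        $cp, $name{$cp}, @{ $is{$cp} };
}

# Count the BMP code points (surrogates skipped) that are hex digits
# under Unicode rules; grep in scalar context returns the match count.
my $xdigits;
{
    no warnings 'utf8';    # in case this perl warns on noncharacters
    $xdigits = grep { chr($_) =~ /\p{XPosixXDigit}/ }
               0 .. 0xD7FF, 0xE000 .. 0xFFFF;
}
print "code points matching \\p{XPosixXDigit}: $xdigits\n";
```

The Thai and Devanagari digits should report digit=1 xdigit=0, while both fullwidth characters should report xdigit=1. The final count should be 44: the 22 ASCII hex digits plus their 22 fullwidth counterparts. Every Hex_Digit code point lies in the BMP, so scanning only that range is enough.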

-- 
Perl5 Master Repository
