In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/f6067adc61108c3398de698bb0294d95f09b55ef?hp=41c3b428c4a3ce29a0f80c7f63eda133089137de>

- Log -----------------------------------------------------------------
commit f6067adc61108c3398de698bb0294d95f09b55ef
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 19 23:15:07 2012 -0600

    charnames: Clarify viacode pod

    This mentions that viacode's return can change as a result of
    corrections to the Unicode standard.

M	lib/charnames.pm

commit ffec675822f6354e94f29a96daa07ef9465a43bc
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 19 23:14:28 2012 -0600

    charnames pod: slight rewording

M	lib/charnames.pm

commit 228e8c7b6cef1e12cb12f45083fdcae7b35fba27
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 19 22:10:18 2012 -0600

    charnames: re-order pod sections

    This merely moves one =head1 section to later in the pod, so that
    future changes will make more sense; and it has to bump the version.

M	lib/_charnames.pm
M	lib/charnames.pm
-----------------------------------------------------------------------

Summary of changes:
 lib/_charnames.pm |    2 +-
 lib/charnames.pm  |  109 ++++++++++++++++++++++++++++++++++------------------
 2 files changed, 72 insertions(+), 39 deletions(-)

diff --git a/lib/_charnames.pm b/lib/_charnames.pm
index 02dbef0..d29af30 100644
--- a/lib/_charnames.pm
+++ b/lib/_charnames.pm
@@ -7,7 +7,7 @@ package _charnames;
 use strict;
 use warnings;
 use File::Spec;
-our $VERSION = '1.29';
+our $VERSION = '1.30';
 use unicore::Name; # mktables-generated algorithmically-defined names
 
 use bytes (); # for $bytes::hint_bits
diff --git a/lib/charnames.pm b/lib/charnames.pm
index 07c1b70..495c303 100644
--- a/lib/charnames.pm
+++ b/lib/charnames.pm
@@ -1,7 +1,7 @@
 package charnames;
 use strict;
 use warnings;
-our $VERSION = '1.29';
+our $VERSION = '1.30';
 use unicore::Name; # mktables-generated algorithmically-defined names
 use _charnames (); # The submodule for this where most of the work gets done
 
@@ -328,43 +328,6 @@ Also, both these methods currently allow only single characters to be named.
 To name a sequence of characters, use a
 L<custom translator|/CUSTOM TRANSLATORS> (described below).
 
-=head1 charnames::viacode(I<code>)
-
-Returns the full name of the character indicated by the numeric code.
-For example,
-
-    print charnames::viacode(0x2722);
-
-prints "FOUR TEARDROP-SPOKED ASTERISK".
-
-The name returned is the official name for the code point, if
-available; otherwise your custom alias for it. This means that your
-alias will only be returned for code points that don't have an official
-Unicode name (nor alias) such as private use code points.
-Until Unicode 6.1, the 4 control characters U+0080, U+0081, U+0084, and U+0099
-did not have names (actually, to be precise they still don't, but they do have
-aliases, which for most purposes are indistiunguishable from true names).
-To preserve backwards compatibility, any alias you define for these code
-points will be returned by this function, in preference to the official alias.
-
-If you define more than one name for the code point, it is indeterminate
-which one will be returned.
-
-The function returns C<undef> if no name is known for the code point.
-In Unicode the proper name of these is the empty string, which
-C<undef> stringifies to. (If you ask for a code point past the legal
-Unicode maximum of U+10FFFF that you haven't assigned an alias to, you
-get C<undef> plus a warning.)
-
-The input number must be a non-negative integer, or a string beginning
-with C<"U+"> or C<"0x"> with the remainder considered to be a
-hexadecimal integer. A literal numeric constant must be unsigned; it
-will be interpreted as hex if it has a leading zero or contains
-non-decimal hex digits; otherwise it will be interpreted as decimal.
-
-Notice that the name returned for U+FEFF is "ZERO WIDTH NO-BREAK
-SPACE", not "BYTE ORDER MARK".
-
 =head1 charnames::string_vianame(I<name>)
 
 This is a runtime equivalent to C<\N{...}>. I<name> can be any expression
@@ -397,6 +360,76 @@ character, even ones that aren't legal under the C<S<use bytes>> pragma,
 See L</BUGS> for the circumstances in which the behavior differs from that
 described above.
 
+=head1 charnames::viacode(I<code>)
+
+Returns the full name of the character indicated by the numeric code.
+For example,
+
+    print charnames::viacode(0x2722);
+
+prints "FOUR TEARDROP-SPOKED ASTERISK".
+
+The name returned is the "best" (defined below) official name or alias
+for the code point, if
+available; otherwise your custom alias for it, if defined; otherwise C<undef>.
+This means that your alias will only be returned for code points that don't
+have an official Unicode name (nor alias) such as private use code points.
+
+If you define more than one name for the code point, it is indeterminate
+which one will be returned.
+
+As mentioned, the function returns C<undef> if no name is known for the code
+point. In Unicode the proper name of these is the empty string, which
+C<undef> stringifies to. (If you ask for a code point past the legal
+Unicode maximum of U+10FFFF that you haven't assigned an alias to, you
+get C<undef> plus a warning.)
+
+The input number must be a non-negative integer, or a string beginning
+with C<"U+"> or C<"0x"> with the remainder considered to be a
+hexadecimal integer. A literal numeric constant must be unsigned; it
+will be interpreted as hex if it has a leading zero or contains
+non-decimal hex digits; otherwise it will be interpreted as decimal.
+
+As mentioned above under L</ALIASES>, Unicode 6.1 defines extra names
+(synonyms or aliases) for some code points, most of which were already
+available as Perl extensions. All these are accepted by C<\N{...}> and the
+other functions in this module, but C<viacode> has to choose which one
+name to return for a given input code point, so it returns the "best" name.
+To understand how this works, it is helpful to know more about the Unicode
+name properties. All code points actually have only a single name, which
+(starting in Unicode 2.0) can never change once a character has been assigned
+to the code point. But mistakes have been made in assigning names, for
+example sometimes a clerical error was made during the publishing of the
+Standard which caused words to be misspelled, and there was no way to correct
+those. The Name_Alias property was eventually created to handle these
+situations. If a name was wrong, a corrected synonym would be published for
+it, using Name_Alias. C<viacode> will return that corrected synonym as the
+"best" name for a code point. (It is even possible, though it hasn't happened
+yet, that the correction itself will need to be corrected, and so another
+Name_Alias can be created for that code point; C<viacode> will return the
+most recent correction.)
+
+The Unicode name for each of the control characters (such as LINE FEED) is the
+empty string. However almost all had names assigned by other standards, such
+as the ASCII Standard, or were in common use. C<viacode> returns these names
+as the "best" ones available. Unicode 6.1 has created Name_Aliases for each
+of them, including alternate names, like NEW LINE. C<viacode> uses the
+original name, "LINE FEED" in preference to the alternate. Similarly the
+name returned for U+FEFF is "ZERO WIDTH NO-BREAK SPACE", not "BYTE ORDER
+MARK".
+
+Until Unicode 6.1, the 4 control characters U+0080, U+0081, U+0084, and U+0099
+did not have names nor aliases.
+To preserve backwards compatibility, any alias you define for these code
+points will be returned by this function, in preference to the official name.
+
+Some code points also have abbreviated names, such as "LF" or "NL".
+C<viacode> never returns these.
+
+Because a name correction may be added in future Unicode releases, the name
+that C<viacode> returns may change as a result. This is a rare event, but it
+does happen.
+
 =head1 CUSTOM TRANSLATORS
 
 The mechanism of translation of C<\N{...}> escapes is general and not

--
Perl5 Master Repository
