In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/f34228d67754c62c57c3533a692e29f905d8f15a?hp=42e9b60980bb8e29e76629e14c6aa945194c0647>
- Log -----------------------------------------------------------------
commit f34228d67754c62c57c3533a692e29f905d8f15a
Author: Pali <[email protected]>
Date:   Sun Sep 18 17:52:36 2016 +0200

    perluniintro: Use uppercase UTF-8 encoding name

    Reason is consistency with other documentation files.

M   pod/perluniintro.pod

commit 2426b0d31ace94812fc516e3680266d55d1781c0
Author: Pali <[email protected]>
Date:   Sun Sep 18 17:45:57 2016 +0200

    perluniintro: Fix comment, Encode::decode does not have to return string with UTF8 flag set

M   pod/perluniintro.pod

commit 016b3422ad385c1af7faa4f0a71c244fdf999e5c
Author: Pali <[email protected]>
Date:   Sun Sep 18 17:44:22 2016 +0200

    perluniintro: Suggest to use utf8::decode() instead heavy Encode when sequence of bytes is valid UTF-8

M   pod/perluniintro.pod

commit 6d8e74506f971081362433e3d39fe2e4da9fb302
Author: Pali <[email protected]>
Date:   Sun Sep 18 17:19:59 2016 +0200

    pod: Suggest to use strict :encoding(UTF-8) PerlIO layer over not strict :encoding(utf8)

    For data exchange it is better to use strict UTF-8 encoding and not perl's utf8.

M   lib/PerlIO.pm
M   lib/open.pm
M   pod/perlfunc.pod
M   pod/perlrun.pod
M   pod/perlunicode.pod
M   pod/perluniintro.pod
-----------------------------------------------------------------------

Summary of changes:
 lib/PerlIO.pm        |  4 ++--
 lib/open.pm          | 12 ++++++------
 pod/perlfunc.pod     | 10 +++++-----
 pod/perlrun.pod      |  2 +-
 pod/perlunicode.pod  |  2 +-
 pod/perluniintro.pod | 20 ++++++++++----------
 6 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/lib/PerlIO.pm b/lib/PerlIO.pm
index 2e27f98bba..7658ce497b 100644
--- a/lib/PerlIO.pm
+++ b/lib/PerlIO.pm
@@ -1,6 +1,6 @@
 package PerlIO;
 
-our $VERSION = '1.09';
+our $VERSION = '1.10';
 
 # Map layer name to package that defines it
 our %alias;
@@ -104,7 +104,7 @@ is chosen to render simple text parts (i.e. non-accented letters,
 digits and common punctuation) human readable in the encoded file.
 
 (B<CAUTION>: This layer does not validate byte sequences. For reading input,
-you should instead use C<:encoding(utf8)> instead of bare C<:utf8>.)
+you should instead use C<:encoding(UTF-8)> instead of bare C<:utf8>.)
 
 Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
 and then read it back in.
diff --git a/lib/open.pm b/lib/open.pm
index fd22e1b9e7..ca3cf7b409 100644
--- a/lib/open.pm
+++ b/lib/open.pm
@@ -1,7 +1,7 @@
 package open;
 use warnings;
 
-our $VERSION = '1.10';
+our $VERSION = '1.11';
 
 require 5.008001; # for PerlIO::get_layers()
 
@@ -153,7 +153,7 @@ open - perl pragma to set default PerlIO layers for input and output
 
     use open IO => ':locale';
 
-    use open ':encoding(utf8)';
+    use open ':encoding(UTF-8)';
     use open ':locale';
     use open ':encoding(iso-8859-7)';
 
@@ -195,8 +195,8 @@ For example:
 
 These are equivalent
 
-    use open ':encoding(utf8)';
-    use open IO => ':encoding(utf8)';
+    use open ':encoding(UTF-8)';
+    use open IO => ':encoding(UTF-8)';
 
 as are these
 
@@ -221,8 +221,8 @@ The C<:std> subpragma on its own has no effect, but if combined with
 the C<:utf8> or C<:encoding> subpragmas, it converts the standard
 filehandles (STDIN, STDOUT, STDERR) to comply with encoding selected
 for input/output handles. For example, if both input and out are
-chosen to be C<:encoding(utf8)>, a C<:std> will mean that STDIN, STDOUT,
-and STDERR are also in C<:encoding(utf8)>. On the other hand, if only
+chosen to be C<:encoding(UTF-8)>, a C<:std> will mean that STDIN, STDOUT,
+and STDERR are also in C<:encoding(UTF-8)>. On the other hand, if only
 output is chosen to be in C<< :encoding(koi8r) >>, a C<:std> will
 cause only the STDOUT and STDERR to be in C<koi8r>. The C<:locale>
 subpragma implicitly turns on C<:std>.
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 2b962aa9a3..6f62f3fb40 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6090,7 +6090,7 @@ Note the I<characters>: depending on the status of the socket, either
 (8-bit) bytes or characters are received. By default all sockets operate
 on bytes, but for example if the socket has been changed using
 L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
-C<:encoding(utf8)> I/O layer (see the L<open> pragma), the I/O will
+C<:encoding(UTF-8)> I/O layer (see the L<open> pragma), the I/O will
 operate on UTF8-encoded Unicode
 characters, not bytes. Similarly for the C<:encoding> layer:
 in that case pretty much any characters can be read.
@@ -6650,7 +6650,7 @@ of the file) from the L<Fcntl> module. Returns C<1> on success, false
 otherwise.
 
 Note the emphasis on bytes: even if the filehandle has been set to operate
-on characters (for example using the C<:encoding(utf8)> I/O layer), the
+on characters (for example using the C<:encoding(UTF-8)> I/O layer), the
 L<C<seek>|/seek FILEHANDLE,POSITION,WHENCE>,
 L<C<tell>|/tell FILEHANDLE>, and
 L<C<sysseek>|/sysseek FILEHANDLE,POSITION,WHENCE>
@@ -6889,7 +6889,7 @@ Note the I<characters>: depending on the status of the socket, either
 (8-bit) bytes or characters are sent. By default all sockets operate
 on bytes, but for example if the socket has been changed using
 L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
-C<:encoding(utf8)> I/O layer (see L<C<open>|/open FILEHANDLE,EXPR>, or
+C<:encoding(UTF-8)> I/O layer (see L<C<open>|/open FILEHANDLE,EXPR>, or
 the L<open> pragma), the I/O will operate on UTF-8
 encoded Unicode characters, not bytes. Similarly for the C<:encoding>
 layer: in that case pretty much any characters can be sent.
@@ -8535,7 +8535,7 @@ to the current position plus POSITION; and C<2> to set it to EOF plus
 POSITION, typically negative.
 
 Note the emphasis on bytes: even if the filehandle has been set to operate
-on characters (for example using the C<:encoding(utf8)> I/O layer), the
+on characters (for example using the C<:encoding(UTF-8)> I/O layer), the
 L<C<seek>|/seek FILEHANDLE,POSITION,WHENCE>,
 L<C<tell>|/tell FILEHANDLE>, and
 L<C<sysseek>|/sysseek FILEHANDLE,POSITION,WHENCE>
@@ -8702,7 +8702,7 @@ the actual filehandle. If FILEHANDLE is omitted, assumes the file last
 read.
 
 Note the emphasis on bytes: even if the filehandle has been set to operate
-on characters (for example using the C<:encoding(utf8)> I/O layer), the
+on characters (for example using the C<:encoding(UTF-8)> I/O layer), the
 L<C<seek>|/seek FILEHANDLE,POSITION,WHENCE>,
 L<C<tell>|/tell FILEHANDLE>, and
 L<C<sysseek>|/sysseek FILEHANDLE,POSITION,WHENCE>
diff --git a/pod/perlrun.pod b/pod/perlrun.pod
index 9d59a6af36..b4bb5a3c6a 100644
--- a/pod/perlrun.pod
+++ b/pod/perlrun.pod
@@ -1121,7 +1121,7 @@ A pseudolayer that enables a flag in the layer below to tell Perl
 that output should be in utf8 and that input should be regarded as
 already in valid utf8 form. B<WARNING: It does not check for validity and as such
 should be handled with extreme caution for input, because security violations
-can occur with non-shortest UTF-8 encodings, etc.> Generally C<:encoding(utf8)> is
+can occur with non-shortest UTF-8 encodings, etc.> Generally C<:encoding(UTF-8)> is
 the best option when reading UTF-8 encoded data.
 
 =item :win32
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index ba5e312d02..23818a1ee4 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1889,7 +1889,7 @@ work under 5.6, so you should be safe to try them out.
 A filehandle that should read or write UTF-8
 
   if ($] > 5.008) {
-      binmode $fh, ":encoding(utf8)";
+      binmode $fh, ":encoding(UTF-8)";
   }
 
 =item *
diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index 5a865c9912..d35de34581 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -358,7 +358,7 @@ The C<Encode> module knows about many encodings and has interfaces
 for doing conversions between those encodings:
 
     use Encode 'decode';
-    $data = decode("iso-8859-3", $data); # convert from legacy to utf-8
+    $data = decode("iso-8859-3", $data); # convert from legacy
 
 =head2 Unicode I/O
 
@@ -393,7 +393,7 @@ many encodings have several aliases. Note that the C<:utf8> layer
 must always be specified exactly like that; it is I<not> subject to
 the loose matching of encoding names. Also note that currently C<:utf8> is unsafe for
 input, because it accepts the data without validating that it is indeed valid
-UTF-8; you should instead use C<:encoding(utf-8)> (with or without a
+UTF-8; you should instead use C<:encoding(UTF-8)> (with or without a
 hyphen).
 
 See L<PerlIO> for the C<:utf8> layer, L<PerlIO::encoding> and
@@ -406,7 +406,7 @@ Unicode or legacy encodings does not magically turn the data into
 Unicode in Perl's eyes. To do that, specify the appropriate
 layer when opening files
 
-    open(my $fh,'<:encoding(utf8)', 'anything');
+    open(my $fh,'<:encoding(UTF-8)', 'anything');
     my $line_of_unicode = <$fh>;
 
     open(my $fh,'<:encoding(Big5)', 'anything');
@@ -415,8 +415,8 @@ layer when opening files
 The I/O layers can also be specified more flexibly with
 the C<open> pragma. See L<open>, or look at the following example.
 
-    use open ':encoding(utf8)'; # input/output default encoding will be
-                                # UTF-8
+    use open ':encoding(UTF-8)'; # input/output default encoding will be
+                                 # UTF-8
     open X, ">file";
     print X chr(0x100), "\n";
     close X;
@@ -485,12 +485,12 @@ by repeatedly encoding the data:
     local $/; ## read in the whole file of 8-bit characters
     $t = <F>;
     close F;
-    open F, ">:encoding(utf8)", "file";
+    open F, ">:encoding(UTF-8)", "file";
     print F $t; ## convert to UTF-8 on output
     close F;
 
 If you run this code twice, the contents of the F<file> will be twice
-UTF-8 encoded. A C<use open ':encoding(utf8)'> would have avoided the
+UTF-8 encoded. A C<use open ':encoding(UTF-8)'> would have avoided the
 bug, or explicitly opening also the F<file> for input as UTF-8.
 
 B<NOTE>: the C<:utf8> and C<:encoding> features work only if your
@@ -788,7 +788,7 @@ If you have a raw sequence of bytes that you know should be
 interpreted via a particular encoding, you can use C<Encode>:
 
     use Encode 'from_to';
-    from_to($data, "iso-8859-1", "utf-8"); # from latin-1 to utf-8
+    from_to($data, "iso-8859-1", "UTF-8"); # from latin-1 to UTF-8
 
 The call to C<from_to()> changes the bytes in C<$data>, but nothing
 material about the nature of the string has changed as far as Perl is
@@ -817,8 +817,8 @@ pack/unpack to convert to/from Unicode.
 If you have a sequence of bytes you B<know> is valid UTF-8,
 but Perl doesn't know it yet, you can make Perl a believer, too:
 
-    use Encode 'decode_utf8';
-    $Unicode = decode_utf8($bytes);
+    $Unicode = $bytes;
+    utf8::decode($Unicode);
 
 or:
 
-- 
Perl5 Master Repository
