In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/c83d5090d8022c6cf4240c0a13309bcd1ccbfaed?hp=c1ac151ab04e29c3a8a22e7035487bd0d8793f63>
- Log -----------------------------------------------------------------
commit c83d5090d8022c6cf4240c0a13309bcd1ccbfaed
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 25 22:33:22 2017 -0700

    perlapi: Fix grammar

M       toke.c

commit 8e179dd8df306c5088bf6c15b494826d48278928
Author: Pali <p...@cpan.org>
Date:   Sun Sep 18 17:25:48 2016 +0200

    pod: Suggest to use strict UTF-8 encoding when dealing with external data

    For data exchange it is not good idea to use not strict perl's extended
    dialect of utf8 encoding.

M       pod/perldiag.pod
M       pod/perlfunc.pod
M       pod/perlpacktut.pod
M       pod/perlunicode.pod
M       pod/perlunicook.pod
M       pod/perlunifaq.pod
M       pod/perluniintro.pod

commit 96b108235b7a4c239dbc0251abf17c3ef015c4d8
Author: Pali <p...@cpan.org>
Date:   Sun Sep 18 17:21:54 2016 +0200

    perluniintro: Encode::encode_utf8() not always appropriate

    Do not suggest to use Encode::encode_utf8() when you need to know the byte
    length of a string Encode module could do some additional operations and
    bytes pragma is supposed to do that job.

M       pod/perluniintro.pod

commit f8ac05207854091347d4c59c31cabb61ff952919
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Jan 26 07:10:00 2017 -0700

    Add Pali to AUTHORS

M       AUTHORS
-----------------------------------------------------------------------

Summary of changes:
 AUTHORS              |  1 +
 pod/perldiag.pod     |  2 +-
 pod/perlfunc.pod     |  4 ++--
 pod/perlpacktut.pod  |  7 ++++---
 pod/perlunicode.pod  |  8 ++++----
 pod/perlunicook.pod  |  8 ++++----
 pod/perlunifaq.pod   |  6 ++++--
 pod/perluniintro.pod | 15 ++++++---------
 toke.c               |  2 +-
 9 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/AUTHORS b/AUTHORS
index 6c2dd2131f..4e4756b494 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -930,6 +930,7 @@ Ollivier Robert <robe...@keltia.freenix.fr>
 Osvaldo Villalon               <ovilla...@dextratech.com>
 Owain G. Ainsworth             <o...@nicotinebsd.org>
 Owen Taylor                    <o...@cornell.edu>
+Pali                           <p...@cpan.org>
 Papp Zoltan                    <pa...@elte.hu>
 parv                           <p...@pair.com>
 Pascal Rigaux                  <pi...@mandriva.com>
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 585c512753..76edb9b1da 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3407,7 +3407,7 @@ the variable, C<%s>, part of the message.
 
 One possible cause is that you set the UTF8 flag yourself for data that
 you thought to be in UTF-8 but it wasn't (it was for example legacy
-8-bit data). To guard against this, you can use Encode::decode_utf8.
+8-bit data). To guard against this, you can use C<Encode::decode('UTF-8', ...)>.
 
 If you use the C<:encoding(UTF-8)> PerlIO layer for input, invalid byte
 sequences are handled gracefully, but if you use C<:utf8>, the flag is
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 1e32cca6dd..d4dc2dfd53 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -3763,8 +3763,8 @@ many elements these have. For that, use C<scalar @array> and C<scalar keys
 Like all Perl character operations, L<C<length>|/length EXPR> normally
 deals in logical characters, not physical bytes. For how many bytes a
 string encoded as
-UTF-8 would take up, use C<length(Encode::encode_utf8(EXPR))> (you'll have
-to C<use Encode> first). See L<Encode> and L<perlunicode>.
+UTF-8 would take up, use C<length(Encode::encode('UTF-8', EXPR))>
+(you'll have to C<use Encode> first). See L<Encode> and L<perlunicode>.
 
 =item __LINE__
 X<__LINE__>
diff --git a/pod/perlpacktut.pod b/pod/perlpacktut.pod
index f40d1c2a93..f6a9411c8f 100644
--- a/pod/perlpacktut.pod
+++ b/pod/perlpacktut.pod
@@ -668,9 +668,10 @@ Usually you'll want to pack or unpack UTF-8 strings:
    my @hebrew = unpack( 'U*', $utf );
 
 Please note: in the general case, you're better off using
-Encode::decode_utf8 to decode a UTF-8 encoded byte string to a Perl
-Unicode string, and Encode::encode_utf8 to encode a Perl Unicode string
-to UTF-8 bytes. These functions provide means of handling invalid byte
+L<C<Encode::decode('UTF-8', $utf)>|Encode/decode> to decode a UTF-8
+encoded byte string to a Perl Unicode string, and
+L<C<Encode::encode('UTF-8', $str)>|Encode/encode> to encode a Perl Unicode
+string to UTF-8 bytes. These functions provide means of handling invalid byte
 sequences and generally have a friendlier interface.
 
 =head2 Another Portable Binary Encoding
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 33e52b31b3..ba5e312d02 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1904,7 +1904,7 @@ check the documentation to verify if this is still true.
 
         if ($] > 5.008) {
             require Encode;
-            $val = Encode::encode_utf8($val); # make octets
+            $val = Encode::encode("UTF-8", $val); # make octets
         }
 
 =item *
@@ -1916,7 +1916,7 @@ want the UTF8 flag restored:
 
         if ($] > 5.008) {
             require Encode;
-            $val = Encode::decode_utf8($val);
+            $val = Encode::decode("UTF-8", $val);
         }
 
 =item *
@@ -2017,8 +2017,8 @@ Perl's internal representation like so:
     sub my_escape_html ($) {
         my($what) = shift;
         return unless defined $what;
-        Encode::decode_utf8(Foo::Bar::escape_html(
-                                         Encode::encode_utf8($what)));
+        Encode::decode("UTF-8", Foo::Bar::escape_html(
+                                       Encode::encode("UTF-8", $what)));
     }
 
 Sometimes, when the extension does not convert data but just stores
diff --git a/pod/perlunicook.pod b/pod/perlunicook.pod
index ac305098eb..9a8d4daaa8 100644
--- a/pod/perlunicook.pod
+++ b/pod/perlunicook.pod
@@ -234,8 +234,8 @@ C<binmode> as described later below.
  or
      $ export PERL_UNICODE=A
  or
-    use Encode qw(decode_utf8);
-    @ARGV = map { decode_utf8($_, 1) } @ARGV;
+    use Encode qw(decode);
+    @ARGV = map { decode('UTF-8', $_, 1) } @ARGV;
 
 =head2 ℞ 14: Decode program arguments as locale encoding
 
@@ -289,8 +289,8 @@ Files opened without an encoding argument will be in UTF-8:
      $ export PERL_UNICODE=SDA
  or
     use open qw(:std :utf8);
-    use Encode qw(decode_utf8);
-    @ARGV = map { decode_utf8($_, 1) } @ARGV;
+    use Encode qw(decode);
+    @ARGV = map { decode('UTF-8', $_, 1) } @ARGV;
 
 =head2 ℞ 19: Open file with specific encoding
 
diff --git a/pod/perlunifaq.pod b/pod/perlunifaq.pod
index 4135fbaeb2..ba391d423f 100644
--- a/pod/perlunifaq.pod
+++ b/pod/perlunifaq.pod
@@ -199,7 +199,9 @@ or by letting automatic decoding and encoding do all the work:
 =head2 What are C<decode_utf8> and C<encode_utf8>?
 
 These are alternate syntaxes for C<decode('utf8', ...)> and C<encode('utf8',
-...)>.
+...)>. Do not use these functions for data exchange. Instead use
+C<decode('UTF-8', ...)> and C<encode('UTF-8', ...)>; see
+L</What's the difference between UTF-8 and utf8?> below.
 
 =head2 What is a "wide character"?
 
@@ -283,7 +285,7 @@ C<UTF-8> is the official standard. C<utf8> is Perl's way of being liberal in
 what it accepts. If you have to communicate with things that aren't so liberal,
 you may want to consider using C<UTF-8>. If you have to communicate with things
 that are too liberal, you may have to use C<utf8>. The full explanation is in
-L<Encode>.
+L<Encode/"UTF-8 vs. utf8 vs. UTF8">.
 
 C<UTF-8> is internally known as C<utf-8-strict>. The tutorial uses UTF-8
 consistently, even where utf8 is actually used internally, because the
diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index cd62d4c126..5a865c9912 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -729,16 +729,13 @@ the output string will be UTF-8-encoded C<ab\x80c = \x{100}\n>, but
 C<$a> will stay byte-encoded.
 
 Sometimes you might really need to know the byte length of a string
-instead of the character length. For that use either the
-C<Encode::encode_utf8()> function or the C<bytes> pragma
+instead of the character length. For that use the C<bytes> pragma
 and the C<length()> function:
 
     my $unicode = chr(0x100);
     print length($unicode), "\n"; # will print 1
-    require Encode;
-    print length(Encode::encode_utf8($unicode)),"\n"; # will print 2
     use bytes;
-    print length($unicode), "\n"; # will also print 2
+    print length($unicode), "\n"; # will print 2
                                   # (the 0xC4 0x80 of the UTF-8)
     no bytes;
 
@@ -755,12 +752,12 @@ How Do I Detect Data That's Not Valid In a Particular Encoding?
 Use the C<Encode> package to try converting it.
 For example,
 
-    use Encode 'decode_utf8';
+    use Encode 'decode';
 
-    if (eval { decode_utf8($string, Encode::FB_CROAK); 1 }) {
-        # $string is valid utf8
+    if (eval { decode('UTF-8', $string, Encode::FB_CROAK); 1 }) {
+        # $string is valid UTF-8
     } else {
-        # $string is not valid utf8
+        # $string is not valid UTF-8
     }
 
 Or use C<unpack> to try decoding it:
diff --git a/toke.c b/toke.c
index 61ea45da9b..864c5269c3 100644
--- a/toke.c
+++ b/toke.c
@@ -669,7 +669,7 @@ S_cr_textfilter(pTHX_ int idx, SV *sv, int maxlen)
 Creates and initialises a new lexer/parser state object, supplying
 a context in which to lex and parse from a new source of Perl code.
 A pointer to the new state object is placed in L</PL_parser>. An entry
-is made on the save stack so that upon unwinding the new state object
+is made on the save stack so that upon unwinding, the new state object
 will be destroyed and the former value of L</PL_parser> will be restored.
 Nothing else need be done to clean up the parsing context.
 

-- 
Perl5 Master Repository
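The reason these commits steer external data toward strict C<UTF-8> can be illustrated with a small sketch adapted from the example in Encode's "UTF-8 vs. utf8 vs. UTF8" section. It assumes a perl with the core Encode module; the variable names are illustrative only. It encodes one code point just above U+10FFFF both ways with FB_CROAK. Per the Encode documentation, Perl's lax C<utf8> (the encoding behind encode_utf8/decode_utf8) should accept it, while strict C<UTF-8> should croak.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+110000 lies just past the last Unicode code point (U+10FFFF).
my $above_unicode = "\x{110000}";

# FB_CROAK makes encode() die on unmappable input; LEAVE_SRC keeps
# encode() from consuming $above_unicode, so both calls see the same string.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# Lax 'utf8': Perl's extended dialect, expected to accept the code point.
my $lax_ok = eval { encode('utf8', $above_unicode, $check); 1 } ? 1 : 0;

# Strict 'UTF-8': limited to U+0000..U+10FFFF, expected to croak here.
my $strict_ok = eval { encode('UTF-8', $above_unicode, $check); 1 } ? 1 : 0;

print "utf8:  ", ($lax_ok    ? "accepted" : "rejected"), "\n";
print "UTF-8: ", ($strict_ok ? "accepted" : "rejected"), "\n";
```

LEAVE_SRC matters here: with a CHECK value and without LEAVE_SRC, Encode may modify the source string in place, which would make the second call test an emptied string.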