Change 31462 by [EMAIL PROTECTED] on 2007/06/25 12:10:10
Apply doc suggestion from:
Subject: [perl #43287] perluniintro inaccurate answer to testing
encoding validity
From: Danny Rathjens (via RT) <[EMAIL PROTECTED]>
Date: Thu, 21 Jun 2007 17:35:26 -0700
Message-ID: <[EMAIL PROTECTED]>
Affected files ...
... //depot/perl/pod/perluniintro.pod#64 edit
Differences ...
==== //depot/perl/pod/perluniintro.pod#64 (text) ====
Index: perl/pod/perluniintro.pod
--- perl/pod/perluniintro.pod#63~30493~ 2007-03-07 05:23:23.000000000 -0800
+++ perl/pod/perluniintro.pod 2007-06-25 05:10:10.000000000 -0700
@@ -656,10 +656,11 @@
For example,
use Encode 'decode_utf8';
- if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
- # valid
+ eval { decode_utf8($string, Encode::FB_CROAK) };
+ if ($@) {
+ # $string is not valid utf8
} else {
- # invalid
+ # $string is valid utf8
}
Or use C<unpack> to try decoding it:
@@ -667,9 +668,8 @@
use warnings;
@chars = unpack("C0U*", $string_of_bytes_that_I_think_is_utf8);
-If invalid, a C<Malformed UTF-8 character (byte 0x##) in unpack>
-warning is produced. The "C0" means
-"process the string character per character". Without that the
+If invalid, a C<Malformed UTF-8 character> warning is produced. The "C0" means
+"process the string character per character". Without that, the
C<unpack("U*", ...)> would work in C<U0> mode (the default if the format
string starts with C<U>) and it would return the bytes making up the UTF-8
encoding of the target string, something that will always work.
End of Patch.
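
For reference, a minimal standalone sketch of the eval-based check that the
first hunk introduces. The helper name is_valid_utf8 and the sample byte
strings are illustrative assumptions, not part of the patch. Note that $@ is
set only when decode_utf8 croaks, so a true $@ means the input was rejected.
The sketch avoids testing $@ directly and relies on the return value of the
eval block instead.

```perl
use strict;
use warnings;
use Encode 'decode_utf8';

# Hypothetical helper (not part of the patch): returns 1 if $bytes
# decodes cleanly as UTF-8, and 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;

    # Decode a copy: when a CHECK argument is given, Encode may
    # modify the source string in place.
    my $copy = $bytes;

    # The block evaluates to 1 only if decode_utf8 did not croak;
    # on failure eval returns undef and sets $@.
    my $ok = eval { decode_utf8($copy, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

# "\xC3\xA9" is the two-byte UTF-8 encoding of U+00E9.
print is_valid_utf8("caf\xC3\xA9") ? "valid\n" : "invalid\n";

# "\xC3" must be followed by a continuation byte; "\x28" is not one.
print is_valid_utf8("caf\xC3\x28") ? "valid\n" : "invalid\n";
```

Testing the value of the eval block rather than $@ sidesteps the question of
which branch means "valid", which is easy to get backwards.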