More confusion about the valid range of characters in Perl. Both v5.8.8 and v5.10.0 Perl will pack('U', $v) for values of $v which are > 0x7FFF_FFFF. The result is the (non-standard) Perl utf8 encoding for such characters.
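For reference, here is a rough sketch of how the byte sequences in the v5.8.8 result below can be reproduced by hand. It is not taken from the Perl source. The sub name extended_utf8_bytes is hypothetical, and the layout is only inferred from the output in this report: a 0xFC-based lead byte plus five continuation bytes up to 0x7FFF_FFFF, then a 0xFE lead byte plus six continuation bytes above that. It assumes values no larger than 0xFFFF_FFFF.

```perl
use strict;
use warnings;

# Hypothetical helper: hand-encode a code point into the extended
# UTF-8 layout inferred from the v5.8.8 output in this report.
# Assumption: only values up to 0xFFFF_FFFF are handled.
sub extended_utf8_bytes {
    my ($v) = @_;
    return ($v) if $v < 0x80;

    # Number of continuation bytes needed for this value.
    my $n = $v < 0x800       ? 1
          : $v < 0x1_0000    ? 2
          : $v < 0x20_0000   ? 3
          : $v < 0x400_0000  ? 4
          : $v < 0x8000_0000 ? 5
          :                    6;

    # Lead-byte prefix indexed by continuation count; 0xFE is the
    # non-standard 7-byte form.
    my @lead = (0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE);

    my @bytes;
    for (1 .. $n) {
        unshift @bytes, 0x80 | ($v & 0x3F);   # low 6 bits per byte
        $v >>= 6;
    }
    unshift @bytes, $lead[$n] | $v;           # remaining high bits
    return @bytes;
}

for my $v (0x7FFF_FFFD, 0x8000_0000, 0xFFFF_FFFD) {
    printf '\x%04X_%04X: ', ($v >> 16), $v & 0xFFFF;
    print map { sprintf '\x%02X', $_ } extended_utf8_bytes($v);
    print "\n";
}
# Prints:
#   \x7FFF_FFFD: \xFD\xBF\xBF\xBF\xBF\xBD
#   \x8000_0000: \xFE\x82\x80\x80\x80\x80\x80
#   \xFFFF_FFFD: \xFE\x83\xBF\xBF\xBF\xBF\xBD
```

Because this uses only integer arithmetic, it does not depend on how a given Perl version treats pack('U') or unpack('C*'), so it can serve as an independent check on either.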
v5.8.8 Perl will unpack a string containing the non-standard encoding. v5.10.0 Perl will not. Consider:

  use warnings ;

  sub sp
  {
    my ($v) = @_ ;
    my $p = pack('U', $v) ;
    my @t = unpack('C*', $p) ;
    printf '\x%04X_%04X: ', ($v >> 16), $v & 0xFFFF ;
    print map sprintf('\x%02X', $_), @t ;
    print "\n" ;
  } ;

  sp(0x7FFF_FFFD) ;
  sp(0x8000_0000) ;
  sp(0xFFFF_FFFD) ;

v5.8.8 result:

  \x7FFF_FFFD: \xFD\xBF\xBF\xBF\xBF\xBD
  \x8000_0000: \xFE\x82\x80\x80\x80\x80\x80
  \xFFFF_FFFD: \xFE\x83\xBF\xBF\xBF\xBF\xBD

v5.10.0 result:

  \x7FFF_FFFD: \x7FFFFFFD
  Malformed UTF-8 character (byte 0xfe) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x82, with no preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x80, with no preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x80, with no preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x80, with no preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x80, with no preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x80, with no preceding start byte) in unpack at tpbug.pl line 7.
  \x8000_0000: \x00\x00\x00\x00\x00\x00\x00
  Malformed UTF-8 character (byte 0xfe) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x83, with no preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0xbf, with no preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0xbf, with no preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0xbf, with no preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0xbf, with no preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0xbd, with no preceding start byte) in unpack at tpbug.pl line 7.
  \xFFFF_FFFD: \x00\x00\x00\x00\x00\x00\x00

And, FWIW, in 64-bit v5.8.8, pack('U', $v) appears to mask the $v value to unsigned 32-bits before attempting to pack !

--
Chris Hall highwayman.com +44 7970 277 383