More confusion about the valid range of characters in Perl.

Both v5.8.8 and v5.10.0 Perl will accept pack('U', $v) for values of $v
which are > 0x7FFF_FFFF.  The result is the (non-standard) Perl utf8
encoding for such characters: a 0xFE start byte followed by six
continuation bytes.
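
For reference, here is a minimal sketch of that encoding done by hand, so the expected octets can be checked without going through pack or unpack at all.  The sub name ext_utf8 is made up for this illustration, and the sketch assumes an ASCII platform and values no larger than 0xFFFF_FFFF:

```perl
use strict ;
use warnings ;

# Hypothetical helper, not part of Perl: hand-encode $v (0 .. 0xFFFF_FFFF)
# as a list of octets, using the usual UTF-8 bit layout extended with a
# 0xFE start octet for values above 0x7FFF_FFFF.
sub ext_utf8 {
  my ($v) = @_ ;

  return ($v) if $v < 0x80 ;            # ASCII: one octet, as is

  # Number of continuation octets; each carries 6 bits of $v.
  my $n = $v < 0x800       ? 1
        : $v < 0x1_0000    ? 2
        : $v < 0x20_0000   ? 3
        : $v < 0x400_0000  ? 4
        : $v < 0x8000_0000 ? 5
        :                    6 ;        # 0xFE start octet: the Perl extension

  # Start octet: $n + 1 one bits, a zero bit, then the top bits of $v.
  # With $n == 6 the start octet is 0xFE and has no room for payload.
  my $top = $n == 6 ? 0 : $v >> (6 * $n) ;
  my @b   = (((0xFF << (7 - $n)) & 0xFF) | $top) ;

  # Continuation octets, most significant 6 bits first.
  for my $i (reverse 0 .. $n - 1) {
    push @b, 0x80 | (($v >> (6 * $i)) & 0x3F) ;
  }
  return @b ;
}

for my $v (0x7FFF_FFFD, 0x8000_0000, 0xFFFF_FFFD) {
  printf '\x%04X_%04X: ', ($v >> 16), $v & 0xFFFF ;
  print map(sprintf('\x%02X', $_), ext_utf8($v)), "\n" ;
}
```

For the three test values this should print the same octets as the v5.8.8 result in this message.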

v5.8.8 Perl will unpack a string containing the non-standard encoding.

v5.10.0 Perl will not: unpack warns about malformed UTF-8 and returns
zero for each byte of the sequence.

Consider:

  use warnings ;

  sub sp {
    my ($v) = @_ ;

    my $p = pack('U', $v) ;
    my @t = unpack('C*', $p) ;

    printf '\x%04X_%04X: ', ($v >> 16), $v & 0xFFFF ;
    print map sprintf('\x%02X', $_), @t ;
    print "\n" ;
  } ;

  sp(0x7FFF_FFFD) ;
  sp(0x8000_0000) ;
  sp(0xFFFF_FFFD) ;

v5.8.8 result:

  \x7FFF_FFFD: \xFD\xBF\xBF\xBF\xBF\xBD
  \x8000_0000: \xFE\x82\x80\x80\x80\x80\x80
  \xFFFF_FFFD: \xFE\x83\xBF\xBF\xBF\xBF\xBD

v5.10.0 result:

  \x7FFF_FFFD: \x7FFFFFFD
  Malformed UTF-8 character (byte 0xfe) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x82, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x80, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x80, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x80, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x80, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x80, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  \x8000_0000: \x00\x00\x00\x00\x00\x00\x00
  Malformed UTF-8 character (byte 0xfe) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0x83, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0xbf, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0xbf, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0xbf, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0xbf, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  Malformed UTF-8 character (unexpected continuation byte 0xbd, with no
    preceding start byte) in unpack at tpbug.pl line 7.
  \xFFFF_FFFD: \x00\x00\x00\x00\x00\x00\x00
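
One way to look at the octets without asking unpack to decode the packed string is to run utf8::encode on a copy first, so that unpack('C*') only ever sees plain bytes.  This is a sketch only: the sub name octets is made up here, and I have not checked it against the two Perl versions above.

```perl
use strict ;
use warnings ;

# Hypothetical helper: return the octets of Perl's internal utf8
# encoding of the characters in $s.
sub octets {
  my ($s) = @_ ;        # work on a copy, the caller's string is untouched
  utf8::encode($s) ;    # characters -> utf8 octets, in place
  return unpack('C*', $s) ;
}

# U+263A is an ordinary three-octet character.
print map(sprintf('\x%02X', $_), octets(pack('U', 0x263A))), "\n" ;
```

Substituting octets($p) for unpack('C*', $p) in the test script would then show whatever pack('U', $v) actually produced.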

And, FWIW, in 64-bit v5.8.8, pack('U', $v) appears to mask the $v value
to an unsigned 32-bit value before attempting to pack it!
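
A quick probe for that masking, as a sketch: compare what pack('U') gives for a value with what it gives for the same value reduced to 32 bits.  The sub name pack_u_masks is made up for this illustration, and the probe value needs a 64-bit Perl.

```perl
use strict ;
use warnings ;
no warnings 'portable' ;    # allow hex literals above 0xFFFF_FFFF

# Hypothetical helper: true if pack('U', $v) gives the same string as
# packing $v reduced to its low 32 bits.  Only meaningful for values
# above 0xFFFF_FFFF; for smaller values the two are trivially equal.
sub pack_u_masks {
  my ($v) = @_ ;
  return pack('U', $v) eq pack('U', $v & 0xFFFF_FFFF) ? 1 : 0 ;
}

my $v = 0x1_0000_0041 ;     # low 32 bits are 0x41, i.e. 'A'
printf "pack('U') %s this value to 32 bits\n",
       pack_u_masks($v) ? 'masks' : 'does not mask' ;
```

The two packed strings are compared directly, so nothing here depends on unpack being able to decode the result.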

-- 
Chris Hall               highwayman.com            +44 7970 277 383
