Porters and Storable Maintainers,
utf8 data
Perl 5.6 added support for Unicode characters with code points > 255,
and Perl 5.8 has full support for Unicode characters in hash keys. Perl
internally encodes strings with these characters using utf8, and
Storable serializes them as utf8. By default, if an older version of
Perl encounters a utf8 value it cannot represent, it will "croak()". To
change this behaviour so that Storable deserializes utf8 encoded values
as the string of bytes (effectively dropping the is_utf8 flag) set
$Storable::drop_utf8 to some "TRUE" value. This is a form of data loss,
because with $drop_utf8 true, it becomes impossible to tell whether the
original data was the Unicode string, or a series of bytes that happen
to be valid utf8.
But this does not work as advertised on Perl 5.8.x.
#
use strict;
use warnings;
use Storable qw/dclone/;
my $obj = [ "\x{5c0f}\x{98fc} \x{5f3e}" ];
local ($Storable::drop_utf8) = 1;
my $clone = dclone($obj);
warn utf8::is_utf8($clone->[0]); # not false!
__END__
And here is the relevant function in Storable.xs:
static SV *retrieve_utf8str(pTHX_ stcxt_t *cxt, char *cname)
{
    SV *sv;
    TRACEME(("retrieve_utf8str"));
    sv = retrieve_scalar(aTHX_ cxt, cname);
    if (sv) {
#ifdef HAS_UTF8_SCALARS
        SvUTF8_on(sv);
#else
        if (cxt->use_bytes < 0)
            cxt->use_bytes
                = (SvTRUE(perl_get_sv("Storable::drop_utf8", TRUE))
                   ? 1 : 0);
        if (cxt->use_bytes == 0)
            UTF8_CROAK();
#endif
    }
    return sv;
}
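In other words, $Storable::drop_utf8 is consulted iff HAS_UTF8_SCALARS
is not defined. When it is defined, as it evidently is on Perl 5.8.x,
the flag is switched back on unconditionally and $drop_utf8 is never
looked at. Should we fix the code or the documentation?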
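
Until that is settled, one possible workaround is to drop the flag by
hand after the data comes back from Storable. The following is only a
minimal sketch: drop_utf8_flag is a hypothetical helper, not part of
Storable, and it assumes the data consists of plain (unblessed) arrays
and hashes. Hash keys and blessed objects are left untouched.

```perl
use strict;
use warnings;
use Storable qw/dclone/;

# Hypothetical helper, not part of Storable: walk nested arrays and
# hashes and replace each utf8-flagged string, in place, with its
# UTF-8 bytes. Strings without the flag are left exactly as they are.
sub drop_utf8_flag {
    my $type = ref $_[0];
    if ($type eq 'ARRAY') {
        # foreach aliases $_ to each element, so changes stick
        drop_utf8_flag($_) for @{ $_[0] };
    }
    elsif ($type eq 'HASH') {
        # values() in foreach also yields aliases to the hash values
        drop_utf8_flag($_) for values %{ $_[0] };
    }
    elsif (!$type && defined $_[0] && utf8::is_utf8($_[0])) {
        # utf8::encode converts in place and clears the utf8 flag
        utf8::encode($_[0]);
    }
    return;
}

my $obj   = [ "\x{5c0f}\x{98fc} \x{5f3e}" ];
my $clone = dclone($obj);
drop_utf8_flag($clone);
print utf8::is_utf8($clone->[0]) ? "still flagged\n" : "flag dropped\n";
```

The is_utf8 check matters here: calling utf8::encode on a byte string
that already holds high octets would encode it a second time, which is
not what $drop_utf8 is documented to do.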
Dan the Unstorable