Porters and Storable Maintainers,

       utf8 data
Perl 5.6 added support for Unicode characters with code points > 255, and Perl 5.8 has full support for Unicode characters in hash keys. Perl internally encodes strings with these characters using utf8, and Storable serializes them as utf8. By default, if an older version of Perl encounters a utf8 value it cannot represent, it will "croak()". To change this behaviour so that Storable deserializes utf8 encoded values as the string of bytes (effectively dropping the is_utf8 flag) set $Storable::drop_utf8 to some "TRUE" value. This is a form of data loss, because with $drop_utf8 true, it becomes impossible to tell whether the original data was the Unicode string, or a series of bytes that happen to be valid utf8.

But this does not work as advertised on Perl 5.8.x.

#
use strict;
use warnings;
use Storable qw/dclone/;
my $obj = [ "\x{5c0f}\x{98fc} \x{5f3e}" ];
local ($Storable::drop_utf8) = 1;
my $clone = dclone($obj);
warn utf8::is_utf8($clone->[0]); # not false!
__END__

And here is one of the functions in Storable.xs:

static SV *retrieve_utf8str(pTHX_ stcxt_t *cxt, char *cname)
{
    SV *sv;

    TRACEME(("retrieve_utf8str"));

    sv = retrieve_scalar(aTHX_ cxt, cname);
    if (sv) {
#ifdef HAS_UTF8_SCALARS
        SvUTF8_on(sv);
#else
        if (cxt->use_bytes < 0)
            cxt->use_bytes
                = (SvTRUE(perl_get_sv("Storable::drop_utf8", TRUE))
                   ? 1 : 0);
        if (cxt->use_bytes == 0)
            UTF8_CROAK();
#endif
    }

    return sv;
}

In other words, $Storable::drop_utf8 is effective iff HAS_UTF8_SCALARS is not defined. Should we fix the code or the documentation?
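
Until that is settled, a workaround on Perl 5.8.x might look like the sketch below. It is only an illustration: drop_utf8_flags is a hypothetical helper, not part of Storable, and it assumes plain, non-cyclic arrays and hashes. Hash keys, blessed objects and scalar references are left untouched. Each utf8-flagged string is re-encoded to bytes in place with utf8::encode, which is roughly what the documentation promises for $Storable::drop_utf8.

```perl
use strict;
use warnings;
use Storable qw/dclone/;

# Hypothetical helper (not part of Storable): takes a reference to a value
# and, walking nested arrays and hash values, turns every utf8-flagged
# string into its utf8-encoded byte string.
# Assumption: the structure has no cycles.
sub drop_utf8_flags {
    my ($ref) = @_;
    my $type = ref $$ref;
    if ($type eq 'ARRAY') {
        # foreach aliases $_ to each element, so \$_ points at the element
        drop_utf8_flags(\$_) for @{$$ref};
    }
    elsif ($type eq 'HASH') {
        # values() in a foreach also yields aliases to the stored values
        drop_utf8_flags(\$_) for values %{$$ref};
    }
    elsif (!$type && defined $$ref) {
        # utf8::encode converts in place and clears the is_utf8 flag
        utf8::encode($$ref) if utf8::is_utf8($$ref);
    }
}

my $obj   = [ "\x{5c0f}\x{98fc} \x{5f3e}" ];
my $clone = dclone($obj);
drop_utf8_flags(\$clone);
print utf8::is_utf8($clone->[0]) ? "still flagged\n" : "bytes\n";
print length($clone->[0]), "\n";    # 10: three 3-byte characters plus a space
```

As the quoted documentation says, this is a form of data loss: after the call there is no way to tell the result apart from a byte string that merely happens to be valid utf8.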

Dan the Unstorable
