[EMAIL PROTECTED] said:
> The encryption, of course, works with octets. I've just (version 2.13)
> introduced a first attempt at handling utf8 string arguments; this is
> still undocumented so I can change it if there's a better way. Currently,
> at the top of sub encrypt, there is:
>   use bytes;
>   ...
>   sub encrypt { my ($str,$key)=@_;
>     if ($] > 5.007 && Encode::is_utf8($str)) {
>       Encode::_utf8_off($str);
>       # $str = Encode::encode_utf8($str);
>     }
>   ...
> Is this the right sort of way to do it (e.g. functionality, portability)
> ?

The man page for Encode still says that twiddling the utf8 flag yourself
involves "messing with internals" that might change in later releases.
(Maybe someone else will comment on that.) Personally, I'd go for the
"Encode::encode_utf8($str)" in order to get an unaltered copy of the text
with the flag turned off.

> It means that after decrypting again the is_utf8 information is lost; But
> I don't see a way round that because 1) Perl's not the only language
> involved, 2) putting encoding information into the cyphertext would break
> backward compatibility and give an attacker a known-plaintext attack.

I have seen a lot of people putting a BOM (byte-order mark, U+FEFF) at the
start of Unicode text, even when encoding it as utf8 (where it shows up as
a three-byte sequence). So if you're encrypting a utf8 string, you could
just make sure there's a proper BOM at the start of it. Then, when you
decrypt in a Perl script and you see a BOM rendered as a three-byte
sequence, you know you can decode the octets as utf8. Hopefully, that's
not too much of a giveaway for attackers, since it's only three bytes, and
it might not be predictable whether there would be a BOM in a given
ciphertext.

> Would it be worth giving sub decrypt an option to decode the plaintext
> into Perl's internal form (if it's well-formed), or should I leave that
> to the user and the Encode module ?

If the plaintext is not utf8 (and not ascii), you have to leave it to the
user and the Encode module. If it is utf8, I think it'll be a great benefit
to Perl users to decode the plaintext from utf8 before returning it.
Providing an optional arg to the decrypt sub to control how it handles the
utf8 flag sounds like a good idea.

Dave Graff
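
For what it's worth, here is a rough sketch of the BOM idea combined with
an optional decode on the way back. The names to_octets, from_octets and
the $decode argument are made up for illustration and are not part of any
module; the cipher itself is left out, so these only show what might
happen just before encrypting and just after decrypting. It uses
Encode::encode_utf8 rather than _utf8_off, and it assumes that a string
without the utf8 flag is already plain octets.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helpers, for illustration only.
my $BOM        = "\x{FEFF}";        # U+FEFF as a Perl character
my $BOM_OCTETS = "\xEF\xBB\xBF";    # the same character encoded as utf8

# Before encrypting: turn a character string into octets, marked with a BOM.
sub to_octets {
    my ($str) = @_;
    return $str unless Encode::is_utf8($str);    # assume already octets
    $str = $BOM . $str unless substr($str, 0, 1) eq $BOM;
    return Encode::encode_utf8($str);            # flag-off copy of the text
}

# After decrypting: decode only if asked to, and only if a BOM is present.
sub from_octets {
    my ($octets, $decode) = @_;
    return $octets unless $decode && substr($octets, 0, 3) eq $BOM_OCTETS;
    # FB_CROAK dies on malformed input; LEAVE_SRC keeps $octets untouched.
    my $str = eval {
        Encode::decode('UTF-8', $octets,
                       Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $octets unless defined $str;          # not well-formed: leave it
    return substr($str, 1);                      # drop the BOM character
}

my $text   = "caf\x{e9} \x{263A}";               # wide char, so flag is on
my $octets = to_octets($text);                   # would be passed to encrypt
my $back   = from_octets($octets, 1);            # would follow decrypt
```

Without the second argument, from_octets hands back the octets untouched
(BOM included), which keeps the existing behaviour for callers who would
rather use the Encode module themselves.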