* Gisle Aas wrote:
>More interesting is:
>
> decode("UTF8", "Bj\xEF\xBF\xBFrn")
>
>where "\xEF\xBF\xBF" is not legal UTF-8 because "\x{FFFF}" is not
>legal Unicode. Either the whole sequence "\xEF\xBF\xBF" is replaced
>by "\x{FFFD}" or each bad byte is replaced, giving us
>"Bj\x{FFFD}\x{FFFD}\x{FFFD}rn". I think the latter will be more sane,
>especially when you hit perl's 64-bit extension to UTF-8.
I think it should do whatever comes closest to the requirements or
suggestions in Unicode or RFC 3629; I am not sure what that would be
though.
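
To make the two candidate outputs concrete, here is a minimal sketch in
plain Perl. It does not call Encode and makes no claim about what decode()
actually returns. The variable names are made up for illustration, and the
offending span is found by a literal match rather than by a real UTF-8
validator.

```perl
use strict;
use warnings;

# U+FFFF packed by hand into the three-byte UTF-8 form:
# 1110xxxx 10xxxxxx 10xxxxxx
my $cp  = 0xFFFF;
my $bad = chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));          # "\xEF\xBF\xBF"

my $input = "Bj" . $bad . "rn";

# Option 1: the whole rejected sequence becomes a single U+FFFD.
(my $per_sequence = $input) =~ s/\Q$bad\E/\x{FFFD}/g;

# Option 2: every byte of the rejected sequence becomes its own U+FFFD.
(my $per_byte = $input) =~ s/(\Q$bad\E)/"\x{FFFD}" x length($1)/ge;

# Print code points instead of raw characters to avoid
# "Wide character" warnings on output.
for my $s ($per_sequence, $per_byte) {
    print join(" ", map { sprintf("U+%04X", ord($_)) } split(//, $s)), "\n";
}
```

The first result is five characters long and the second is seven, so the
choice is visible to any caller that counts characters or offsets.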
>> Now that we have this problem, introducing more places where one needs
>> to carefully check the documentation what is considered UTF-8 does not
>> seem like the best option; having decode_utf8() and decode(utf8=>...)
>> mean something different is likely going to cause confusion. Maybe
>> this could go the other way round, i.e. introduce a new encoding
>> "UTF-8-Strict" or something.
>
>This is certainly more backwards compatible, but do we really want
>perl applications to exchange illegal UTF-8 by default?
Hmm, maybe I should ask why you proposed to keep the old behavior of
encode_utf8 in the first place? The change would make more sense to
me if both encode("UTF-8" => ...) and encode_utf8(...) were changed.
--
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/