* Gisle Aas wrote:
>More interesting is:
>
>  decode("UTF8", "Bj\xEF\xBF\xBFrn")
>
>where "\xEF\xBF\xBF" is not legal UTF-8 because "\x{FFFF}" is not
>legal Unicode. Either the whole sequence "\xEF\xBF\xBF" is replaced
>by "\x{FFFD}" or each bad byte is giving us
>"Bj\x{FFFD}\x{FFFD}\x{FFFD}rn". I think the later will be more sane,
>especially when you hit on perl 64-bit extension to UTF-8..
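For illustration only, here is a minimal Python sketch (not Perl, and not how Encode actually behaves) of the two replacement strategies described above. The names decode_utf8_strict and expected_length are hypothetical. The sketch assumes, as the quote does, that U+FFFE and U+FFFF are rejected, although RFC 3629 itself only excludes surrogates and values above U+10FFFF. Lead bytes F5..FF, which Perl's extended forms for large code points use, are treated as invalid. The whole-sequence mode uses a simplified rule (the lead byte plus the continuation bytes it announces), not the exact maximal-subpart practice in the Unicode Standard.

```python
REPLACEMENT = "\ufffd"
# Assumption taken from the quoted premise: noncharacters count as errors.
NONCHARACTERS = {0xFFFE, 0xFFFF}


def expected_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation bytes, C0/C1, and F5..FF (e.g. Perl's extended forms)


def decode_utf8_strict(data: bytes, per_byte: bool = True) -> str:
    """Hypothetical helper: decode UTF-8, replacing bad input with U+FFFD."""
    out = []
    i = 0
    while i < len(data):
        length = expected_length(data[i])
        chunk = data[i:i + length]
        char = None
        if length and len(chunk) == length:
            try:
                # Python's strict decoder rejects overlongs, surrogates and
                # values above U+10FFFF, but it accepts noncharacters.
                char = chunk.decode("utf-8")
            except UnicodeDecodeError:
                char = None
        if char is not None and ord(char) not in NONCHARACTERS:
            out.append(char)
            i += length
            continue
        # Ill-formed or rejected input: emit one U+FFFD ...
        out.append(REPLACEMENT)
        if per_byte or length == 0:
            # ... for this byte only; the following bytes are examined on their own.
            i += 1
        else:
            # ... for the lead byte plus the continuation bytes (0x80..0xBF)
            # that follow it, up to the length the lead byte announced.
            end = min(i + length, len(data))
            i += 1
            while i < end and 0x80 <= data[i] <= 0xBF:
                i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = b"Bj\xef\xbf\xbfrn"
    print(ascii(decode_utf8_strict(sample)))                  # 'Bj\ufffd\ufffd\ufffdrn'
    print(ascii(decode_utf8_strict(sample, per_byte=False)))  # 'Bj\ufffdrn'
```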
I think it should do whatever comes closest to the requirements or
suggestions in Unicode or RFC 3629; I am not sure what that would be
though.

>> Now that we have this problem, introducing more places where one needs
>> to carefully check the documentation what is considered UTF-8 does not
>> seem like the best option, having decode_utf8() and decode(utf8=>...)
>> mean something different is likely going to cause confusion. Maybe
>> this could go the other way round, i.e. introduce a new encoding
>> "UTF-8-Strict" or something.
>
>This is certainly more backwards compatible, but do we really want
>perl applications to exchange illegal UTF-8 by default?

Hmm, maybe I should ask why you proposed to keep the old behavior of
encode_utf8 in the first place? The change would make more sense to me
if both encode("UTF-8" => ...) and encode_utf8(...) were changed.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/