At 23:39 -0700 04/26/2009, Glenn Linderman wrote:
>On approximately 4/25/2009 5:35 AM, came the following characters from
>the keyboard of Martin v. Löwis:
>>> Because the encoding is not reliably reversible.
>>
>> Why do you say that? The encoding is completely reversible
>> (unless we disagree on what "reversible" means).
>>
>>> I'm +1 on the concept, -1 on the PEP, due solely to the lack of a
>>> reversible encoding.
>>
>> Then please provide an example for a setup where it is not reversible.
>>
>> Regards,
>> Martin
>
>It is reversible if you know that it is decoded, and apply the encoding.
>But if you don't know that has been encoded, then applying the reverse
>transform can convert an undecoded str that matches the decoded str to
>the form that it could have, but never did take.
>
>The problem is that there is no guarantee that the str interface
>provides only strictly conforming Unicode, so decoding bytes to
>non-strictly conforming Unicode, can result in a data pun between
>non-strictly conforming Unicode coming from the str interface vs bytes
>being decoded to non-strictly conforming Unicode coming from the bytes
>interface. ...
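Glenn's "data pun" can be made concrete with the error handler PEP 383 eventually shipped as in Python 3.1 and later, "surrogateescape". This is a minimal sketch that assumes UTF-8 as the encoding in question; the byte string and variable names are illustrative only.

```python
# Bytes that are not valid UTF-8: 0xE9 is "é" in Latin-1, but in UTF-8 it
# starts a multi-byte sequence that never finishes.
raw = b"caf\xe9"

# Bytes interface: each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
decoded = raw.decode("utf-8", "surrogateescape")
assert decoded == "caf\udce9"

# Round trip: encoding with the same handler restores the original bytes.
assert decoded.encode("utf-8", "surrogateescape") == raw

# A str that arrived through the str interface already containing U+DCE9 is
# not strictly conforming Unicode, so strict encoding refuses it...
from_str = "caf" + "\udce9"
try:
    from_str.encode("utf-8")
except UnicodeEncodeError:
    strict_ok = False
else:
    strict_ok = True
assert not strict_ok

# ...but the reverse transform turns it into the very same bytes: the
# "data pun". Nothing in the str records which interface it came from.
assert from_str.encode("utf-8", "surrogateescape") == raw
assert from_str == decoded
```

Martin's point corresponds to the first round-trip assertion (decode-then-encode always restores the bytes); Glenn's concern is the last two, where a str that never came from bytes encodes to the same bytes and compares equal to one that did.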
Maybe this is a dumb idea, but some people might be reassured if the half-surrogates had some particular pattern that is unlikely to occur even in unreasonable text (as lone half-surrogates are an error in Unicode). The pattern could be some sequence of half-surrogate-encoded bytes framing the intended data, as is done for RFC 2047 internationalized header fields in email. It would take up a few more bytes in the string, but no matter. It would also make it easier to diagnose when decoding was not properly done.

FWIW, I like the idea in the PEP, now that I think I understand it.

(BTW, gotta love what the email package is doing to the Subject: header field. ;-')
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynel...@georgeanelson.com>
      '                              <http://www.georgeanelson.com/>
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev