On Fri, Jan 12, 2018, at 03:10, Stephen J. Turnbull wrote:
>  > Other than that, all the differences are adding the fall-throughs in the
>  > range U+0080 to U+009F. For example, elsewhere in windows-1255, the byte
>  > b'\xff' is undefined, and it remains undefined in WHATWG's mapping.
> 
> I really do not want those fall-throughs to control characters in the
> stdlib, since they have no textual interpretation in any standard
> encoding.  My interpretation is "you're under attack, shutter the
> windows and call the cops".  If people want to use codecs
> incorporating them, they should have to import them separately in the
> context of a defensive framework that deals with them at a higher
> level.

There are plenty of standard encodings that do have actual representations of 
the control characters. It's not clear why you consider it more dangerous for 
the "windows-1252" encoding to be able to return '\x81' for b'\x81' than for 
"latin-1" to do the same, or for "utf-8" to return it for b'\xc2\x81'. These 
characters exist. Supporting them in encodings that contain them in the real 
world, regardless of what was submitted to the Unicode Consortium, doesn't add any 
new attack surface.
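
To make the comparison concrete, here is a minimal sketch of the three cases
mentioned above. The helper name try_decode is made up for illustration, and
the windows-1252 result reflects my understanding of how the current stdlib
codec treats byte 0x81 (as undefined), not the WHATWG mapping.

```python
def try_decode(data, encoding):
    """Return the decoded string, or None if the codec rejects the bytes.

    try_decode is a hypothetical helper for this illustration only.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# latin-1 maps every byte to the code point of the same value.
latin1_result = try_decode(b'\x81', 'latin-1')

# utf-8 encodes U+0081 as the two-byte sequence C2 81.
utf8_result = try_decode(b'\xc2\x81', 'utf-8')

# The stdlib windows-1252 codec treats 0x81 as undefined and raises,
# so the helper is expected to return None here.
cp1252_result = try_decode(b'\x81', 'windows-1252')

print(repr(latin1_result))
print(repr(utf8_result))
print(repr(cp1252_result))
```

The first two calls already hand back '\x81' today, so code that consumes
decoded text has to cope with C1 control characters regardless of what
windows-1252 does with that byte.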
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
