On Thu, Jan 18, 2018, at 11:04, Stephen J. Turnbull wrote:
> Nathaniel Smith writes:
> 
>  > It's also nice to be able to parse some HTML data, make a few changes
>  > in memory, and then serialize it back to HTML. Having this crash on
>  > random documents is rather irritating, esp. if these documents are
>  > standards-compliant HTML as in this case.
> 
> This example doesn't make sense to me.  Why would *conformant* HTML
> crash the codec?  Unless you're saying the source is non-conformant
> and *lied* about the encoding?

I think his point is that the WHATWG standard is the one that governs HTML, so 
HTML that uses these encodings (including the C1 characters) is conformant to 
*that* standard, regardless of its status with regard to anything published by 
Unicode. The new encodings (whatever they are called), including the round trip 
of b'\x81' to '\u0081', are the ones that an HTML document identifies when it 
declares windows-1252, and therefore such a declaration is not a lie.
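
To make the difference concrete, here is a minimal sketch of the round trip 
described above. It assumes the five bytes that Python's cp1252 codec leaves 
undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped to the C1 code points with 
the same numbers, as the WHATWG windows-1252 index does. The function names 
are hypothetical, made up for illustration, and are not an existing or 
proposed codec API.

```python
# Bytes that Python's strict cp1252 codec cannot decode. Assumption: the
# WHATWG windows-1252 index maps each to the same-numbered C1 code point.
UNDEFINED_IN_CP1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def decode_whatwg_1252(data: bytes) -> str:
    """Hypothetical helper: decode like cp1252, passing C1 gaps through."""
    chars = []
    for byte in data:
        if byte in UNDEFINED_IN_CP1252:
            chars.append(chr(byte))  # e.g. 0x81 -> U+0081
        else:
            chars.append(bytes([byte]).decode("cp1252"))
    return "".join(chars)


def encode_whatwg_1252(text: str) -> bytes:
    """Hypothetical helper: inverse of decode_whatwg_1252."""
    out = bytearray()
    for ch in text:
        if ord(ch) in UNDEFINED_IN_CP1252:
            out.append(ord(ch))  # e.g. U+0081 -> 0x81
        else:
            # Still raises UnicodeEncodeError for unmappable characters.
            out += ch.encode("cp1252")
    return bytes(out)


raw = b"caf\xe9 \x81"

# The strict stdlib codec rejects the document outright.
try:
    raw.decode("cp1252")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The WHATWG-style mapping survives parse, modify, serialize.
text = decode_whatwg_1252(raw)
round_tripped = encode_whatwg_1252(text)
```

With the strict codec, the parse step fails before any in-memory change can 
be made; with the pass-through mapping, the bytes survive unchanged.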
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
