On Thu, Jan 18, 2018, at 11:04, Stephen J. Turnbull wrote:
> Nathaniel Smith writes:
>
> > It's also nice to be able to parse some HTML data, make a few changes
> > in memory, and then serialize it back to HTML. Having this crash on
> > random documents is rather irritating, esp. if these documents are
> > standards-compliant HTML as in this case.
>
> This example doesn't make sense to me. Why would *conformant* HTML
> crash the codec? Unless you're saying the source is non-conformant
> and *lied* about the encoding?

I think his point is that the WHATWG standard is the one that governs HTML. HTML that uses these encodings (including the C1 characters) is therefore conformant to *that* standard, regardless of its status with regard to anything published by Unicode. The new encodings (whatever they are called), including the round trip of b'\x81' as \u0081, are the ones identified by a statement in an HTML document that it uses windows-1252, and therefore such a statement is not a lie.
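
To make the round trip concrete, here is a rough sketch of the behaviour in question. It is only an illustration, not a proposed implementation: the names decode_whatwg_1252, encode_whatwg_1252 and _C1_GAPS are made up for this message, and it assumes the only difference from Python's existing cp1252 codec is the five bytes that codec leaves undefined, each mapped to the C1 control character with the same code point.

```python
# Bytes that Python's built-in cp1252 codec leaves undefined. The assumption
# here is that the WHATWG flavour maps each one to the C1 control character
# with the same code point (0x81 <-> U+0081, and so on).
_C1_GAPS = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def decode_whatwg_1252(data: bytes) -> str:
    """Decode like cp1252, but pass the five gap bytes through as C1 controls."""
    return "".join(
        chr(b) if b in _C1_GAPS else bytes([b]).decode("cp1252")
        for b in data
    )


def encode_whatwg_1252(text: str) -> bytes:
    """Inverse of decode_whatwg_1252; other unencodable characters still raise."""
    out = bytearray()
    for ch in text:
        if ord(ch) in _C1_GAPS:
            out.append(ord(ch))
        else:
            out += ch.encode("cp1252")
    return bytes(out)


# The stock codec rejects the byte from the example above.
try:
    b"\x81".decode("cp1252")
    stock_codec_failed = False
except UnicodeDecodeError:
    stock_codec_failed = True

# The sketch decodes it to U+0081 and encodes it back to the same byte.
round_tripped = encode_whatwg_1252(decode_whatwg_1252(b"\x81"))
```

With the stock codec, parsing such a document and serializing it back fails at the decode step. With a mapping like the one sketched, every byte value survives the trip unchanged.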