I just read PEP 3112, and I believe it contains a flaw/underspecification. It says
# Each shortstringchar or longstringchar must be a character between 1
# and 127 inclusive, regardless of any encoding declaration [2] in the
# source file.

What does that mean? In particular, what is "a character between 1 and 127"? Assuming this refers to ordinal values in some encoding: what encoding? It's particularly puzzling that it says "regardless of any encoding declaration of the source file".

I fear (but hope that I'm wrong) that this was meant to mean "use the bytes as they are stored on disk in the source file".

If so: is the attached file valid Python? In case your editor can't render it: it reads

    #! -*- coding: iso-2022-jp -*-
    a = b"Питон"

But if you look at the file with a hex editor, you see it contains only bytes between 1 and 127.

I would hope that this code is indeed ill-formed (i.e. that the byte representation on disk is irrelevant, and only the Unicode ordinals of the source characters matter).

If so, can the specification please be updated to clarify that

1. in Grammar changes: Each shortstringchar or longstringchar must be a character whose Unicode ordinal value is between 1 and 127 inclusive.

2. in Semantics: The bytes in the new object are obtained as if encoding a string literal with "iso-8859-1".

Regards,
Martin
#! -*- coding: iso-2022-jp -*-
a = b"$B'1'Z'd'`'_(B"
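
For anyone who wants to reproduce the observation, here is a minimal Python sketch of the two readings of "between 1 and 127". It assumes that Python's `iso2022_jp` codec encodes the Cyrillic text the same way the attached file does. The function `check_bytes_literal_body` is a hypothetical helper made up for illustration; it is not part of Python or of PEP 3112, and it ignores escape sequences entirely.

```python
# Sketch only: compares the "bytes on disk" reading with the
# "Unicode ordinals" reading of the PEP 3112 sentence.

source_text = "Питон"  # the characters between the quotes in the example

# Reading 1: look at the bytes as stored in an iso-2022-jp source file.
on_disk = source_text.encode("iso2022_jp")
print(on_disk)
# Every stored byte, including the escape sequences, is 7-bit and nonzero.
print(all(1 <= b <= 127 for b in on_disk))

# Reading 2: look at the Unicode ordinals of the source characters.
ordinals = [ord(ch) for ch in source_text]
print(ordinals)
# None of the Cyrillic characters has an ordinal in 1..127.
print(any(1 <= o <= 127 for o in ordinals))


def check_bytes_literal_body(body: str) -> bytes:
    """Hypothetical check following the proposed wording.

    Rejects any character whose Unicode ordinal is outside 1..127,
    then produces the bytes as if encoding with iso-8859-1.
    Escape sequences are not handled here.
    """
    for ch in body:
        if not 1 <= ord(ch) <= 127:
            raise SyntaxError(
                f"character {ch!r} (ordinal {ord(ch)}) is outside 1..127"
            )
    return body.encode("iso-8859-1")


print(check_bytes_literal_body("abc"))
try:
    check_bytes_literal_body(source_text)
except SyntaxError as exc:
    print("rejected:", exc)
```

Under the first reading the file would pass, since only the stored bytes are inspected; under the second it is rejected, which is the behaviour the proposed wording asks for.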