On 5/6/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> I just read PEP 3112, and I believe it contains a
> flaw/underspecification.
>
> It says
>
> # Each shortstringchar or longstringchar must be a character between 1
> # and 127 inclusive, regardless of any encoding declaration [2] in the
> # source file.
>
> What does that mean? In particular, what is "a character between 1 and
> 127"?
>
> Assuming this refers to ordinal values in some encoding: what encoding?
> It's particularly puzzling that it says "regardless of any encoding
> declaration of the source file".
>
> I fear (but hope that I'm wrong) that this was meant to mean "use the
> bytes as they are stored on disk in the source file". If so: is the
> attached file valid Python? In case your editor can't render it: it
> reads
>
> #! -*- coding: iso-2022-jp -*-
> a = b"Питон"
>
> But if you look at the file with a hex editor, you see it contains
> only bytes between 1 and 127.
>
> I would hope that this code is indeed ill-formed (i.e. that
> the byte representation on disk is irrelevant, and only the
> Unicode ordinals of the source characters matter)
>
> If so, can the specification please be updated to clarify that
> 1. in Grammar changes: Each shortstringchar or longstringchar must
>    be a character whose Unicode ordinal value is between 1 and
>    127 inclusive.
> 2. in Semantics: The bytes in the new object are obtained as if
>    encoding a string literal with "iso-8859-1"

Sounds like a good fix to me; I agree that bytes literals, like
Unicode literals, should not vary depending on the source encoding.

In step 2, can't you use "ascii" as the encoding?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
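
For illustration, the two points in this exchange can be checked with a short Python 3 sketch. It assumes the standard "iso-2022-jp" codec that ships with CPython. The helper fits_proposed_rule is a hypothetical name, not part of PEP 3112; it only mirrors the proposed wording (Unicode ordinal between 1 and 127 inclusive).

```python
# Sketch only: fits_proposed_rule is a hypothetical helper, not PEP 3112 text.

def fits_proposed_rule(chars: str) -> bool:
    """Return True if every character has a Unicode ordinal in 1..127."""
    return all(1 <= ord(c) <= 127 for c in chars)


text = "Питон"  # the literal body from the quoted example

# ISO-2022-JP is a 7-bit encoding, so the bytes stored on disk
# all fall between 1 and 127 ...
on_disk = text.encode("iso-2022-jp")
disk_bytes_are_7bit = all(1 <= b <= 127 for b in on_disk)

# ... yet the source characters themselves are Cyrillic letters
# whose ordinals are well above 127.
literal_is_well_formed = fits_proposed_rule(text)

# For characters that do pass the rule, encoding with "ascii" and with
# "iso-8859-1" yields identical bytes, so either name describes step 2.
seven_bit = "".join(chr(i) for i in range(1, 128))
encodings_agree = seven_bit.encode("ascii") == seven_bit.encode("iso-8859-1")

print(disk_bytes_are_7bit, literal_is_well_formed, encodings_agree)
```

Under the "bytes on disk" reading the example file would pass, while under the "Unicode ordinals" reading it is rejected. Because the two encodings agree over the permitted range, the choice between "ascii" and "iso-8859-1" changes only how the specification is worded, not the resulting bytes.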