On 2/15/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
> > (I wonder if maybe they should be an error in 2.x as well.  Source
> > encoding is for unicode literals, not str literals.)
>
> Source encoding applies to the entire source code, including (byte)
> string literals, comments, identifiers, and keywords. IOW, if you
> declare your source encoding is utf-8, the keyword "print" must
> be represented with the bytes that represent the Unicode letters
> for "p","r","i","n", and "t" in UTF-8.

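To make the quoted point concrete, here is a minimal sketch, written for Python 3 only so that the bytes are easy to inspect (the thread itself concerns 2.x). The helper name `encoded_forms`, the encoding list, and the sample strings are all made up for illustration. It shows that an ASCII keyword has the same bytes in several common source encodings, while a non-ASCII character in a literal does not:

```python
# Illustration only (Python 3): compare the bytes a piece of source text
# would occupy on disk under a few declared encodings.
# The helper name and the sample strings are hypothetical.
ENCODINGS = ["ascii", "utf-8", "latin-1", "shift_jis"]

def encoded_forms(text, encodings=ENCODINGS):
    """Map each encoding name to the bytes `text` takes in that encoding."""
    forms = {}
    for name in encodings:
        try:
            forms[name] = text.encode(name)
        except UnicodeEncodeError:
            forms[name] = None  # text is not representable in this encoding
    return forms

# The keyword is pure ASCII, so every encoding listed yields the same bytes.
keyword_forms = encoded_forms("print")

# A non-ASCII character (e with acute accent) differs between encodings,
# which is why the bytes of an 8-bit str literal depend on the declaration.
literal_forms = encoded_forms("\u00e9")

for name in ENCODINGS:
    print(name, keyword_forms[name], literal_forms[name])
```

The keyword row is identical across the encodings listed, whereas the literal row varies, and the character cannot be written at all in plain ASCII.
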
Although it does apply to the entire source file, I think this is more
for convenience (try telling an editor that only a single line is
Shift_JIS!) than to allow 8-bit (or 16-bit?!) str literals.  Indeed,
you could have arbitrary 8-bit str literals long before the source
encoding was added.

Keywords and identifiers continue to be limited to ASCII characters
(even if they make a round trip through other encodings), and comments
continue to be ignored.

Source encoding exists so that you can write u"123" with the encoding
stated once at the top of the file, rather than "123".decode('utf-8')
with the encoding repeated everywhere.

Making it an error to have 8-bit str literals in 2.x would help
educate users that these literals will change behavior in 3.0 and
will no longer be 8-bit str literals.

-- 
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev