Re: PEP 8: Byte Order Mark (BOM) vs coding cookie

Terry Reedy Sun, 24 Aug 2008 21:01:19 -0700


twyk wrote:

PEP 8 says ...

Files using ASCII (or UTF-8, for Python 3.0) should not have a coding
 cookie.

What about a BOM (Byte Order Mark)?  Per Wikipedia ...

http://en.wikipedia.org/wiki/Byte-order_mark#endnote_UTF-8
'In UTF-8, this is not really a "byte order" mark. It identifies the text as UTF-8 but doesn't say anything about the byte order, because UTF-8 does not have byte order issues.'
So is it good style to omit the BOM in UTF-8 for Python 3.0?


According to the Unicode manual, yes.

http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf

The endian order entry for UTF-8 in Table 2-4 is marked N/A because
UTF-8 code units are 8 bits in size, and the usual machine issues of
endian order for larger code units do not apply. The serialized order of
the bytes must not depart from the order defined by the UTF-8 encoding
form. Use of a BOM is neither required nor recommended for UTF-8, but
may be encountered in contexts where UTF-8 data is converted from other
encoding forms that use a BOM or where the BOM is used as a UTF-8
signature. See the “Byte Order Mark” subsection in Section 16.8,
Specials, for more information.
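To make the "no byte order issues" point concrete, here is a small sketch (my own illustration, not taken from the Unicode text) that encodes U+FEFF three ways. The variable names are arbitrary; the `codecs` constants are from the standard library.

```python
import codecs

bom = "\ufeff"  # U+FEFF, the character used as the byte order mark

# In UTF-16 the two bytes swap depending on endianness, so the mark
# really does tell a reader which byte order the file uses.
utf16_le = bom.encode("utf-16-le")
utf16_be = bom.encode("utf-16-be")
assert utf16_le == codecs.BOM_UTF16_LE == b"\xff\xfe"
assert utf16_be == codecs.BOM_UTF16_BE == b"\xfe\xff"

# In UTF-8 there is only one possible byte sequence, so the same
# character can act only as a "this is UTF-8" signature.
utf8 = bom.encode("utf-8")
assert utf8 == codecs.BOM_UTF8 == b"\xef\xbb\xbf"
```

The UTF-16 results differ by byte order while the UTF-8 result is fixed, which is why Table 2-4 lists the endian order for UTF-8 as N/A.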

Since ASCII files *are*, by intentional design, UTF-8 files, and since Python assumes ASCII/UTF-8 as the default in the absence of a coding cookie, it does not need the signature.
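If you want to check or clean a file yourself, something along these lines should work. This is only a sketch: `strip_utf8_bom` is a hypothetical helper name of mine, and `source` is a made-up one-line file; `codecs.BOM_UTF8` and the `utf-8-sig` codec are standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A made-up, pure-ASCII source file for illustration.
source = b"print('hello')\n"

# Pure ASCII bytes decode to the same text as ASCII and as UTF-8.
assert source.decode("ascii") == source.decode("utf-8")

# The same file as an editor might save it "with signature".
with_bom = codecs.BOM_UTF8 + source
assert strip_utf8_bom(with_bom) == source
assert strip_utf8_bom(source) == source  # no-op when there is no BOM

# The utf-8-sig codec skips the signature on decode; plain utf-8
# keeps it as a leading U+FEFF character.
assert with_bom.decode("utf-8-sig") == source.decode("utf-8")
assert with_bom.decode("utf-8").startswith("\ufeff")
```

Working on bytes rather than decoded text keeps the helper independent of whatever encoding the rest of the file turns out to use.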


--
http://mail.python.org/mailman/listinfo/python-list
