Terry J. Reedy added the comment:

Python should have a uniform definition of 'Python source' in both the doc and 
in practice in all source code processing functions. Currently, "2. Lexical 
analysis" in the Language Manual just says "Python reads program text as 
Unicode code points; the encoding of a source file can be given by an encoding 
declaration and defaults to UTF-8." UTF-8 encodes code point U+0000 as a null 
byte and this code point is nowhere excluded in the doc. (The definition of 
string literals uses 'source character' without any additional specification, 
so I take it to mean 'Unicode code point'.)
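
The encoding claim is easy to check. A minimal sketch (the variable names are 
mine, chosen only for illustration):

```python
# U+0000 is an ordinary code point as far as the UTF-8 codec is concerned.
nul = "\u0000"
encoded = nul.encode("utf-8")
print(encoded)  # the single byte 0x00

# Decoding goes the other way: a null byte becomes code point 0.
decoded = b"a\x00b".decode("utf-8")
print(len(decoded), ord(decoded[1]))
```

So a UTF-8 source file containing a null byte decodes to text containing 
U+0000, and nothing in the quoted sentence rules that out.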

If U+0000 is a legal 'source character', then, like other control characters 
not given special meaning, it should raise a SyntaxError unless it occurs in a 
comment or string literal. Eval and exec reject it even there, with
    TypeError: source code string cannot contain null bytes
If null bytes are legal, this is wrong.
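
A rough sketch of the eval behavior described above. The helper name 
null_byte_outcome is hypothetical, and since the exception class raised may 
differ between Python versions, the sketch catches several rather than 
assuming the TypeError quoted above:

```python
def null_byte_outcome(source):
    """Return the name of the exception eval() raises for source, or None."""
    try:
        eval(source)
    except (TypeError, ValueError, SyntaxError) as exc:
        return type(exc).__name__
    return None

# A string literal whose body contains a raw U+0000: rejected by eval.
print(null_byte_outcome("'a\x00b'"))

# The same literal written with an escape sequence has no raw null in the
# source text, so eval accepts it.
print(null_byte_outcome("'a\\x00b'"))
```

The first call shows that the null is rejected even though it sits inside a 
string literal, which is the case at issue.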

Simply truncating lines, as the CPython parser does, is wrong whether or not 
U+0000 is legal.

The simplest change would be to change the parser to match exec and add " other 
than U+0000" after "Unicode code points" in the sentence quoted above.

----------
nosy: +terry.reedy

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20115>
_______________________________________