Re: [Python-Dev] Reading Python source file

Serhiy Storchaka Tue, 17 Nov 2015 08:12:02 -0800

On 17.11.15 05:05, MRAB wrote:

As I understand it, *nix expects the shebang to be b'#!', which means
that the first line should be ASCII-compatible (it's possible that the
UTF-8 BOM might be present). This kind of suggests that encodings like
UTF-16 would cause a problem on such systems.


The encoding line also needs to be ASCII-compatible.

I believe that the recent thread "Support of UTF-16 and UTF-32 source
encodings" also concluded that UTF-16 and UTF-32 shouldn't be supported.

This means that you could treat the first 2 lines as though they were some
kind of extended ASCII (Latin-1?), the line ending being '\n' or '\r' or
'\r\n'.

Once you'd identified the encoding, you could decode everything
(including the shebang line) using that encoding.

Yes, that is what I was going to implement (and I am already halfway
there). My question is whether it is worth complicating the code further
to preserve line-by-line reading. In any case, after reading a first line
that contains neither a coding cookie nor any non-comment tokens, we need
to wait for the second line.
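
For illustration only, the approach discussed above could be sketched
roughly as follows. This is not the actual implementation: the names
sniff_cookie and decode_source are hypothetical, the cookie pattern merely
follows the spirit of PEP 263, and as a simplification a UTF-8 BOM simply
forces UTF-8 without checking for a conflicting cookie.

```python
import codecs
import re

# Cookie pattern in the spirit of PEP 263 (an assumption of this sketch).
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")
# A line that is empty or holds only a comment, so line 2 may still carry the cookie.
BLANK_OR_COMMENT_RE = re.compile(r"^[ \t\f]*(?:#.*)?$")
# Accept '\r\n', '\r' or '\n' as the line ending.
LINE_END_RE = re.compile(r"\r\n|\r|\n")


def sniff_cookie(data):
    """Return the codec name declared in the first two lines, or None."""
    # Latin-1 maps each byte to one code point, so this decode cannot fail.
    # (A real reader would only look at the head of the file.)
    lines = LINE_END_RE.split(data.decode("latin-1"), maxsplit=2)[:2]
    for line in lines:
        match = COOKIE_RE.match(line)
        if match:
            # Raises LookupError if the declared codec is unknown.
            return codecs.lookup(match.group(1)).name
        if not BLANK_OR_COMMENT_RE.match(line):
            break  # a non-comment token on line 1 rules out a cookie on line 2
    return None


def decode_source(data):
    """Decode the whole source, shebang line included, with one encoding."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(sniff_cookie(data) or "utf-8")
```

The standard library's tokenize.detect_encoding() covers the same ground
and is probably the better choice outside of a sketch like this.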

(What should happen if the encoding line then decodes differently, i.e.
encoding_line.decode(encoding) != encoding_line.decode('latin-1')?)


The parser should get the line decoded with the specified encoding.
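
A small example of the case in question, assuming a coding line that
carries non-ASCII text after the cookie (the sample line is made up for
illustration). The ASCII part, cookie included, decodes identically either
way; only the trailing comment text differs.

```python
# A hypothetical coding line with Cyrillic text after the cookie.
encoding_line = "# -*- coding: koi8-r -*- Привет\n".encode("koi8-r")

as_latin1 = encoding_line.decode("latin-1")
as_declared = encoding_line.decode("koi8-r")

# The ASCII prefix is the same under both decodings, so the cookie can be
# found from the Latin-1 view of the line.
cookie = "# -*- coding: koi8-r -*- "
same_prefix = as_latin1.startswith(cookie) and as_declared.startswith(cookie)

# The rest of the line is only correct when decoded as declared.
differs = as_latin1 != as_declared
```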
