On 17.11.15 05:05, MRAB wrote:
As I understand it, *nix expects the shebang to be b'#!', which means
that the first line should be ASCII-compatible (it's possible that the
UTF-8 BOM might be present). This kind of suggests that encodings like
UTF-16 would cause a problem on such systems.

The encoding line also needs to be ASCII-compatible.

I believe that the recent thread "Support of UTF-16 and UTF-32 source
encodings" also concluded that UTF-16 and UTF-32 shouldn't be supported.

This means that you could treat the first 2 lines as though they were some
kind of extended ASCII (Latin-1?), the line ending being '\n' or '\r' or
'\r\n'.

Once you'd identified the encoding, you could decode everything
(including the shebang line) using that encoding.
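
The approach described above can be sketched roughly as follows. This is only an illustration, not the actual tokenizer code: the names guess_source_encoding and decode_source are hypothetical, the cookie pattern follows the one given in PEP 263, and the UTF-8 default is an assumption.

```python
import codecs
import re

# Coding-cookie pattern as given in PEP 263.
COOKIE_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')
# A line that is empty or contains only a comment.
BLANK_OR_COMMENT_RE = re.compile(r'^[ \t\f]*(?:#.*)?$')
# Accept '\r\n', '\r' or '\n' as line endings.
NEWLINE_RE = re.compile(rb'\r\n|\r|\n')


def guess_source_encoding(data, default='utf-8'):
    """Hypothetical helper: look for a coding cookie in the first two lines."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    for raw_line in NEWLINE_RE.split(data, maxsplit=2)[:2]:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        line = raw_line.decode('latin-1')
        match = COOKIE_RE.match(line)
        if match:
            return match.group(1)
        if not BLANK_OR_COMMENT_RE.match(line):
            # Real code on the first line: a cookie on line 2 is ignored.
            break
    return default


def decode_source(data):
    """Hypothetical helper: decode the whole file, shebang line included."""
    encoding = guess_source_encoding(data)
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode(encoding)


source = b'#!/usr/bin/env python\r# -*- coding: koi8-r -*-\rprint("\xd0\xd2")\r'
print(guess_source_encoding(source))
```

The sketch does not check that a BOM and a cookie agree with each other, and it does not validate the encoding name; both would be needed in real code.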

Yes, that is what I was going to implement (and I am already halfway there). My question is whether it is worth complicating the code further to preserve line-by-line reading. In any case, after reading a first line that contains neither a coding cookie nor any non-comment tokens, we need to wait for the second line.
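
To make the cost of line-by-line reading concrete, here is a rough sketch of how the buffering could look. It is not the implementation under discussion: iter_decoded_lines is a hypothetical name, the UTF-8 default is an assumption, and iterating over a binary stream splits on b'\n' only, so a lone '\r' line ending is not handled here.

```python
import io
import re

# Coding-cookie pattern as given in PEP 263.
COOKIE_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')
# A line that is empty or contains only a comment.
BLANK_OR_COMMENT_RE = re.compile(r'^[ \t\f]*(?:#.*)?$')


def iter_decoded_lines(stream, default='utf-8'):
    """Hypothetical helper: yield decoded lines from a binary stream.

    At most two lines are held back until the encoding is known.
    """
    pending = []
    encoding = None
    for raw in stream:
        if encoding is not None:
            yield raw.decode(encoding)
            continue
        pending.append(raw)
        text = raw.decode('latin-1').rstrip('\r\n')
        match = COOKIE_RE.match(text)
        if match:
            encoding = match.group(1)
        elif len(pending) == 2 or not BLANK_OR_COMMENT_RE.match(text):
            encoding = default
        else:
            # Comment-only first line without a cookie: wait for line 2.
            continue
        for held in pending:
            yield held.decode(encoding)
        pending.clear()
    # End of input reached before the encoding was decided.
    for held in pending:
        yield held.decode(default)


stream = io.BytesIO(b'#!/usr/bin/env python\n# coding: koi8-r\nx = "\xd0"\n')
lines = list(iter_decoded_lines(stream))
```

The extra state (the pending list and the "not decided yet" flag) is the complication in question; reading the whole file first avoids it.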

(What should happen if the encoding line then decoded differently, i.e.
encoding_line.decode(encoding) != encoding_line.decode('latin-1')?)

The parser should get the line decoded with the specified encoding.
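
A small illustration of that case, assuming an encoding line that carries non-ASCII bytes in its comment. The helper name redecode_encoding_line and the sample bytes are made up for the example.

```python
import re

# Cookie part of the PEP 263 pattern, searched anywhere in the line.
COOKIE_RE = re.compile(r'coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def redecode_encoding_line(raw_line):
    """Hypothetical helper: return (provisional, final, encoding)."""
    # First pass: Latin-1, only to find the cookie.
    provisional = raw_line.decode('latin-1')
    encoding = COOKIE_RE.search(provisional).group(1)
    # Second pass: the text the parser should actually receive.
    final = raw_line.decode(encoding)
    return provisional, final, encoding


encoding_line = b'# -*- coding: koi8-r -*- \xc1\xd7\xd4\xcf\xd2\n'
provisional, final, encoding = redecode_encoding_line(encoding_line)
print(provisional != final)
```

The two decodings differ only in the non-ASCII part; the cookie itself is ASCII, so it reads the same in both.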
