Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-15 Thread Jeff Allen
I'm approaching this from the premise that we would like to avoid needless surprises for users not versed in text encoding. I did a simple experiment with notepad on Windows 7 as if a naïve user. If I write the one-line program: print("Hello world.") # by Jeff It runs, no surprise. We may

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-15 Thread Laura Creighton
In a message of Sun, 15 Nov 2015 12:56:18 +, Paul Moore writes: >On 15 November 2015 at 07:23, Stephen J. Turnbull wrote: >> I don't see any good reason for allowing non-ASCII-compatible >> encodings in the reference CPython interpreter. > >From PEP 263: > > Any

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-15 Thread Random832
"Stephen J. Turnbull" writes: > I don't see any good reason for allowing non-ASCII-compatible > encodings in the reference CPython interpreter. There might be a case for having the tokenizer not care about encodings at all and just operate on a stream of unicode characters

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-15 Thread Paul Moore
On 15 November 2015 at 07:23, Stephen J. Turnbull wrote: > I don't see any good reason for allowing non-ASCII-compatible > encodings in the reference CPython interpreter. From PEP 263: Any encoding which allows processing the first two lines in the way
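The PEP 263 rule Paul Moore quotes is what the standard library's tokenize.detect_encoding() applies when it reads the first lines of a source file. The sketch below (detect is just a local wrapper) shows a latin-1 cookie being honoured, the UTF-8 default, and a UTF-16 file with a BOM failing before any cookie can be found. It reflects the pure-Python tokenize module; the C tokenizer's error messages may differ.

```python
import codecs
import io
import tokenize


def detect(raw: bytes) -> str:
    """Return the source encoding reported by tokenize.detect_encoding()."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return encoding


# ASCII-compatible file with a cookie: honoured, and the name normalized.
latin1_src = "# -*- coding: latin-1 -*-\nname = 'café'\n".encode("latin-1")
print(detect(latin1_src))  # iso-8859-1

# No cookie and no BOM: the default is UTF-8.
print(detect(b"print('hi')\n"))  # utf-8

# UTF-16 with a BOM: the first line cannot even be decoded as UTF-8,
# so the search for a cookie fails with SyntaxError.
utf16_src = codecs.BOM_UTF16_LE + "print('hi')\n".encode("utf-16-le")
try:
    detect(utf16_src)
    utf16_rejected = False
except SyntaxError as exc:
    utf16_rejected = True
    print("rejected:", exc)
```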

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-15 Thread Paul Moore
On 15 November 2015 at 16:40, Stephen J. Turnbull wrote: > What PEP 263 did do was to specify that non-ASCII-compatible encodings > are not supported by the PEP 263 mechanism for declaring the encoding > of a Python source program. That's because it looks for a "magic >
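Why the "magic comment" mechanism rules out UTF-16 can be seen by applying the regular expression given in PEP 263 directly to the raw bytes of a line. This is an illustration under that assumption, not CPython's implementation; find_cookie here is a hypothetical helper.

```python
import re
from typing import Optional

# The regular expression from PEP 263, compiled as a bytes pattern so it
# runs on undecoded file data.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_cookie(first_line: bytes) -> Optional[str]:
    """Return the declared encoding name, or None if no cookie matches."""
    match = COOKIE_RE.match(first_line)
    return match.group(1).decode("ascii") if match else None


line = "# -*- coding: shift_jis -*-\n"
print(find_cookie(line.encode("ascii")))      # shift_jis
print(find_cookie(line.encode("utf-16-le")))  # None: NUL bytes split "coding"
```

In UTF-16 every ASCII character is followed (or preceded) by a NUL byte, so the byte sequence b"coding" never appears and the declaration is invisible.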

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-15 Thread Raymond Hettinger
> On Nov 15, 2015, at 9:34 AM, Guido van Rossum wrote: > > Let me just unilaterally end this discussion. It's fine to disregard > the future possibility of using UTF-16 or -32 for Python source code. > Serhiy can happily rip out any comments or dead code dealing with that >

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-15 Thread Stephen J. Turnbull
Random832 writes: > "Stephen J. Turnbull" writes: > > I don't see any good reason for allowing non-ASCII-compatible > > encodings in the reference CPython interpreter. > > There might be a case for having the tokenizer not care about encodings > at all and just operate

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-15 Thread M.-A. Lemburg
On 14.11.2015 23:56, Victor Stinner wrote: > These encodings are rarely used. I don't think that any text editor use > them. Editors use ascii, latin1, utf8 and... all locale encoding. But I > don't know any OS using UTF-16 as a locale encoding. UTF-32 wastes disk > space. UTF-16 is used a lot

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-15 Thread Guido van Rossum
Let me just unilaterally end this discussion. It's fine to disregard the future possibility of using UTF-16 or -32 for Python source code. Serhiy can happily rip out any comments or dead code dealing with that possibility. -- --Guido van Rossum (python.org/~guido)

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-15 Thread Stephen J. Turnbull
Laura Creighton writes: > Steve Turnbull, who lives in Japan, and speaks and writes Japanese > is saying that "he cannot see any reason for allowing non-ASCII > compatible encodings in Cpython". > > This makes me wonder. > > Is this along the lines of 'even in Japan we do not want such

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Chris Angelico
On Sun, Nov 15, 2015 at 12:47 PM, Glenn Linderman wrote: > On 11/14/2015 5:37 PM, Chris Angelico wrote: > > On Sun, Nov 15, 2015 at 12:27 PM, Glenn Linderman > wrote: > > Notepad defaults to ANSI encoding, as I think it always has. UTF-8 is an >

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread eryksun
On Sat, Nov 14, 2015 at 7:06 PM, Steve Dower wrote: > The native encoding on Windows has been UTF-16 since Windows NT. Obviously > we've survived without Python tokenization support for a long time, but > every API uses it. Windows 2000 was the first version to have broad

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread eryksun
On Sat, Nov 14, 2015 at 7:15 PM, Chris Angelico wrote: > Can the py.exe launcher handle a UTF-16 shebang? (I'm pretty sure Unix > program loaders won't.) That alone might be a reason for strongly > encouraging ASCII-compat encodings. The launcher supports shebangs encoded as

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Stephen J. Turnbull
Steve Dower writes: > Saying [UTF-16] is rarely used is rather exposing your own > unawareness though - it could arguably be the most commonly used > encoding (depending on how you define "used"). Because we're discussing the storage of .py files, the relevant definition is the one used by

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Random832
Glenn Linderman writes: > On 11/14/2015 5:37 PM, Chris Angelico wrote: > > Thanks. Is "ANSI" always an eight-bit ASCII-compatible encoding? > > I wouldn't trust an answer to this question that didn't come from > someone that used Windows with Chinese, Japanese, or Korean,
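Whether a Windows "ANSI" code page counts as ASCII-compatible depends on what the check needs. On Japanese Windows the ANSI code page is 932, which Python exposes as the cp932 codec: ASCII characters keep their single-byte values, but the second byte of a double-byte character may fall in the ASCII range. This is why PEP 263 can accept Shift_JIS-style encodings while naive byte-oriented scanners still trip over them.

```python
# cp932: Python's codec for Windows code page 932 (Japanese "ANSI").
ascii_bytes = "print".encode("cp932")
kanji_bytes = "表".encode("cp932")

print(ascii_bytes)  # b'print': ASCII text is unchanged
print(kanji_bytes)  # b'\x95\\': two bytes, the second is 0x5C

# 0x5C is the ASCII backslash, so a naive byte scanner would see one here.
print(0x5C in kanji_bytes)  # True
```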

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Steven D'Aprano
On Sat, Nov 14, 2015 at 09:19:37PM +0200, Serhiy Storchaka wrote: > If the support of UTF-16 and UTF-32 is planned, I'll take this to > attention during refactoring. But in many places besides the tokenizer > the ASCII compatible encoding of source files is expected. Perhaps another way of

[Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Serhiy Storchaka
For now UTF-16 and UTF-32 source encodings are not supported. There is a comment in Parser/tokenizer.c: /* Disable support for UTF-16 BOMs until a decision is made whether this needs to be supported. */ Can we make a decision whether this support will be added in foreseeable
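The tokenizer comment Serhiy quotes concerns byte order marks. A sketch of BOM detection using the codecs module's BOM constants follows; sniff_bom is a hypothetical helper, not what Parser/tokenizer.c does, and it only recognises BOM-prefixed files.

```python
import codecs
from typing import Optional

# Longer BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so testing UTF-16 first would misclassify it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(raw: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if absent."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


sample = codecs.BOM_UTF32_LE + "print()\n".encode("utf-32-le")
print(sniff_bom(sample))  # utf-32-le
```

The ordering is a heuristic: a UTF-16-LE file whose first character is U+0000 would be reported as UTF-32-LE.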

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Victor Stinner
These encodings are rarely used. I don't think that any text editor use them. Editors use ascii, latin1, utf8 and... all locale encoding. But I don't know any OS using UTF-16 as a locale encoding. UTF-32 wastes disk space. Ok, even if it exists, Python already accepts a very wide range of
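Victor's disk-space point in concrete numbers, for an ASCII-only line of source: UTF-16 doubles the size and UTF-32 quadruples it.

```python
source = 'print("Hello world.")\n'  # 22 ASCII characters

sizes = {}
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    # The -le variants add no BOM, so only the text itself is counted.
    sizes[codec] = len(source.encode(codec))

print(sizes)  # {'utf-8': 22, 'utf-16-le': 44, 'utf-32-le': 88}
```

The ratio depends on content: a CJK ideograph in the Basic Multilingual Plane takes 3 bytes in UTF-8 but 2 in UTF-16.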

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Benjamin Peterson
I agree that supporting UTF-16 doesn't seem terribly useful. Also, thank you for giving the tokenizer some love! On Sat, Nov 14, 2015, at 11:19, Serhiy Storchaka wrote: > For now UTF-16 and UTF-32 source encodings are not supported. There is a > comment in Parser/tokenizer.c: > > /*

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Serhiy Storchaka
On 15.11.15 00:56, Victor Stinner wrote: These encodings are rarely used. I don't think that any text editor use them. Editors use ascii, latin1, utf8 and... all locale encoding. But I don't know any OS using UTF-16 as a locale encoding. UTF-32 wastes disk space. AFAIK the standard Windows

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Glenn Linderman
On 11/14/2015 3:21 PM, Serhiy Storchaka wrote: On 15.11.15 00:56, Victor Stinner wrote: These encodings are rarely used. I don't think that any text editor use them. Editors use ascii, latin1, utf8 and... all locale encoding. But I don't know any OS using UTF-16 as a locale encoding. UTF-32

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Random832
Victor Stinner writes: > These encodings are rarely used. I don't think that any text editor > use them. MS Windows' Notepad can be made to use UTF-16.

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Steve Dower
stin...@gmail.com> Sent: 11/14/2015 14:58 To: "Serhiy Storchaka" <storch...@gmail.com> Cc: "python-dev@python.org" <python-dev@python.org> Subject: Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings These encodings are rarely used. I don't think that

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Chris Angelico
On Sun, Nov 15, 2015 at 12:06 PM, Steve Dower wrote: > The native encoding on Windows has been UTF-16 since Windows NT. Obviously > we've survived without Python tokenization support for a long time, but > every API uses it. > > I've hit a few cases where it would have

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Glenn Linderman
On 11/14/2015 5:15 PM, Chris Angelico wrote: Can the py.exe launcher handle a UTF-16 shebang? (I'm pretty sure Unix program loaders won't.) That alone might be a reason for strongly encouraging ASCII-compat encodings. That raises an interesting question about if py.exe can handle a leading

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Random832
Chris Angelico writes: > Can the py.exe launcher handle a UTF-16 shebang? (I'm pretty sure Unix > program loaders won't.) A lot of them can't handle UTF-8 with a BOM, either. > That alone might be a reason for strongly encouraging ASCII-compat > encodings. A "python" or
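Random832's remark about BOMs and program loaders comes down to the first two bytes of the file: on Linux and other Unix kernels, a file is run as a script only if it starts with exactly b"#!". The sketch below (has_shebang is a hypothetical helper) shows that any BOM, UTF-8 or UTF-16, hides the shebang from that check. It says nothing about how py.exe treats the same bytes.

```python
def has_shebang(raw: bytes) -> bool:
    """Mimic the kernel's check: the file must start with b"#!"."""
    return raw[:2] == b"#!"


line = "#!/usr/bin/env python3\n"
for codec in ("utf-8", "utf-8-sig", "utf-16"):
    raw = line.encode(codec)
    # utf-8-sig prepends EF BB BF; utf-16 prepends a 2-byte BOM and
    # stores "#" as two bytes, so neither starts with b"#!".
    print(codec, raw[:4], has_shebang(raw))
```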

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Glenn Linderman
On 11/14/2015 5:15 PM, Chris Angelico wrote: I think even Notepad defaults to UTF-8 for files, now. Just installed Windows 10 on a new machine, and upgraded to the latest Windows 10 release, 1511. Notepad defaults to ANSI encoding, as I think it always has. UTF-8 is an option, and it does

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Chris Angelico
On Sun, Nov 15, 2015 at 12:27 PM, Glenn Linderman wrote: > Notepad defaults to ANSI encoding, as I think it always has. UTF-8 is an > option, and it does seem to try to notice the original encoding of the file, > when editing old files, but when creating a new one

Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Glenn Linderman
On 11/14/2015 5:37 PM, Chris Angelico wrote: On Sun, Nov 15, 2015 at 12:27 PM, Glenn Linderman wrote: Notepad defaults to ANSI encoding, as I think it always has. UTF-8 is an option, and it does seem to try to notice the original encoding of the file, when editing old