[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-06-04 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset ca612a9728b83472d9d286bbea74972d426ed344 by Victor Stinner in 
branch 'master':
bpo-36778: Remove outdated comment from CodePageTest (GH-13807)
https://github.com/python/cpython/commit/ca612a9728b83472d9d286bbea74972d426ed344


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-06-04 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +13692
pull_request: https://github.com/python/cpython/pull/13807




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-13 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset 3aef48e3157f52a8bcdbacf47a35d0016348735e by Victor Stinner in 
branch 'master':
bpo-36778: Update cp65001 codec documentation (GH-13240)
https://github.com/python/cpython/commit/3aef48e3157f52a8bcdbacf47a35d0016348735e


--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-10 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +13150




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread STINNER Victor


STINNER Victor  added the comment:

> Since we aren't backporting ARM32 changes, I don't think it's important to 
> fix this test in 3.7.  I am trying to get the buildbot tests for Windows 
> ARM32 to zero errors.

OK, thanks. I'm closing the issue.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread Paul Monson


Paul Monson  added the comment:

Thanks Victor!  Since we aren't backporting ARM32 changes, I don't think it's 
important to fix this test in 3.7.  I am trying to get the buildbot tests for 
Windows ARM32 to zero errors.

Windows IoT Core runs on Raspberry Pi and similar devices: 
https://developer.microsoft.com/en-us/windows/iot

Windows NanoServer is a very small version of Windows Server for running in 
Docker containers hosted on Windows Server.

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread STINNER Victor


STINNER Victor  added the comment:

Paul Monson: Your initial issue has been fixed in the master branch.

I'm not sure what Windows IoT Core and Windows Nano Server are. Do you care 
about Python 3.7? If someone wants to support running test_site with the ANSI 
code page set to 65001, I suggest fixing test_site directly in Python 3.7, like 
PR 13072. My attempt to avoid functools made the cp65001 codec way slower. 
Fixing one specific test should not make Python that much slower ;-)

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread STINNER Victor


STINNER Victor  added the comment:

About the ANSI code page, Lib/encodings/__init__.py calls _winapi.GetACP() to 
avoid relying on locale.getpreferredencoding() which lies when UTF-8 Mode is 
enabled:

import _winapi
ansi_code_page = "cp%s" % _winapi.GetACP()
if encoding == ansi_code_page:
    import encodings.mbcs
    return encodings.mbcs.getregentry()

INADA-san:
> So I don't think it is a lie.  It is just a question of which encoding name 
> we should choose when GetACP() returns 65001.
> With your PR 13230, cp65001 is truly UTF-8, so returning "utf-8" seems like 
> the right behavior.

Well, feel free to propose a PR. I have no strong opinion on this level of 
detail :-)

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset d267ac20c309e37d85a986b4417aa8ab4d05dabc by Victor Stinner in 
branch 'master':
bpo-36778: cp65001 encoding becomes an alias to utf_8 (GH-13230)
https://github.com/python/cpython/commit/d267ac20c309e37d85a986b4417aa8ab4d05dabc
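
A minimal sketch of how the effect of this change could be checked from Python, assuming an interpreter that includes it (3.8 or later). The helper name resolves_to_utf8 is hypothetical and not part of the changeset:

```python
import codecs
import sys


def resolves_to_utf8(name):
    """Return True if the codec registered under `name` is the UTF-8 codec."""
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Before this change, cp65001 was a separate Windows-only codec,
        # so the lookup can fail on other platforms.
        return False


if sys.version_info >= (3, 8):
    # With the alias in place, both names produce the same bytes.
    assert resolves_to_utf8("cp65001")
    assert "é".encode("cp65001") == "é".encode("utf-8")
```

On older interpreters the helper simply reports False for "cp65001", which is why the assertions are guarded by a version check.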


--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread Inada Naoki


Inada Naoki  added the comment:

> I dislike lying in the locale module. This change is basically useless with 
> my PR 13230.


Note that Python produces the "cpNNN" encoding name, not Windows.
https://github.com/python/cpython/blob/137be34180a20dba53948d126b961069f299f153/Modules/_localemodule.c#L395

So I don't think it is a lie.  It is just a question of which encoding name we 
should choose when GetACP() returns 65001.
With your PR 13230, cp65001 is truly UTF-8, so returning "utf-8" seems like the 
right behavior.

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread Paul Monson


Paul Monson  added the comment:

Sorry that was supposed to say:
I can verify that PR 13230 fixes the issue with test_startup_imports on Windows 
IoT Core ARM32

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread Paul Monson


Paul Monson  added the comment:

I can verify that PR 13110 fixes the issue with test_startup_imports on Windows 
IoT Core ARM32

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread Eryk Sun


Eryk Sun  added the comment:

> I dislike lying in the locale module. This change is basically useless 
> with my PR 13230.

Yes, functionally it's no different than using 'cp65001' as an alias. That 
said, the CRT special cases 65001 as "utf8":

>>> locale.setlocale(locale.LC_CTYPE, '')
'English_United Kingdom.utf8'
>>> crt_locale = ctypes.CDLL('api-ms-win-crt-locale-l1-1-0', use_errno=True)
>>> crt_locale.___lc_codepage_func()
65001

So the suggested change makes the locale module internally consistent on 
Windows and more transparent for anyone who doesn't know off the top of their 
head that "cp65001" is just UTF-8.

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread STINNER Victor


STINNER Victor  added the comment:

> Python could similarly special case CP_UTF8 as "utf-8" in 
> _locale._getdefaultlocale.

I dislike lying in the locale module. This change is basically useless with my 
PR 13230.

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread STINNER Victor


STINNER Victor  added the comment:

My PR 13110 (avoid functools) makes codecs.lookup('cp65001').encode() 2.7x 
slower:
https://github.com/python/cpython/pull/13110#issuecomment-491095964
417 ns +- 17 ns

My PR 13230 (remove cp65001.py) makes it 1.5x faster :-)
https://github.com/python/cpython/pull/13230#issuecomment-491099012
105 ns +- 3 ns

The reference is: 156 ns +- 3 ns.
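
As a quick check of the ratios, here is a small sketch that recomputes them from the timings quoted in this comment. The variable names are only illustrative:

```python
# Timings quoted in this comment, in nanoseconds per call.
reference_ns = 156  # existing cp65001 codec
pr_13110_ns = 417   # PR 13110: avoid functools
pr_13230_ns = 105   # PR 13230: remove cp65001.py

slowdown = pr_13110_ns / reference_ns  # times slower than the reference
speedup = reference_ns / pr_13230_ns   # times faster than the reference

print(f"PR 13110: {slowdown:.1f}x slower")
print(f"PR 13230: {speedup:.1f}x faster")
```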

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread STINNER Victor


STINNER Victor  added the comment:

I wrote PR 13230 to remove Lib/encodings/cp65001.py and simply reuse 
Lib/encodings/utf_8.py.

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-09 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +13138




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-08 Thread Inada Naoki


Inada Naoki  added the comment:

@Eryk I didn't say the new Terminal will cause this issue.  I know about 
ConsoleIO too.

I just meant that Microsoft uses cp65001 more widely for better UTF-8 support 
nowadays.
So I want to make cp65001 an alias of UTF-8.


> Python could similarly special case CP_UTF8 as "utf-8" in 
> _locale._getdefaultlocale.

I like this idea too.

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-08 Thread Eryk Sun


Eryk Sun  added the comment:

> FYI, I expect cp65001 will be used more widely in the near future,
[...]
> It seems to use `SetConsoleOutputCP(65001)` and `SetConsoleCP(65001)`.

Unless PYTHONLEGACYWINDOWSSTDIO is defined, Python 3.6+ doesn't use the 
console's codepage-based interface (except for low-level os.read and os.write). 
Console files use the wide-character console API internally, and have a 
"utf-8" encoding. "cp65001" isn't a factor in this context.

This issue probably occurs due to the encoding returned by 
locale.getpreferredencoding(). This calls _locale._getdefaultlocale, which 
returns a tuple that mixes the user locale with the system ANSI codepage. For 
example, with ANSI set to UTF-8 (Windows 10):

>>> _locale._getdefaultlocale()
('en_GB', 'cp65001')

The Universal CRT special cases CP_UTF8 (codepage 65001) as "utf8" and accepts 
"utf-8" as an alias. For example, after setting the ANSI codepage to UTF-8:

>>> locale.setlocale(locale.LC_CTYPE, '')
'English_United Kingdom.utf8'

Python could similarly special case CP_UTF8 as "utf-8" in 
_locale._getdefaultlocale.
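
A rough sketch of what such a special case could look like, written in Python for illustration only. The real _locale._getdefaultlocale is implemented in C, and the helper name encoding_name_for_code_page is hypothetical:

```python
CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def encoding_name_for_code_page(code_page):
    """Map a Windows code page number to an encoding name.

    Hypothetical helper: it keeps the "cpNNN" naming that Python
    produces, but reports code page 65001 as "utf-8".
    """
    if code_page == CP_UTF8:
        return "utf-8"
    return "cp%d" % code_page


print(encoding_name_for_code_page(1252))   # cp1252
print(encoding_name_for_code_page(65001))  # utf-8
```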

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-08 Thread Paul Monson


Paul Monson  added the comment:

Removing import functools from cp65001.py fixes test_startup_imports.

Victor proposed this PR: https://github.com/python/cpython/pull/13110
but the new test_codecs test fails, I think because it passes self on to the lambda.

I tried to build on Victor's change but there is still one test failure I 
haven't tracked down yet: https://github.com/python/cpython/pull/13211

FAIL: test_incremental_surrogatepass (test.test_codecs.CP65001Test)
--
Traceback (most recent call last):
  File "C:\master\pythond\lib\test\test_codecs.py", line 436, in test_incremental_surrogatepass
    self.assertEqual(dec.decode(data[i:], True), '\uD901')
AssertionError: '' != '\ud901'
+ \ud901
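
For reference, a minimal sketch of the behavior this test expects, using the built-in "utf-8" codec rather than the cp65001 codec under discussion. It assumes a Python version in which the UTF-8 incremental decoder supports "surrogatepass" on truncated input:

```python
import codecs

# Encode a lone high surrogate; "surrogatepass" allows this for UTF-8.
data = "\ud901".encode("utf-8", "surrogatepass")

for i in range(1, len(data)):
    dec = codecs.getincrementaldecoder("utf-8")("surrogatepass")
    # An incomplete sequence is buffered, so nothing is returned yet.
    first = dec.decode(data[:i])
    # final=True flushes the decoder once the remaining bytes arrive.
    rest = dec.decode(data[i:], True)
    assert first == ""
    assert rest == "\ud901"
```

The failure above corresponds to the second assertion: the decoder returned an empty string instead of the surrogate.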

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-08 Thread Paul Monson


Change by Paul Monson :


--
pull_requests: +13122




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-08 Thread Steve Dower


Steve Dower  added the comment:

The XP/Vista change is just context - we don't have to worry about an OS that 
old any more.

If we remove the functools.partial call, does that help?

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-08 Thread Paul Monson


Paul Monson  added the comment:

cp65001 is the default codepage on Windows IoT Core and Windows NanoServer.  

There is also an option in control panel in Windows desktop 1809 (version 
17763) and greater which changes the default codepage to cp65001. 
1. Run control.exe
2. Click Clock and Region > Change date, time or number formats
3. Click the Administrative tab
4. Click "Change System locale..." button
5. Check "Beta: Use Unicode UTF-8 for worldwide language support"
6. Click OK twice.
7. You will be prompted to reboot.
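
After the reboot, the active ANSI code page can be checked from Python. This is only a sketch: the helper name active_ansi_code_page is hypothetical, and it calls the Win32 GetACP() function through ctypes, so it returns None on other platforms:

```python
import sys


def active_ansi_code_page():
    """Return the ANSI code page number on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes
    # GetACP() is the Win32 call that reports the active ANSI code page.
    return ctypes.windll.kernel32.GetACP()


acp = active_ansi_code_page()
if acp is None:
    print("not running on Windows")
else:
    print("ANSI code page:", acp, "(UTF-8)" if acp == 65001 else "")
```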

> Code page 65001 handles lone surrogate differently on Windows XP and older.

If I read the docs correctly a lone surrogate is an error.  I don't think a 
corner case like handling errors differently makes cp65001 not UTF-8.  Am I 
misunderstanding this point?
Also, why is Windows XP still relevant in this discussion?

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-06 Thread Inada Naoki


Inada Naoki  added the comment:

FYI, I expect cp65001 will be used more widely in the near future,
because a non-UTF-8 default encoding hurts the developer experience (DX),
and Microsoft has been trying to improve DX in recent years.

Today, Microsoft announced a new Terminal application.
It seems to use `SetConsoleOutputCP(65001)` and `SetConsoleCP(65001)`.

I think treating cp65001 as a proper "UTF-8" locale is better for all
Windows developers.

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-06 Thread Paul Monson


Paul Monson  added the comment:

> Okay. The test verifies work done to minimize interpreter startup time, but 
> probably the relative cost of importing functools (and thus collections et 
> al) isn't significant compared to the overall cost of spawning a process in a 
> Windows desktop environment. That may not be the case for Nano Server and IoT 
> Core.

Is there an easy way to measure this?
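
One possible approach, sketched here under the assumption that the interpreter supports the -X importtime option (Python 3.7 and later), is to run a fresh interpreter and read the cumulative import time it reports. The helper name import_cost_us is hypothetical:

```python
import subprocess
import sys


def import_cost_us(module):
    """Return the cumulative import time of `module` in microseconds.

    Runs a fresh interpreter with -X importtime and parses the report
    it writes to stderr. Returns None if the module is not listed.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    for line in proc.stderr.splitlines():
        # Report lines look like: "import time: self | cumulative | name"
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) == 3 and fields[2].strip() == module:
            return int(fields[1])
    return None


print("functools:", import_cost_us("functools"), "us")
```

A single run is noisy, so repeating the measurement several times and comparing it with the total process startup time would give a fairer picture.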

> PYTHONIOENCODING=cp65001

I tried setting PYTHONIOENCODING=cp1252 on Windows IoT Core as a workaround and 
it didn't work.

Victor> My PR 13110 avoids "import functools" at startup. Can you please try it 
and check if it fixes test_site?

I tried the PR and it fixes test_startup_imports, which seems promising.  The 
PR breaks other test_site tests on Windows IoT Core, the same ones you pointed 
out in the PR discussion.

--




[issue36778] test_site.StartupImportTests.test_startup_imports fails if default code page is cp65001

2019-05-06 Thread Paul Monson


Change by Paul Monson :


--
title: test_site.StartupImportTests.test_startup_imports fails if default code 
page is not cp1252 -> test_site.StartupImportTests.test_startup_imports fails 
if default code page is cp65001
