[issue28712] Non-Windows mappings for a couple of Windows code pages

2021-03-08 Thread STINNER Victor
Change by STINNER Victor: nosy: -vstinner

[issue28712] Non-Windows mappings for a couple of Windows code pages

2021-03-04 Thread Larry Hastings
Change by Larry Hastings: nosy: -larry

[issue28712] Non-Windows mappings for a couple of Windows code pages

2021-03-04 Thread Eryk Sun
Change by Eryk Sun: versions: +Python 3.10, Python 3.8, Python 3.9 -Python 2.7, Python 3.5, Python 3.6, Python 3.7

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-17 Thread Eryk Sun
Eryk Sun added the comment: Thanks, Serhiy. When I looked at this previously, I mistakenly assumed that any undefined codes would be decoded using the codepage's default Unicode character. But for single-byte codepages in the range above 0x9F, Windows instead maps undefined codes to the

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Thank you Eryk. That is what I want. I just missed that code_page_decode() returns a tuple. It seems Windows maps undefined codes to Unicode characters if they are in the range 0x80-0x9f and raises an error if they are outside of this range. But if the code

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread STINNER Victor
STINNER Victor added the comment: Windows API doc is not easy to understand. I wrote this doc when I fixed code pages in Python 3: http://unicodebook.readthedocs.io/operating_systems.html#windows

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Eryk Sun
Eryk Sun added the comment: The ANSI and OEM codepages are conveniently supported on a Windows system as the encodings 'mbcs' and 'oem' (new in 3.6). The best-fit mapping is used by the 'replace' error handler (see the encode_code_page_flags function in Objects/unicodeobject.c). For other

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: > Codecs are strict by default in Python. Call MultiByteToWideChar() with the > MB_ERR_INVALID_CHARS flag as Python does. Great catch. Without MB_ERR_INVALID_CHARS or WC_NO_BEST_FIT_CHARS Windows would perform the "best fit" behavior described in the BestFit

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Eryk Sun
Eryk Sun added the comment: I rewrote it using the csv module since I can't remember the escaping rules. Added file: http://bugs.python.org/file45511/codepage_table.csv

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Eryk Sun
Changes by Eryk Sun: Removed file: http://bugs.python.org/file45510/codepage_table.csv

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Eryk Sun
Eryk Sun added the comment: I don't think the 2nd tuple element is useful when decoding a single byte. It either works or it doesn't, such as failing for non-ASCII bytes with multibyte codepages such as 932 and 950. I'm attaching the output from the following, which you should be able to

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This would be helpful too if every byte is decoded to exactly 1 character.

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Eryk Sun
Eryk Sun added the comment: How about just the ASCII repr of the 256 decoded characters in CSV? I don't think the list of 2-tuple results is useful. For these single-byte codepages it's always 1 byte consumed.
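The CSV layout proposed here can be sketched as follows. This is only an illustration, not the script that produced the attached file: it decodes with Python's bundled codec tables rather than the Windows API, and the helper names `codepage_rows` and `codepage_csv` are made up for this example.

```python
import csv
import io


def codepage_rows(encodings):
    """Yield one row per codec: its name followed by 256 cells."""
    for name in encodings:
        row = [name]
        for value in range(256):
            try:
                # ascii() keeps the cell pure ASCII, e.g. "'\\u20ac'".
                row.append(ascii(bytes([value]).decode(name)))
            except UnicodeDecodeError:
                # Undefined in the table, or not decodable as a single byte.
                row.append("")
        yield row


def codepage_csv(encodings):
    """Return the rows as CSV text; csv.writer handles the quoting rules."""
    out = io.StringIO()
    csv.writer(out).writerows(codepage_rows(encodings))
    return out.getvalue()


if __name__ == "__main__":
    print(codepage_csv(["cp1251", "cp1252"]))
```

An empty cell marks a byte the codec rejects under the default strict error handler.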

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Thanks Eryk. Could you please run the following script and attach the output? import codecs codepages = [424, 856, 857, 864, 869, 874, 932, 949, 950, 1250, 1251, 1252, 1253, 1254, 1255, 1257, 1258] for cp in codepages: table = [] for i in range(256):

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Eryk Sun
Eryk Sun added the comment: Serhiy, single-byte codepages map every byte value, even if it's just to a Unicode C1 control code [1]. For example: import ctypes kernel32 = ctypes.WinDLL('kernel32', use_last_error=True) MB_ERR_INVALID_CHARS = 0x0008 def
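A rough sketch of calling MultiByteToWideChar through ctypes, in the spirit of the truncated example above. It is not Eryk's code: the wrapper name `mb_to_wide` and its behaviour off Windows (returning None) are assumptions made for illustration.

```python
import ctypes
import sys

MB_ERR_INVALID_CHARS = 0x0008  # fail on invalid input instead of substituting


def mb_to_wide(codepage, data, flags=MB_ERR_INVALID_CHARS):
    """Decode bytes with MultiByteToWideChar; return None when not on Windows."""
    if sys.platform != "win32":
        return None
    if not data:
        return ""  # the API treats a zero-length input as an error
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.MultiByteToWideChar
    func.argtypes = (wintypes.UINT, wintypes.DWORD, ctypes.c_char_p,
                     ctypes.c_int, wintypes.LPWSTR, ctypes.c_int)
    func.restype = ctypes.c_int

    # A first call with a zero-length output buffer asks for the required size.
    size = func(codepage, flags, data, len(data), None, 0)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buf = ctypes.create_unicode_buffer(size)
    written = func(codepage, flags, data, len(data), buf, size)
    if written == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf[:written]


if __name__ == "__main__":
    print(ascii(mb_to_wide(1252, b"\x81\x8d")))
```

Passing `flags=0` instead drops the strict check, which makes it possible to compare the two behaviours for the same byte.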

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Codecs are strict by default in Python. Call MultiByteToWideChar() with the MB_ERR_INVALID_CHARS flag as Python does. You also could use _codecs.code_page_decode().
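A minimal sketch of the `_codecs.code_page_decode()` route mentioned here. The function exists only on Windows builds of CPython, so the wrapper (a hypothetical helper named `decode_with_code_page`) returns None elsewhere; that fallback is an assumption of this example, not part of the API.

```python
import sys


def decode_with_code_page(codepage, data, errors="strict"):
    """Decode bytes through a Windows code page; return None off Windows."""
    if sys.platform != "win32":
        return None
    import _codecs

    # The call returns a (text, bytes_consumed) tuple; final=True tells the
    # decoder that the input is complete.
    text, consumed = _codecs.code_page_decode(codepage, data, errors, True)
    return text


if __name__ == "__main__":
    for cp in (1251, 1252):
        try:
            print(cp, ascii(decode_with_code_page(cp, b"\x81\x8d")))
        except UnicodeDecodeError as exc:
            print(cp, "strict decoding failed:", exc)
```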

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Ned Deily
Ned Deily added the comment: I'm not qualified to offer a technical opinion on Windows matters like this so, for 3.6, I leave it to your discretion, Steve. If you do decide to push this change, please do so before 3.6.0b4 on Monday.

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Steve Dower
Steve Dower added the comment: No idea which is faster, but the tables have better compatibility. However, I'm not sure that changing the tables in already released versions is a great idea, since it could "corrupt" programs without warning. Adding the release managers to weigh in - my gut

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: ... On the other hand, I am happy to use these Win32 functions if they are faster, but still the table should be made correct in the first place. (See also issue28343 (936) and issue28693 (950) for problems with DBCS Chinese code pages.)

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: Yes, it's a table issue. My suggested fix is to replace them all with the WindowsBestFit tables, to which MS currently redirects visitors of https://msdn.microsoft.com/en-us/globalization/mt767590. These old "WINDOWS" tables appear to have been abandoned long ago.

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Steve Dower
Steve Dower added the comment: So is this a bug in the hardcoded encoding tables in Python? I briefly considered making them all use the OS functions, but then they'll be inconsistent with other platforms (where the tables should work fine). Do you have a proposed fix? That will help

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: components: +Windows nosy: +paul.moore, steve.dower, tim.golden, zach.ware

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Changes by Mingye Wang: Removed file: http://bugs.python.org/file45502/pycp.py

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: The output is already attached as win10_14959_py36.txt. PS: after playing with ctypes, I got a version of pycp that works with Py < 3.3 too (attached with comment). Added file: http://bugs.python.org/file45503/pycp_ctypes.py

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: What is the output of the new script?

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Changes by Mingye Wang: Removed file: http://bugs.python.org/file45497/pycp.py

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Changes by Mingye Wang: Added file: http://bugs.python.org/file45502/pycp.py

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-16 Thread Mingye Wang
Mingye Wang added the comment: Ugh... This is weird. Attached is a correct version using Python 3.6's 'code page' methods. I have modified the script a little to make sure it runs on Py3. Added file: http://bugs.python.org/file45501/win10_14959_py36.txt

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-15 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: It seems to me there is something wrong with your test. For example, decoding b'\x81\x8d' from CP1251 (as well as from any other codepage!) gives you u'\x81\x8d', but codes 0x81 and 0x8D are assigned to different characters: 'Ѓ' (U+0403) and 'Ќ' (U+040C). 0x81
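The CP1251 assignments can be checked against Python's bundled table-based codec, which behaves the same on every platform. A small sketch (the helper name `code_points` is made up for this example):

```python
def code_points(data, encoding):
    """Decode data and return the resulting code points as 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in data.decode(encoding)]


if __name__ == "__main__":
    # In the bundled cp1251 table, 0x81 and 0x8D are Cyrillic letters.
    print(code_points(b"\x81\x8d", "cp1251"))
    # Latin-1 maps every byte to the code point of the same value, which is
    # what an output of u'\x81\x8d' looks like.
    print(code_points(b"\x81\x8d", "latin-1"))
```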

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-15 Thread Mingye Wang
Mingye Wang added the comment: > Python 3.4.3 on Cygwin also fails ``b'\x81\x8d'.decode('cp1252')``. ... but since Cygwin packagers did not enable Win32 APIs for their build, I cannot test the script directly.

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-15 Thread Mingye Wang
Changes by Mingye Wang: Added file: http://bugs.python.org/file45498/windows10_14959.txt

[issue28712] Non-Windows mappings for a couple of Windows code pages

2016-11-15 Thread Mingye Wang
New submission from Mingye Wang: Mappings for 0x81 and 0x8D in multiple Windows code pages diverge from what Windows does. Attached is a script that tests for this behavior. (These two bytes are not necessarily the only problems, but for sure they are the most widespread and famous ones. Again,
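The Python side of the divergence can be seen without Windows. The sketch below is not the attached script; it simply lists the bytes that a bundled codec rejects under the default strict error handler, using a hypothetical helper named `undefined_bytes`.

```python
def undefined_bytes(encoding):
    """Return the byte values that the named codec refuses to decode strictly."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)  # errors='strict' is the default
        except UnicodeDecodeError:
            missing.append(value)
    return missing


if __name__ == "__main__":
    for name in ("cp1250", "cp1252"):
        print(name, [hex(b) for b in undefined_bytes(name)])
```

With CPython's bundled cp1252 table, the list includes 0x81 and 0x8D, the two bytes this report singles out.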