Eryk Sun added the comment:
The problem from issue 10653 is that internally the CRT encodes the time zone
name using the ANSI codepage (i.e. the default system codepage). wcsftime
decodes this string using mbstowcs (i.e. multibyte string to wide-character
string), which uses Latin-1 in the C locale. In other words, in the C locale on
Windows, mbstowcs just casts the byte values to wchar_t.
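As a rough illustration, the following pure-Python sketch mimics that two-step path. It assumes the system ANSI codepage is 1251 (Cyrillic), and `simulate_c_locale_mbstowcs` is a hypothetical helper, not a CRT or Python API. The CRT stores the name as ANSI bytes, and C-locale mbstowcs then widens each byte to a wchar_t unchanged, which is the same as decoding the bytes as Latin-1:

```python
def simulate_c_locale_mbstowcs(name, ansi_codepage="cp1251"):
    """Hypothetical helper: mimic the CRT encoding a time zone name with the
    ANSI codepage, then C-locale mbstowcs casting each byte to wchar_t."""
    raw = name.encode(ansi_codepage)      # CRT keeps the name as ANSI bytes
    return "".join(chr(b) for b in raw)   # byte value -> code point U+00XX

print(simulate_c_locale_mbstowcs("Время в формате UTC"))
```

The printed string is the same mojibake that the ctypes session further down produces in the C locale.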
With the new Universal CRT, strftime is implemented by calling wcsftime, so the
accepted solution for issue 10653 is broken in 3.5+. A simple way around the
problem is to switch back to using wcsftime and temporarily (or permanently)
set the thread's LC_CTYPE locale to the system default. This makes the internal
mbstowcs call use the ANSI codepage. Note that on POSIX platforms 3.x already
sets the default via setlocale(LC_CTYPE, "") in Python/pylifecycle.c. Why not
set this for all platforms that have setlocale?
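At the Python level, the "temporarily" variant could look something like the context manager below. This is only a sketch: `system_ctype_locale` is a hypothetical name, and note that locale.setlocale wraps the C setlocale, which is process-wide unless the CRT has been configured for per-thread locales, so it is not the per-thread change described above:

```python
import contextlib
import locale

@contextlib.contextmanager
def system_ctype_locale():
    """Hypothetical helper: temporarily set LC_CTYPE to the default locale ("")."""
    # Passing no locale argument queries the current setting without changing it.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        # "" selects the default locale from the environment/OS; on Windows
        # this is expected to bring in an ANSI codepage instead of the C locale.
        yield locale.setlocale(locale.LC_CTYPE, "")
    finally:
        # Always restore the previous LC_CTYPE, even if the body raised.
        locale.setlocale(locale.LC_CTYPE, saved)
```

For example, `with system_ctype_locale(): tz = time.strftime('%Z')` would format under the default LC_CTYPE and then put the old setting back.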
> I only tested with my default US locale.
If your system locale uses codepage 1252 (a superset of Latin-1), then you can
still test this on a per thread basis if your system has additional language
packs. For example:
import ctypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
if kernel32.GetModuleHandleW('ucrtbased'):  # debug build
    crt = ctypes.CDLL('ucrtbased', use_errno=True)
else:
    crt = ctypes.CDLL('ucrtbase', use_errno=True)
MUI_LANGUAGE_NAME = 8
LC_CTYPE = 2
class tm(ctypes.Structure):
    pass
crt._gmtime64.restype = ctypes.POINTER(tm)
# set a Russian locale for the current thread
kernel32.SetThreadPreferredUILanguages(MUI_LANGUAGE_NAME,
                                       'ru-RU\0', None)
crt._wsetlocale(LC_CTYPE, 'ru-RU')
# update the time zone name based on the thread locale
crt._tzset()
# get a struct tm *
ltime = ctypes.c_int64()
crt._time64(ctypes.byref(ltime))
tmptr = crt._gmtime64(ctypes.byref(ltime))
# call wcsftime using C and Russian locales
buf = (ctypes.c_wchar * 100)()
crt._wsetlocale(LC_CTYPE, 'C')
size = crt.wcsftime(buf, 100, '%Z\r\n', tmptr)
tz1 = buf[:size]
crt._wsetlocale(LC_CTYPE, 'ru-RU')
size = crt.wcsftime(buf, 100, '%Z\r\n', tmptr)
tz2 = buf[:size]
hcon = kernel32.GetStdHandle(-11)
pn = ctypes.pointer(ctypes.c_uint())
>>> _ = kernel32.WriteConsoleW(hcon, tz1, len(tz1), pn, None)
Âðåìÿ â ôîðìàòå UTC
>>> _ = kernel32.WriteConsoleW(hcon, tz2, len(tz2), pn, None)
Время в формате UTC
The first result demonstrates the ANSI => Latin-1 mojibake problem in the C
locale. You can encode this result as Latin-1 and then decode it back as
codepage 1251:
>>> tz1.encode('latin-1').decode('1251') == tz2
True
But transcoding isn't a general workaround, since the format string shouldn't be
restricted to ANSI, unless you can smuggle the Unicode through as Takayuki
showed.
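A small sketch of both halves of that point, again assuming codepage 1251; `repair_mojibake` and `fits_ansi` are hypothetical helpers. The repair works for the time zone name, but a format string containing characters outside the ANSI codepage cannot be encoded to ANSI at all, so it could never pass through an ANSI strftime in the first place:

```python
def repair_mojibake(text, ansi_codepage="cp1251"):
    """Hypothetical helper: recover the ANSI bytes from C-locale mojibake
    (each char is U+00XX) and decode them with the real codepage."""
    return text.encode("latin-1").decode(ansi_codepage)

def fits_ansi(fmt, ansi_codepage="cp1251"):
    """Hypothetical helper: True if the format string is representable in ANSI."""
    try:
        fmt.encode(ansi_codepage)
    except UnicodeEncodeError:
        return False
    return True

print(repair_mojibake("Âðåìÿ â ôîðìàòå UTC"))
print(fits_ansi("%Z"))
print(fits_ansi("%Z \u6642\u523b"))  # CJK characters are not in codepage 1251
```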
----------
nosy: +eryksun
versions: +Python 3.6
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8304>
_______________________________________