eryksun added the comment:
It seems VC 14 has a bug here. In the new C runtime, strftime is implemented by
calling wcsftime as follows:
    size_t const result = _Wcsftime_l(wstring.get(), maxsize, wformat.get(),
                                      timeptr, lc_time_arg, locale);
    if (result == 0)
        return 0;

    // Copy output from wide char string
    if (!WideCharToMultiByte(lc_time_cp, 0, wstring.get(), -1, string,
                             static_cast<int>(maxsize), nullptr, nullptr))
    {
        __acrt_errno_map_os_error(GetLastError());
        return 0;
    }

    return result;
The WideCharToMultiByte call returns the number of bytes in the converted
string, but strftime doesn't update the value of "result", so it still returns
the wide-character count from _Wcsftime_l.
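The mismatch can be reproduced outside the CRT. The following is only a rough Python model, assuming the cp949 codec matches what WideCharToMultiByte produces for this character. The function names are hypothetical and are not part of the CRT or of CPython.

```python
# Hypothetical model of the length mismatch; this is not the CRT's code.
WIDE = "\ud654"  # the abbreviated weekday from this report, one wide character


def buggy_strftime_result(wide, codepage="cp949"):
    """Return (buffer, length) with the wide-character count as the length."""
    encoded = wide.encode(codepage) + b"\x00"  # converted, null-terminated
    return encoded, len(wide)  # length still counts wide characters


def fixed_strftime_result(wide, codepage="cp949"):
    """Return (buffer, length) with the converted byte count as the length."""
    encoded = wide.encode(codepage) + b"\x00"
    return encoded, len(encoded) - 1  # bytes written, excluding the null


if __name__ == "__main__":
    print(buggy_strftime_result(WIDE))
    print(fixed_strftime_result(WIDE))
```

For a multibyte locale encoding the two lengths differ whenever the output contains a non-ASCII character, which is why the problem only shows up with locales such as Korean.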
This worked correctly in the old CRT. For example, in 3.4 built with VC 10:
>>> sys.version_info[:2]
(3, 4)
>>> locale.setlocale(locale.LC_ALL, 'kor_kor')
'Korean_Korea.949'
>>> time.strftime('%a')
'\ud654'
Here's an overview of the problem in 3.5, stepped through in the debugger:
>>> sys.version_info[:2]
(3, 5)
>>> locale.setlocale(locale.LC_ALL, 'ko')
'ko'
>>> time.strftime('%a')
Breakpoint 0 hit
ucrtbase!Wcsftime_l:
000007fe`e9e6fd74 48895c2410 mov qword ptr [rsp+10h],rbx ss:00000000`003df6d8=0000000000666ce0
wcsftime returns the output buffer length in wide characters:
0:000> pt; r rax
rax=0000000000000001
WideCharToMultiByte is called to convert the wide-character string to the
locale encoding:
0:000> pc
ucrtbase!Strftime_l+0x17f:
000007fe`e9e6c383 ff15dfa00200 call qword ptr
[ucrtbase!_imp_WideCharToMultiByte (000007fe`e9e96468)]
ds:000007fe`e9e96468={KERNELBASE!WideCharToMultiByte (000007fe`fd631be0)}
0:000> p
ucrtbase!Strftime_l+0x185:
000007fe`e9e6c389 85c0 test eax,eax
This returns the length of the converted string (including the null):
0:000> r rax
rax=0000000000000003
But strftime ignores this value, and instead returns the wide-character string
length, which gets passed to PyUnicode_DecodeLocaleAndSize:
0:000> bp python35!PyUnicode_DecodeLocaleAndSize
0:000> g
Breakpoint 1 hit
python35!PyUnicode_DecodeLocaleAndSize:
00000000`5ec15160 4053 push rbx
0:000> r rdx
rdx=0000000000000001
U+D654 was converted correctly to '\xc8\xad' (code page 949):
0:000> db @rcx l3
00000000`007e5d20 c8 ad 00 ...
However, since (str[len] != '\0'), PyUnicode_DecodeLocaleAndSize errors out as
follows:
0:000> bd 0,1; g
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: embedded null byte
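The failing check can be modeled roughly as follows. This is a simplified Python stand-in for the validation in PyUnicode_DecodeLocaleAndSize, not its actual implementation. The function name and the use of the cp949 codec in place of the locale decoder are assumptions made for illustration.

```python
def decode_locale_and_size(buf, length, codepage="cp949"):
    """Simplified stand-in for the length check described above."""
    # The terminator must sit exactly at buf[length], with no null before it.
    if buf[length] != 0 or buf.index(0) != length:
        raise ValueError("embedded null byte")
    return buf[:length].decode(codepage)


if __name__ == "__main__":
    buf = b"\xc8\xad\x00"  # the bytes shown in the dump above
    try:
        decode_locale_and_size(buf, 1)  # length as returned by strftime
    except ValueError as exc:
        print("length 1:", exc)
    print("length 2:", ascii(decode_locale_and_size(buf, 2)))
```

With length 1, buf[1] is 0xad rather than the terminator, so the check rejects the input even though the buffer contains no embedded null.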
It works as expected if the length is manually changed to 2:
>>> time.strftime('%a')
Breakpoint 1 hit
python35!PyUnicode_DecodeLocaleAndSize:
00000000`5ec15160 4053 push rbx
0:000> r rdx=2
0:000> g
'\ud654'
The string is null-terminated, so can time_strftime simply substitute
PyUnicode_DecodeLocale in place of PyUnicode_DecodeLocaleAndSize?
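As a sketch of why that substitution would sidestep the bad length: PyUnicode_DecodeLocale takes no size argument and derives the length from the terminator, which the model below imitates. The helper name is hypothetical and cp949 again stands in for the locale decoder.

```python
def decode_locale(buf, codepage="cp949"):
    """Model of decoding a null-terminated buffer without a caller-supplied size."""
    # Equivalent of strlen(): count bytes up to, but not including, the first null.
    length = buf.index(0)
    return buf[:length].decode(codepage)


if __name__ == "__main__":
    print(ascii(decode_locale(b"\xc8\xad\x00")))
```

The length reported by strftime is never consulted here, so it no longer matters that it counts wide characters instead of bytes.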
----------
components: +Windows
nosy: +eryksun, paul.moore, steve.dower, tim.golden, zach.ware
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue25023>
_______________________________________