[issue25023] time.strftime('%a'), ValueError: embedded null byte, in ko locale

eryksun Tue, 08 Sep 2015 02:56:33 -0700

eryksun added the comment:

It seems VC 14 has a bug here. In the new C runtime, strftime is implemented by 
calling wcsftime as follows:


    size_t const result = _Wcsftime_l(wstring.get(), maxsize, wformat.get(),
                                      timeptr, lc_time_arg, locale);
    if (result == 0)
        return 0;

    // Copy output from wide char string
    if (!WideCharToMultiByte(lc_time_cp, 0, wstring.get(), -1, string,
                             static_cast<int>(maxsize), nullptr, nullptr))
    {
        __acrt_errno_map_os_error(GetLastError());
        return 0;
    }

    return result;

The WideCharToMultiByte call returns the number of bytes in the converted 
string, but strftime doesn't update the value of "result". 
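
To make the mismatch concrete, here is a small Python sketch of the two lengths involved. It only models the arithmetic, not the CRT itself; the variable names are made up for illustration, and Python's cp949 codec is assumed to agree with Windows code page 949 for this character.

```python
# Illustrative model only: these names are hypothetical, not CRT identifiers.
wide = "\ud654"                # the abbreviated weekday name from this report
narrow = wide.encode("cp949")  # stands in for the WideCharToMultiByte result

wide_len = len(wide)    # what wcsftime reports (wide characters, no null)
byte_len = len(narrow)  # what strftime ought to report (bytes, no null)

print(wide_len, byte_len, narrow)
```

The wide-character count and the byte count differ whenever the locale encoding needs more than one byte per character, which is why single-byte locales do not show the problem.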

This worked correctly in the old CRT. For example, in 3.4 built with VC 10:

    >>> sys.version_info[:2]
    (3, 4)
    >>> locale.setlocale(locale.LC_ALL, 'kor_kor') 
    'Korean_Korea.949'
    >>> time.strftime('%a')
    '\ud654'

Here's an overview of the problem in 3.5, stepped through in the debugger:

    >>> sys.version_info[:2]
    (3, 5)
    >>> locale.setlocale(locale.LC_ALL, 'ko')
    'ko'
    >>> time.strftime('%a')
    Breakpoint 0 hit
    ucrtbase!Wcsftime_l:
    000007fe`e9e6fd74 48895c2410      mov     qword ptr [rsp+10h],rbx ss:00000000`003df6d8=0000000000666ce0

wcsftime returns the length of its output in wide characters (not counting the 
terminating null):

    0:000> pt; r rax
    rax=0000000000000001

WideCharToMultiByte is called to convert the wide-character string to the 
locale encoding:

    0:000> pc
    ucrtbase!Strftime_l+0x17f:
    000007fe`e9e6c383 ff15dfa00200    call    qword ptr [ucrtbase!_imp_WideCharToMultiByte (000007fe`e9e96468)] ds:000007fe`e9e96468={KERNELBASE!WideCharToMultiByte (000007fe`fd631be0)}
    0:000> p
    ucrtbase!Strftime_l+0x185:
    000007fe`e9e6c389 85c0            test    eax,eax

This returns the length of the converted string (including the null):

    0:000> r rax
    rax=0000000000000003

But strftime ignores this value, and instead returns the wide-character string 
length, which gets passed to PyUnicode_DecodeLocaleAndSize:

    0:000> bp python35!PyUnicode_DecodeLocaleAndSize
    0:000> g
    Breakpoint 1 hit
    python35!PyUnicode_DecodeLocaleAndSize:
    00000000`5ec15160 4053            push    rbx
    0:000> r rdx
    rdx=0000000000000001

U+D654 was converted correctly to '\xc8\xad' (code page 949):

    0:000> db @rcx l3
    00000000`007e5d20  c8 ad 00                                         ...

However, since (str[len] != '\0'), PyUnicode_DecodeLocaleAndSize errors out as 
follows:

    0:000> bd 0,1; g
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: embedded null byte
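
The failing check can be modelled in a few lines of Python. This is a simplified sketch, not the CPython implementation: decode_locale_and_size is a hypothetical stand-in that reproduces only the str[len] test mentioned above, and cp949 is assumed as the locale encoding.

```python
def decode_locale_and_size(buf: bytes, length: int, encoding: str = "cp949") -> str:
    """Hypothetical model of the terminator check, not the real C function."""
    # The byte at index `length` must be the null terminator.
    if buf[length] != 0:
        raise ValueError("embedded null byte")
    return buf[:length].decode(encoding)

buf = b"\xc8\xad\x00"  # the three bytes shown in the memory dump

try:
    decode_locale_and_size(buf, 1)  # wide-character length returned by strftime
    failed = False
except ValueError:
    failed = True

ok = decode_locale_and_size(buf, 2)  # the actual byte length
print(failed, ok == "\ud654")
```

With a length of 1 the byte at that index is 0xad rather than the terminator, so the check fails; with the byte length of 2 the same buffer decodes cleanly.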

It works as expected if the length is manually changed to 2:

    >>> time.strftime('%a')
    Breakpoint 1 hit
    python35!PyUnicode_DecodeLocaleAndSize:
    00000000`5ec15160 4053            push    rbx
    0:000> r rdx=2
    0:000> g
    '\ud654'

The string is null-terminated, so can time_strftime simply substitute 
PyUnicode_DecodeLocale in place of PyUnicode_DecodeLocaleAndSize?
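
As a rough illustration of why relying on the terminator would sidestep the bad length, the sketch below uses ctypes to read a buffer up to its first null byte. It is only an analogy for the proposed change, not the time_strftime code; the buffer size of 16 and the use of cp949 are assumptions made for the example.

```python
import ctypes

# A zero-filled 16-byte buffer holding the converted string from this report.
outbuf = ctypes.create_string_buffer(b"\xc8\xad", 16)

# .value reads up to the first null byte, so no separate length is needed.
terminated = outbuf.value
decoded = terminated.decode("cp949")

print(len(terminated), decoded == "\ud654")
```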

----------
components: +Windows
nosy: +eryksun, paul.moore, steve.dower, tim.golden, zach.ware

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue25023>
_______________________________________