Eryk Sun <[email protected]> added the comment:
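On Unix, Python 3.6 decodes the char * command-line arguments via mbstowcs. On
Linux, I see the following misbehavior of mbstowcs when decoding an obsolete
6-byte UTF-8 sequence. The sequence is not overlong in the strict sense; it
encodes a value above U+10FFFF, which RFC 3629 makes invalid: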
>>> import ctypes
>>> mbstowcs = ctypes.CDLL(None, use_errno=True).mbstowcs
>>> arg = bytes(x + 128 for x in [1 + 124, 63, 63, 59, 58, 58])
>>> mbstowcs(None, arg, 0)
1
>>> buf = (ctypes.c_int * 2)()
>>> mbstowcs(buf, arg, 2)
1
>>> hex(buf[0])
'0x7fffbeba'
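To see where 0x7fffbeba comes from, the bytes can be decoded by hand using the
obsolete 6-byte UTF-8 layout from RFC 2279. The following is only a sketch:
decode_legacy_utf8 is a hypothetical helper written for illustration, not
anything from glibc or CPython, and it assumes the input is exactly one
sequence.

```python
def decode_legacy_utf8(seq: bytes) -> int:
    """Decode one sequence using the obsolete RFC 2279 layout (up to 6 bytes).

    Hypothetical helper for illustration only.
    """
    lead = seq[0]
    # The number of leading 1 bits in the lead byte gives the sequence length.
    length = 0
    mask = 0x80
    while lead & mask:
        length += 1
        mask >>= 1
    if not 2 <= length <= 6 or len(seq) != length:
        raise ValueError("not a single multi-byte sequence")
    # The bits below the first 0 bit of the lead byte are payload.
    value = lead & (mask - 1)
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        # Each continuation byte contributes its low 6 bits.
        value = (value << 6) | (b & 0x3F)
    return value


# Same bytes as in the session above: FD BF BF BB BA BA.
arg = bytes(x + 128 for x in [1 + 124, 63, 63, 59, 58, 58])
print(hex(decode_legacy_utf8(arg)))  # 0x7fffbeba, matching buf[0]

# Python's own strict UTF-8 decoder rejects the same bytes.
try:
    arg.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decoder rejects it:", exc.reason)
```

The payload is 1 bit from the lead byte plus 5 x 6 bits from the continuation
bytes, 31 bits in total, which is far outside the Unicode range.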
This shouldn't be an issue in 3.7, at least not with the default UTF-8 mode
configuration. With this mode, Py_DecodeLocale calls _Py_DecodeUTF8Ex using the
surrogateescape error handler [1].
[1]: https://github.com/python/cpython/blob/v3.7.2/Python/fileutils.c#L456
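As a rough illustration of what the surrogateescape handler does with these
bytes, here is the Python-level codec applied to the same input. This is a
sketch that assumes the codec behaves like the C-level _Py_DecodeUTF8Ex for
this input; it does not call that function.

```python
# Same byte sequence as in the mbstowcs session: FD BF BF BB BA BA.
arg = bytes([0xFD, 0xBF, 0xBF, 0xBB, 0xBA, 0xBA])

# Each undecodable byte is mapped to a lone surrogate U+DC00 + byte value
# instead of being combined into one out-of-range code point.
text = arg.decode("utf-8", "surrogateescape")
print([hex(ord(c)) for c in text])

# Encoding with the same handler restores the original bytes, so the
# argument can still be passed back to the OS unchanged.
roundtrip = text.encode("utf-8", "surrogateescape")
print(roundtrip == arg)
```

The result is six separate surrogate code points rather than the single bogus
value 0x7fffbeba, and the original bytes remain recoverable.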
----------
nosy: +eryksun
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue35883>
_______________________________________