On Fri, Sep 30, 2016 at 5:58 AM, iMath <redstone-c...@163.com> wrote:
> the doc of os.fsencode(filename) says Encode filename to the filesystem
> encoding 'strict'
> on Windows, what does 'strict' mean ?

"strict" is the error handler for the encoding. It raises a UnicodeEncodeError for unmapped characters. For example: >>> 'αβψδ'.encode('mbcs', 'strict') Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character On the other hand, the "replace" error handler is lossy. With the Windows "mbcs" codec, it substitutes question marks and best-fit mappings for characters that aren't defined in the system locale's ANSI codepage (e.g. 1252). For example: >>> print('αβψδ'.encode('mbcs', 'replace').decode('mbcs')) aß?d This is the behavior of os.listdir with bytes paths, which is why using bytes paths has been deprecated on Windows since 3.3. In 3.6 bytes paths are provisionally allowed again because the filesystem encoding has changed to UTF-8 (internally transcoded to the native UTF-16LE) and uses the "surrogatepass" error handler to allow lone surrogate codes (allowed by Windows). See PEP 529 for more information. -- https://mail.python.org/mailman/listinfo/python-list