All I can say is "ouch". Hard to call it a regression to no longer allow this mess...
CHB

> On Feb 8, 2016, at 4:37 PM, eryk sun <eryk...@gmail.com> wrote:
>
>> On Mon, Feb 8, 2016 at 2:41 PM, Chris Barker <chris.bar...@noaa.gov> wrote:
>> Just to clarify -- what does it currently do for bytes? IIUC, Windows uses
>> UTF-16, so can you pass in UTF-16 bytes? Or when using bytes, is it assuming
>> some Windows ANSI-compatible encoding? (and what does it return?)
>
> UTF-16 is used in the [W]ide-character API. Bytes paths use the [A]NSI
> codepage. For a single-byte codepage, the ANSI API roundtrips, i.e. a
> bytes path that's passed to CreateFileA matches the listing from
> FindFirstFileA. But for a DBCS codepage, arbitrary bytes paths do not
> roundtrip. Invalid byte sequences map to the default character. Note
> that an ASCII question mark is not always the default character. It
> depends on the codepage.
>
> For example, in codepage 932 (Japanese), it's an error if a lead byte
> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
> uncommon). In this case the ANSI API substitutes the default character
> for Japanese, '・' (U+30FB, Katakana middle dot).
>
>>>> locale.getpreferredencoding()
> 'cp932'
>>>> open(b'\xe05', 'w').close()
>>>> os.listdir('.')
> ['・']
>>>> os.listdir(b'.')
> [b'\x81E']
>
> All invalid sequences get mapped to '・', which roundtrips as
> b'\x81\x45', so you can't reliably create and open files with
> arbitrary bytes paths in this locale.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
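The cp932 failure mode eryk describes can be sketched in plain Python, without touching the filesystem. This is only a rough model, assuming Python's own cp932 codec as a stand-in for the Windows ANSI conversion. The helper names (`has_invalid_pair`, `simulate_ansi_roundtrip`) and the error-handler name `cp932_default_char` are hypothetical. `has_invalid_pair` checks only the rule quoted above (lead byte followed by a trail byte below 0x40, plus a dangling lead byte at the end), not the full set of cp932 validity rules. Python's codec may also split an invalid sequence differently than the Windows API does (Windows listed `b'\xe05'` as a single '・'), so the exact bytes can differ. The point is only that the bytes path does not come back unchanged.

```python
import codecs

# Default character the ANSI API substitutes for Japanese (U+30FB, KATAKANA MIDDLE DOT).
DEFAULT_CHAR = "\u30fb"


def _substitute_default(exc):
    # Replace the undecodable byte span with the default character and
    # resume decoding right after it (span boundaries come from Python's codec).
    return (DEFAULT_CHAR, exc.end)


# Hypothetical handler name, registered only for this illustration.
codecs.register_error("cp932_default_char", _substitute_default)


def has_invalid_pair(path: bytes) -> bool:
    """Flag a cp932 lead byte followed by a trail byte < 0x40 (or by nothing)."""
    i = 0
    while i < len(path):
        b = path[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            # Lead byte: a missing trail byte or one below 0x40 is invalid.
            if i + 1 >= len(path) or path[i + 1] < 0x40:
                return True
            i += 2  # skip the lead/trail pair
        else:
            i += 1  # single-byte character
    return False


def simulate_ansi_roundtrip(path: bytes) -> bytes:
    """Decode as cp932 substituting the default char, then encode back to bytes."""
    text = path.decode("cp932", errors="cp932_default_char")
    return text.encode("cp932")


if __name__ == "__main__":
    original = b"\xe05"
    print(has_invalid_pair(original))                      # True
    print(simulate_ansi_roundtrip(original) != original)   # True: no roundtrip
    print(DEFAULT_CHAR.encode("cp932") == b"\x81E")        # True: '・' is 0x81 0x45
```

A path made only of valid pairs, such as `b"\x81\x45"` itself, does survive the roundtrip, which is why the breakage only shows up for arbitrary bytes.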