[issue43403] Misleading statement about bytes not being able to represent windows filenames in documentation

2021-03-04 Thread Eryk Sun


Eryk Sun  added the comment:

> let's not claim that bytes cannot represent everything on a filesystem 
> with an encoding.

Gregory, before Python 3.6 changed the filesystem encoding to UTF-8, the 
[A]NSI file API (e.g. CreateFileA) was used for bytes paths and the [W]ide 
character file API (e.g. CreateFileW) was used for str paths. The ANSI API is a 
set of wrapper functions that automatically translate strings between the ANSI 
code page of the current process and the system's native UTF-16 encoding, 
before and after calling the wide-character function (or a common internal 
function). Starting with Windows 10, the ANSI and OEM code pages of a process 
are finally allowed to be UTF-8 (code page 65001), but it's still considered 
beta and barely used. Usually the ANSI code page is a legacy single-byte or 
double-byte code page such as 1252 (Western Europe) or 932 (Japanese). 
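
As a rough illustration of why a legacy code page cannot represent every 
filename, here is a minimal sketch using Python's own "cp1252" and "cp932" 
codecs. The helper name representable() and the sample names are made up for 
this example. Python's strict encoder is only an approximation of the ANSI 
API, which may substitute best-fit characters or "?" instead of failing.

```python
def representable(name: str, code_page: str) -> bool:
    """Return True if every character of name can be encoded in code_page."""
    try:
        name.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical filenames; each code page covers only part of Unicode.
for name in ("café.txt", "日本語.txt"):
    for code_page in ("cp1252", "cp932"):
        print(name, code_page, representable(name, code_page))
```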

Natively, Windows is UTF-16, and native Windows filesystems store filenames on 
disk using 16-bit characters. The system doesn't check for valid Unicode, so 
lone surrogate codes are allowed. This is sometimes called a "Wobbly" format. 
Encoding or decoding such names in Python requires the "surrogatepass" error 
handler.
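
A small sketch of what that looks like in practice, assuming a made-up 
filename that contains a lone low surrogate. The helper roundtrip() is 
hypothetical. The strict codec refuses the name, while "surrogatepass" lets it 
round-trip through both UTF-16 and UTF-8.

```python
# Hypothetical filename containing a lone low surrogate (U+DC80).
name = "report\udc80.txt"


def roundtrip(text: str, encoding: str):
    """Encode and decode text, letting lone surrogates pass through."""
    raw = text.encode(encoding, "surrogatepass")
    return raw, raw.decode(encoding, "surrogatepass")


# The default "strict" handler rejects lone surrogates.
try:
    name.encode("utf-16-le")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

raw16, back16 = roundtrip(name, "utf-16-le")
raw8, back8 = roundtrip(name, "utf-8")
print(strict_ok, back16 == name, back8 == name)
```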

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43403] Misleading statement about bytes not being able to represent windows filenames in documentation

2021-03-04 Thread Eryk Sun


Change by Eryk Sun :


--
resolution:  -> duplicate
stage: needs patch -> resolved
status: open -> closed
superseder:  -> os.path states that bytes can't represent all MBCS paths under 
Windows




[issue43403] Misleading statement about bytes not being able to represent windows filenames in documentation

2021-03-04 Thread Ammar Askar


Change by Ammar Askar :


--
nosy: +eryksun




[issue43403] Misleading statement about bytes not being able to represent windows filenames in documentation

2021-03-04 Thread Gregory P. Smith


New submission from Gregory P. Smith :

As noted in the comment on 
https://github.com/rdiff-backup/rdiff-backup/issues/540#issuecomment-789485896

The Python documentation in https://docs.python.org/3/library/os.path.html 
makes an odd claim that bytes cannot represent all file names on Windows.  That 
doesn't make sense.  bytes can by definition represent everything.

"""Vice versa, using bytes objects cannot represent all file names on Windows 
(in the standard mbcs encoding), hence Windows applications should use string 
objects to access all files."""

Could we get this clarified and corrected to cover what any actual technical 
limitation is?

Every OS is going to reject some bytes objects as a pathname for containing 
invalid byte sequences for their filesystem (e.g. I doubt any OS allows null 
b'\0' characters).  But let's not claim that bytes cannot represent everything 
on a filesystem with an encoding.
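
To illustrate the null byte case, a minimal sketch: in Python 3, os.stat() 
raises ValueError for a path with an embedded null before the OS is ever 
asked. The helper name rejects_null() and the sample paths are made up for 
this example.

```python
import os


def rejects_null(path) -> bool:
    """Return True if Python refuses the path because of an embedded null."""
    try:
        os.stat(path)
    except ValueError:
        # Raised by Python itself for an embedded null in the path.
        return True
    except OSError:
        # The path was passed to the OS, which reported some other error.
        return False
    return False


print(rejects_null(b"foo\0bar"), rejects_null("foo\0bar"))
```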

--
assignee: docs@python
components: Documentation
messages: 388122
nosy: docs@python, gregory.p.smith, steve.dower
priority: normal
severity: normal
stage: needs patch
status: open
title: Misleading statement about bytes not being able to represent windows 
filenames in documentation
versions: Python 3.10, Python 3.7, Python 3.8, Python 3.9
