[issue29241] sys._enablelegacywindowsfsencoding() don't apply to os.fsencode and os.fsdecode

2022-01-16 Thread Inada Naoki

Inada Naoki  added the comment:

Mercurial still uses it.

Mercurial has a plan to move filesystem names from the ANSI code page to UTF-8, 
but I don't know how far it has progressed.

nosy: +methane

Python tracker 

Python-bugs-list mailing list


2022-01-16 Thread Irit Katriel

Irit Katriel  added the comment:

With 3.6 past its end of life, is _enablelegacywindowsfsencoding still needed, 
or is it time to deprecate it?

nosy: +iritkatriel


2018-02-19 Thread Steve Dower

Steve Dower  added the comment:

I took another look at this and it's still unclear whether it's worth the 
performance loss.

Perhaps moving fsencode and fsdecode (almost) entirely into C would be a better 
approach? That shouldn't send us backwards at all, and all they really do is a 
type check followed by a call to a function that's already written in C.

assignee: steve.dower -> 


2017-01-21 Thread Steve Dower

Steve Dower added the comment:




2017-01-21 Thread STINNER Victor

STINNER Victor added the comment:

Can't we just update the cache when the function changes the encoding?



2017-01-21 Thread Steve Dower

Steve Dower added the comment:

Thanks for checking that.

I don't think it's worth retaining the cache on Windows in the face of the 
broken behaviour. Any real-world case where a lot of paths are being encoded or 
decoded is also likely to involve file-system access which will dwarf the 
encoding time. Further, passing bytes on Windows will result in another 
decode/encode cycle anyway, so there will be a bigger performance impact in 
using str (though even then, probably only when the str is already represented 
using 16-bit characters).

Unless somebody wants to make a case for having a more complex mechanism to 
reset the cache, I'll make the change to remove it (protected by an 'if 
sys.platform.startswith('win')' check).
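A rough sketch of the proposed change, assuming it keeps the closure-based cache everywhere except Windows. This is not the patch that was eventually written; _make_fscodec and get_codec are hypothetical names, and the code is standalone rather than part of os.py.

```python
import os
import sys


def _make_fscodec():
    if sys.platform.startswith("win"):
        # Windows: look the values up on every call so a later change is seen.
        def get_codec():
            return sys.getfilesystemencoding(), sys.getfilesystemencodeerrors()
    else:
        # Elsewhere: read once and reuse, as the existing code does.
        cached = (sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())

        def get_codec():
            return cached

    def fsencode(filename):
        filename = os.fspath(filename)  # Does type-checking of `filename`.
        if isinstance(filename, str):
            encoding, errors = get_codec()
            return filename.encode(encoding, errors)
        return filename

    def fsdecode(filename):
        filename = os.fspath(filename)  # Does type-checking of `filename`.
        if isinstance(filename, bytes):
            encoding, errors = get_codec()
            return filename.decode(encoding, errors)
        return filename

    return fsencode, fsdecode


fsencode, fsdecode = _make_fscodec()
```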

assignee:  -> steve.dower
versions: +Python 3.7


2017-01-19 Thread JGoutin

JGoutin added the comment:

A little encoding cache benchmark.

Current Code:

import sys

def _fscodec():
    encoding = sys.getfilesystemencoding()
    errors = sys.getfilesystemencodeerrors()

    def fsencode(filename):
        filename = fspath(filename)  # Does type-checking of `filename`.
        if isinstance(filename, str):
            return filename.encode(encoding, errors)
        return filename

    def fsdecode(filename):
        filename = fspath(filename)  # Does type-checking of `filename`.
        if isinstance(filename, bytes):
            return filename.decode(encoding, errors)
        return filename

    return fsencode, fsdecode

fsencode, fsdecode = _fscodec()
del _fscodec


import os

%timeit os.fsdecode(b'\xc3\xa9')
The slowest run took 21.59 times longer than the fastest. This could mean that 
an intermediate result is being cached.
100 loops, best of 3: 449 ns per loop

%timeit os.fsencode('é')
The slowest run took 24.20 times longer than the fastest. This could mean that 
an intermediate result is being cached.
100 loops, best of 3: 412 ns per loop

Modified Code:

from sys import getfilesystemencoding, getfilesystemencodeerrors

def fsencode(filename):
    filename = fspath(filename)  # Does type-checking of `filename`.
    if isinstance(filename, str):
        # Encoding and error handler are looked up on every call.
        return filename.encode(getfilesystemencoding(),
                               getfilesystemencodeerrors())
    return filename

def fsdecode(filename):
    filename = fspath(filename)  # Does type-checking of `filename`.
    if isinstance(filename, bytes):
        # Encoding and error handler are looked up on every call.
        return filename.decode(getfilesystemencoding(),
                               getfilesystemencodeerrors())
    return filename


import os

%timeit os.fsdecode(b'\xc3\xa9')
The slowest run took 15.88 times longer than the fastest. This could mean that 
an intermediate result is being cached.
100 loops, best of 3: 541 ns per loop

%timeit os.fsencode('é')
The slowest run took 19.32 times longer than the fastest. This could mean that 
an intermediate result is being cached.
100 loops, best of 3: 502 ns per loop


The cache gives a speed-up of about 17% (449 ns vs. 541 ns for fsdecode, 412 ns 
vs. 502 ns for fsencode).
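For anyone who wants to repeat the comparison without IPython or a patched os.py, here is a minimal sketch using the standard timeit module. It only times the encode step, so absolute numbers will differ from the ones above; make_cached, uncached and best_ns_per_call are hypothetical helper names.

```python
import sys
import timeit


def make_cached():
    # Read the encoding and error handler once, like the current os.py code.
    encoding = sys.getfilesystemencoding()
    errors = sys.getfilesystemencodeerrors()

    def encode(name):
        return name.encode(encoding, errors)

    return encode


def uncached(name):
    # Look both values up on every call, like the modified code.
    return name.encode(sys.getfilesystemencoding(),
                       sys.getfilesystemencodeerrors())


def best_ns_per_call(func, arg, number=100_000, repeat=5):
    """Best-of-`repeat` time per call, in nanoseconds."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


if __name__ == "__main__":
    cached = make_cached()
    # An ASCII name is used so the script runs under any filesystem encoding.
    print(f"cached:   {best_ns_per_call(cached, 'name.txt'):.0f} ns per call")
    print(f"uncached: {best_ns_per_call(uncached, 'name.txt'):.0f} ns per call")
```

Both variants go through the same lambda wrapper, so that overhead cancels out when the two figures are compared.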



2017-01-13 Thread JGoutin

JGoutin added the comment:

Yes, I reported this encoding issue to some of them.

But there are still some problems:
- Some libraries are not updated frequently (or are no longer maintained) and 
still use fsencode.
- Tests and CI don't catch this problem unless they have a test case for 
filenames with accents or other characters that are uncommon in English.

This problem will not be easy to eliminate completely...



2017-01-13 Thread STINNER Victor

STINNER Victor added the comment:

Hum, it has been a long time since I worked on Windows. Well, Python has an "mbcs" 
codec which uses the ANSI code page and has existed more or less "forever". These 
libraries should be patched to use "mbcs" instead of 



2017-01-13 Thread STINNER Victor

STINNER Victor added the comment:

> Temporarily fixing issues with some third-party libraries which use C code for 
> file I/O (with filenames as "mbcs"-encoded bytes internally).
> These libraries generally call 
> "filename.encode(sys.getfilesystemencoding())" or "os.fsencode(filename)" 
> before sending filenames from Python to C code.

Hum, Python lacks a function to encode to/decode from the ANSI code page, 
something like codecs.code_page_encode() / code_page_decode() with CP_ACP. It 
would make it possible to get the same encoding in both UTF-8 and legacy modes.
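As a sketch of what a third-party library could do today, assuming its C code really expects ANSI-code-page bytes: name the "mbcs" codec explicitly on Windows instead of relying on the filesystem encoding. The helper name encode_for_ansi_api is hypothetical, and the non-Windows fallback is only there so the snippet runs everywhere.

```python
import os
import sys


def encode_for_ansi_api(filename):
    """Hypothetical helper: bytes for a C library that expects ANSI-code-page names."""
    filename = os.fspath(filename)
    if isinstance(filename, bytes):
        return filename  # already bytes; passed through unchanged
    if sys.platform == "win32":
        # "mbcs" always means the ANSI code page, whatever the filesystem
        # encoding currently is (UTF-8 or legacy mode).
        return filename.encode("mbcs")
    # No ANSI code page outside Windows: fall back to the filesystem encoding.
    return os.fsencode(filename)
```

With the default "strict" error handler, a name that cannot be represented in the ANSI code page raises UnicodeEncodeError, which is usually preferable to silently opening the wrong file.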



2017-01-13 Thread JGoutin

JGoutin added the comment:

Personally, I call "sys._enablelegacywindowsfsencoding()" for only one reason: 
temporarily fixing issues with some third-party libraries which use C code for 
file I/O (with filenames as "mbcs"-encoded bytes internally).

These libraries generally call "filename.encode(sys.getfilesystemencoding())" 
or "os.fsencode(filename)" before sending filenames from Python to C code.

So far, I haven't seen any side effects from using this function. Maybe because 
I call it at startup, before anything else.

Using the environment variable is not easy in my case.

I can check whether caching the encoding in fsencode actually helps (on 
Windows), and possibly propose a code change for this.



2017-01-13 Thread Steve Dower

Steve Dower added the comment:

Windows doesn't use the fs encoding at all until Python code requests/provides 
something in bytes. Except for the caching in fsencode/fsdecode, there's no 
problem setting it once at the start of your program (and it can only be set 
once - there's no parameter and it cannot be undone).

What I'm most interested in is whether caching the encoding in 
fsencode/fsdecode is actually an optimization - if not, remove it, and if so 
make a way to reset it. I'll get around to this sooner or later but I don't 
want to stop someone else from doing it.



2017-01-13 Thread STINNER Victor

STINNER Victor added the comment:

My experience with changing the Python "filesystem encoding" 
(sys.getfilesystemencoding()) at runtime: it doesn't work.

The filesystem encoding must be set as soon as possible and must never change 
later. As soon as possible: before the first call to os.fsdecode(), which is 
implemented in C as Py_DecodeLocale(). For example, the encoding must be set 
before Python imports the first module.

The filesystem encoding must be set before Python decodes *any* operating 
system data: command line arguments, any filename or path, environment 
variables, etc.

Fortunately, Windows provides most operating system data as Unicode directly: 
command line arguments and environment variables are exposed as Unicode for 

os.fsdecode() and os.fsencode() have an important property:
assert os.fsencode(os.fsdecode(data)) == data

On Windows, the other property is maybe more important:
assert os.fsdecode(os.fsencode(data)) == data

If the property becomes false, for example if the filesystem encoding is 
changed at runtime, you get mojibake. Example:

* os.fsdecode() decodes the filename b'h\xc3\xa9llo' from UTF-8 => 'h\xe9llo'
* sys._enablelegacywindowsfsencoding()
* os.fsencode() encodes the filename to cp1252 => you get b'h\xe9llo'
 instead of b'h\xc3\xa9llo', say hello to mojibake
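The same effect can be reproduced on any platform with explicit codecs, without touching the interpreter's filesystem encoding. A small sketch, where roundtrip is a hypothetical helper that stands in for os.fsdecode() followed by os.fsencode():

```python
def roundtrip(data, decode_with, encode_with):
    """Decode bytes with one codec, then encode the result with another."""
    return data.decode(decode_with).encode(encode_with)


original = b"h\xc3\xa9llo"  # UTF-8 bytes for 'héllo'

# Same codec both ways: the round-trip property holds.
assert roundtrip(original, "utf-8", "utf-8") == original

# Encoding switched between decode and encode: the bytes change.
assert roundtrip(original, "utf-8", "cp1252") == b"h\xe9llo"

# The reverse mistake, decoding UTF-8 bytes as cp1252, gives visible mojibake.
assert original.decode("cp1252") == "h\xc3\xa9llo"  # displayed as 'hÃ©llo'
```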


Sorry, I didn't play with sys._enablelegacywindowsfsencoding() on Windows. I 
don't know if it would "work" if sys._enablelegacywindowsfsencoding() is the 
first instruction of an application. I expect that Python decodes almost 
nothing at startup on Windows, so it may work.

sys._enablelegacywindowsfsencoding() is a private method, so it shouldn't be 

Maybe we could add a "fuse" (a flag that can only be set to 1 and never reset 
to 0) to raise an exception if sys._enablelegacywindowsfsencoding() is called 
"too late", after the first call to os.fsdecode() / Py_DecodeLocale()?
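To make the "fuse" idea concrete, here is a pure-Python sketch. It is only an illustration of the one-way flag: the real check would live in C next to Py_DecodeLocale(), and the class name EncodingFuse and its methods are hypothetical.

```python
class EncodingFuse:
    """Hypothetical one-way flag: once blown, the encoding can no longer change."""

    def __init__(self, encoding="utf-8"):
        self._encoding = encoding
        self._blown = False  # only ever goes from False to True

    def decode(self, data):
        self._blown = True  # the first use of the encoding blows the fuse
        return data.decode(self._encoding)

    def set_encoding(self, encoding):
        if self._blown:
            raise RuntimeError("too late: the encoding has already been used")
        self._encoding = encoding
```

Changing the encoding before the first decode() succeeds; any attempt afterwards raises RuntimeError instead of silently producing mojibake.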



2017-01-13 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

Adding Victor, who implemented the fs codec.

AFAIK, it's not possible to change the encoding after interpreter 
initialization, since it will have been already used for many different things 
by the time you get to executing code.

nosy: +haypo, lemburg


2017-01-12 Thread Steve Dower

Steve Dower added the comment:

Then we do in fact need to make os.fsencode/fsdecode either stop caching the 
encoding completely, or figure out a way to reset the cache when that function 
is called.



2017-01-11 Thread JGoutin

JGoutin added the comment:

import sys

# Force the use of legacy encoding like versions of Python prior to 3.6.
sys._enablelegacywindowsfsencoding()

# Show actual file system encoding
encoding = sys.getfilesystemencoding()
print('File system encoding:', encoding)

# os.fsencode(filename) VS filename.encode(File system encoding)
import os
print(os.fsencode('é'), 'é'.encode(encoding))

>>> File system encoding: mbcs
>>> b'\xc3\xa9' b'\xe9'

The result is the same.



2017-01-11 Thread Steve Dower

Steve Dower added the comment:

If you import os first then that's acceptable and we should document it more 
clearly. Try calling enable before importing os.

I wouldn't be surprised if os is imported automatically, in which case we need 
to figure out some alternate caching mechanism that can be reset by the enable 
call (or demonstrate that the caching is of no benefit).
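One way to check whether os is already loaded before user code runs is to ask a fresh interpreter. A small sketch, where os_imported_at_startup is a hypothetical helper; the result can depend on the Python version and on startup flags such as -S, so no particular outcome is assumed here.

```python
import subprocess
import sys


def os_imported_at_startup(*flags):
    """Run a fresh interpreter and report whether 'os' is already in sys.modules."""
    code = "import sys; print('os' in sys.modules)"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("default startup:", os_imported_at_startup())
    print("with -S:        ", os_imported_at_startup("-S"))
```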



2017-01-11 Thread JGoutin

New submission from JGoutin:

The docs say that calling "sys._enablelegacywindowsfsencoding()" is equivalent 
to using the "PYTHONLEGACYWINDOWSFSENCODING" environment variable.

In fact, this does not apply to "os.fsencode" and "os.fsdecode".

Example with Python 3.6 (64-bit) on Windows 7 (64-bit):

EXAMPLE CODE 1 (sys._enablelegacywindowsfsencoding()): 

import sys
import os

# Force the use of legacy encoding like versions of Python prior to 3.6.
sys._enablelegacywindowsfsencoding()

# Show actual file system encoding
encoding = sys.getfilesystemencoding()
print('File system encoding:', encoding)

# os.fsencode(filename) VS filename.encode(File system encoding)
print(os.fsencode('é'), 'é'.encode(encoding))

>>> File system encoding: mbcs
>>> b'\xc3\xa9' b'\xe9'

The first value is encoded with "utf-8" and not "mbcs" (the actual file system encoding).


EXAMPLE CODE 2 ("PYTHONLEGACYWINDOWSFSENCODING" environment variable):

import sys
import os

# Force the use of legacy encoding like versions of Python prior to 3.6.
# "PYTHONLEGACYWINDOWSFSENCODING" environment variable set before running 

# Show actual file system encoding
encoding = sys.getfilesystemencoding()
print('File system encoding:', encoding)

# os.fsencode(filename) VS filename.encode(File system encoding)
print(os.fsencode('é'), 'é'.encode(encoding))

>>> File system encoding: mbcs
>>> b'\xe9' b'\xe9'

Everything is encoded with "mbcs" (the actual file system encoding).

In "os.fsencode" and "os.fsdecode", the encoding and error handler are cached at 
startup and never updated afterwards by "sys._enablelegacywindowsfsencoding()".

components: Windows
messages: 285220
nosy: JGoutin, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: sys._enablelegacywindowsfsencoding() don't apply to os.fsencode and os.fsdecode
type: behavior
versions: Python 3.6
