STINNER Victor added the comment:

Your clean() function loses information. If a filename contains almost only 
undecodable bytes, it will look like ����.txt, which is not very useful. I 
would prefer to escape the bytes. Mac OS X (HFS+ filesystem), for example, uses 
the %HH format: "\udc80" would be replaced with "%80".
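
A minimal sketch of that escaping, assuming the filename was decoded with the 
surrogateescape error handler (so each undecodable byte 0x80..0xFF appears as 
a lone surrogate U+DC80..U+DCFF). The name escape_undecodable() is 
hypothetical and is not part of any proposed patch:

```python
import re

# Lone surrogates U+DC80..U+DCFF are what the surrogateescape error handler
# produces for undecodable bytes 0x80..0xFF.
_SURROGATE_RE = re.compile('[\udc80-\udcff]')


def escape_undecodable(name):
    """Replace each surrogate-escaped byte with an uppercase %HH escape.

    Hypothetical helper, for illustration only.
    """
    def repl(match):
        # Recover the original byte value from the surrogate code point.
        byte = ord(match.group()) - 0xDC00
        return '%%%02X' % byte
    return _SURROGATE_RE.sub(repl, name)


# A latin-1 filename decoded as UTF-8 with surrogateescape.
name = b'a\xe9b.txt'.decode('utf-8', 'surrogateescape')
print(escape_undecodable(name))  # -> a%E9b.txt
```

Unlike a replacement character, the escaped form keeps the original byte 
value visible. This sketch does not escape a literal "%" already present in 
the name, so the result is meant for display and is not guaranteed to be 
reversible.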

This format is also used in URLs. For example, "a\xe9b.txt" (latin-1, whereas 
my locale encoding is UTF-8) is displayed as "a�b.txt" in Firefox (when listing 
a local directory), but Firefox uses the URL "file://.../a%E9b.txt" (uppercase 
hexadecimal).
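
For comparison, the standard library already produces this URL form. The 
sketch below assumes the name was decoded with surrogateescape; passing the 
same error handler to urllib.parse.quote() turns the surrogate back into the 
original byte before percent-encoding it:

```python
from urllib.parse import quote

# A latin-1 filename decoded as UTF-8 with surrogateescape.
name = b'a\xe9b.txt'.decode('utf-8', 'surrogateescape')

# errors='surrogateescape' re-encodes the lone surrogate to the raw byte 0xE9,
# which quote() then writes as an uppercase %HH escape.
print(quote(name, errors='surrogateescape'))  # -> a%E9b.txt

# Quoting the raw bytes directly gives the same result.
print(quote(b'a\xe9b.txt'))  # -> a%E9b.txt
```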

In the GNOME file browser (Nautilus), the same filename is displayed as 
"a�b.txt (invalid encoding)".

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________