New submission from Martin <gzl...@googlemail.com>:

Currently, when running Python in a non-OSX POSIX environment under either the C
locale or an invalid or missing locale, it's not possible to operate on
unicode filenames outside the ascii range. Using bytes works, as does reading
filenames as unicode, which relies on the surrogateescape hack.

This makes robustly working with non-ascii filenames on different platforms
needlessly annoying, given that no modern *nix should have problems just using
UTF-8 in these cases.

See the downstream bzr bug for more:
<https://bugs.launchpad.net/bzr/+bug/794353>

One option is to just use UTF-8 for encoding and decoding filenames whenever
ascii would otherwise be used. As UTF-8 is a strict superset of ascii, this
shouldn't break too many existing assumptions, and it's unlikely that
non-UTF-8 filenames will accidentally be mangled due to a locale setting blip.
See the attached patch for this behaviour change. It does not currently
include a test, but it's possible to write one using subprocess with
overridden LANG and LC_ALL vars.
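
A minimal sketch of what such a test helper might look like (this is not part
of the attached patch; the helper name filesystem_encoding_under is made up for
illustration, and other variables such as LC_CTYPE are left alone because
LC_ALL takes precedence):

```python
import os
import subprocess
import sys


def filesystem_encoding_under(locale_name):
    """Return sys.getfilesystemencoding() as seen by a child interpreter
    started with LANG and LC_ALL overridden (hypothetical helper)."""
    env = dict(os.environ)
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name
    out = subprocess.check_output(
        [sys.executable, "-c",
         "import sys; print(sys.getfilesystemencoding())"],
        env=env,
    )
    return out.decode("ascii").strip()


# What the child reports depends on the platform and interpreter version.
encoding = filesystem_encoding_under("C")
```

A real test for the behaviour change would presumably assert that the result
is UTF-8 rather than ascii in the cases described above.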

----------
components: Interpreter Core
files: /tmp/filesystem_encoding_utf8.patch
keywords: patch
messages: 149924
nosy: benjamin.peterson, gz
priority: normal
severity: normal
status: open
title: 'ascii' is a bad filesystem default encoding
versions: Python 3.3
Added file: http://bugs.python.org/file24064//tmp/filesystem_encoding_utf8.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13643>
_______________________________________