STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> A packaging mechanism that prepares code developed on a Latin-1
> filesystem for distribution, would have to NFKC-normalize
> filenames before encoding them using UTF-8.

It causes portability issues: if you copy a non-ASCII module to a new host, the program will or will not work depending on the filesystem encoding. Having to transform the filename when you copy a file, just to fix a corner case, is a pain.

> One possible solution to this problem is to define a 'compat' error
> handler that would detect unencodable strings with encodable
> compatibility equivalents and produce encoding of an NFKC equivalent
> string instead of raising an error.

Only a few people use non-ASCII module names, and most operating systems are able to store all Unicode characters, so I don't think that we need to support U+00B5 in a module name on a Latin-1 filesystem at all. If you use an old system with a Latin-1 filesystem, you have to limit your expectations of Python's Unicode support :-)

os.fsencode() and os.fsdecode() already use a custom error handler: surrogateescape. compat would conflict with surrogateescape. Loading a module concatenates two parts: a path from sys.path (decoded using the filesystem encoding and the surrogateescape error handler) and a module name. If compat is used to encode the filename, the module name will be encoded correctly, but not the path.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10952>
_______________________________________
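
For readers who want to check the two encoding behaviours discussed in this comment, here is a minimal sketch, assuming Python 3. The byte path `raw_path` and all variable names are made up for illustration and do not come from the original message. The proposed 'compat' handler is not implemented here, since no such handler exists in the standard library; a strict encode stands in for "any handler other than surrogateescape".

```python
import unicodedata

# U+00B5 (MICRO SIGN) is encodable in Latin-1, but identifiers are
# NFKC-normalized, which maps it to U+03BC (GREEK SMALL LETTER MU).
micro = "\u00b5"
normalized = unicodedata.normalize("NFKC", micro)

micro_latin1 = micro.encode("latin-1")  # works: single byte 0xB5

try:
    normalized.encode("latin-1")
    nfkc_fits_latin1 = True
except UnicodeEncodeError:
    # U+03BC has no Latin-1 representation.
    nfkc_fits_latin1 = False

# A sys.path entry containing undecodable bytes (hypothetical path).
# surrogateescape maps the bad byte 0xFF to the lone surrogate U+DCFF.
raw_path = b"/srv/\xff/lib"
decoded_path = raw_path.decode("utf-8", "surrogateescape")

# Encoding back with the same handler restores the original bytes.
roundtrip = decoded_path.encode("utf-8", "surrogateescape")

# Encoding the same path with a different handler (strict here) fails,
# which is the conflict described for the path part of a module filename.
try:
    decoded_path.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The last step is the point of the objection: the module name and the sys.path entry are joined into one filename, so a handler that only helps the module name still has to cope with the escaped surrogates in the path.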