STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> > What? No. We have problems because we don't use the same encoding to
> > decode and to encode the same data type. It's not a problem to use a
> > different encoding for each data type (stdout, filenames, environment
> > variables, ...).
>
> This is exactly the very problem that we face. In particular, the
> question is what encoding to use if something is *both* a filename
> and an environment variable value, or both a filename and a command
> line argument.

The question is: what is the best default encoding for a specific data type? There is no perfect answer (well, except maybe using byte strings :-)). Each solution has its own use cases and disadvantages.

If an application knows the exact encoding of some data, and it is not the default encoding, it can still redecode the data. With os.environb, it's a little bit better: the application just has to decode (it doesn't have to encode, or to know which encoding was used to decode the data initially). For sys.argv, I still want to create sys.argvb (bytes version) ;-)

For the command line arguments and environment variables, we don't have a lot of choices: locale or filesystem encodings. So Antoine and Martin: which encoding do you prefer? We should maybe try to find some use cases.

Here is a dummy script bla.py:
---
import sys
print(sys.argv)
try:
    open(sys.argv[1]).close()
except Exception as err:
    print("open error: %s" % err)
else:
    print("open ok")
---

Locale encoding = FS encoding = utf-8:

$ ./python bla.py xxxé.txt
['bla.py', 'xxxé.txt']
open ok

Locale encoding = utf8, FS encoding = ascii:

$ PYTHONFSENCODING=ascii ./python bla.py xxxé.txt
['bla.py', 'xxxé.txt']
open error: 'ascii' codec can't encode character '\xe9' ...

The filename is displayed correctly, but we are unable to open the file if PYTHONFSENCODING is used :-/ Should the filename be displayed differently if PYTHONFSENCODING is used?

> I think these problems are sufficiently resolved now: either by
> PEP 3333, PEP 444, PEP 383, or os.environb.

Ok, cool :-)

> I think you misunderstood MAL's comment, though: the environment
> variables are not encoded in *any* specific encoding. Instead,
> they are copied literally from the HTTP request, using whatever
> bytes the browser originally put in there - which may or may
> not have followed a particular encoding. HTTP is silent on
> this most of the time, and HTML is out of scope.

Ah yes, thanks for your explanation. I was unable to find his comment.
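
A minimal sketch of the redecoding point made above, for illustration only. It assumes that the default decoding was UTF-8 with the surrogateescape error handler (PEP 383) and that the application knows the value is really Latin-1. The helper names redecode() and decode_raw() and the sample value are hypothetical, not part of any proposed API.

```python
def redecode(value, initial_encoding, real_encoding):
    """str path (os.environ-like): undo the initial decoding, then decode again.

    The caller has to know which encoding was used for the initial decoding.
    """
    raw = value.encode(initial_encoding, "surrogateescape")
    return raw.decode(real_encoding)


def decode_raw(raw, real_encoding):
    """bytes path (os.environb-like): a single decode step is enough."""
    return raw.decode(real_encoding)


# Hypothetical value: the bytes of 'xxx\xe9.txt' (e with acute accent) in Latin-1.
raw_value = "xxx\xe9.txt".encode("latin-1")

# What a str-based API would hold after decoding with UTF-8 + surrogateescape
# (assumed default for this sketch): the undecodable byte becomes a lone surrogate.
str_value = raw_value.decode("utf-8", "surrogateescape")

fixed_from_str = redecode(str_value, "utf-8", "latin-1")
fixed_from_bytes = decode_raw(raw_value, "latin-1")
```

Both paths give the same text, but the str path only works if the application knows the initial encoding, which is the extra knowledge that the bytes version avoids.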

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9992>
_______________________________________