Thanks for reminding us of this! Is there any chance that you can submit a patch (or even the beginnings of one) for this?
At the very least, can you add the issue to the tracker so we'll have a permanent reminder until it's resolved?

On Feb 16, 2008 7:20 AM, Giovanni Bajo <[EMAIL PROTECTED]> wrote:
> Hello,
>
> CPython 2.x (and 3.x) under Win32 has an issue with sys.argv. The list is
> computed using the ANSI version of the Windows APIs[*]. The problem is
> apparent when you have a file/directory which can't be represented in the
> system encoding (e.g. a Japanese-named file or directory on a Western
> Windows), because the Windows ANSI API will encode the filename to the
> system encoding using what we call the "replace" policy, and sys.argv[]
> will contain an entry like "c:\\foo\\??????????????.dat".
>
> At the moment, there's simply no way of passing such a file to a Python
> script/application as an argument (e.g. if you double-click on that file,
> and the file is associated with a Python application). This is a
> widespread problem among Python applications; e.g. if you click on a
> Japanese .torrent file, ABC (a BitTorrent client written in Python) won't
> be able to open it and will complain "cannot access
> file ??????????.torrent".
>
> I understand that fixing this properly in the 2.x series might have
> backward compatibility issues, but I propose that this be fixed at least
> in the Python 3.x series, and I volunteer to write a patch. I would be
> glad if someone expert with ANSI/Unicode/Windows (MvL?) would show me
> what he believes to be the correct way of approaching this problem.
>
> My suggestion is that:
>
> * At the Python level, we still expose a single sys.argv[], which will
> contain unicode strings. I think this exactly matches what Py3k does now.
> (Back in the day, there were proposals to add a sys.argvu, but I guess
> it does not make sense right now).
> * At the C level, I believe it involves using GetCommandLineW() and
> CommandLineToArgvW() in WinMain.c, but should Py_Main/PySys_SetArgv() be
> changed to also accept wchar_t** arguments? Or is it better to allow for
> NULL to be passed (under Windows at least), so that the Windows code-path
> in there can use GetCommandLineW()/CommandLineToArgvW() to get the
> current process' arguments?
>
> Thanks!
>
> [*] In detail: it actually comes from __argc/__argv (see WinMain.c),
> which in turn are computed by the CRT startup code, which would adapt to
> the user's choice, but Python is being compiled in ANSI mode.
> --
> Giovanni Bajo
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000@python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/guido%40python.org
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
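
For readers of the archive, the "replace" policy described in the quoted message can be approximated in pure Python. This is only a rough sketch, not the actual Windows conversion code: the helper name ansi_roundtrip is hypothetical, cp1252 is assumed as a stand-in for a Western system encoding, and Python's errors="replace" handler is assumed to behave like the lossy ANSI conversion (the real API may map some characters differently).

```python
def ansi_roundtrip(name: str, codepage: str = "cp1252") -> str:
    """Approximate a lossy Unicode -> ANSI -> Unicode trip for a filename.

    Assumptions (not taken from the original message): cp1252 stands in
    for the system encoding, and errors="replace" stands in for the
    conversion policy applied by the ANSI API.
    """
    # Characters with no representation in the code page become "?".
    return name.encode(codepage, errors="replace").decode(codepage)


original = "c:\\foo\\日本語.dat"
mangled = ansi_roundtrip(original)
print(mangled)  # c:\foo\???.dat
```

Once the name has been reduced to question marks, the original characters cannot be recovered, which is why the script has no way to open the file it was handed.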