> Victor Stinner wrote:

(Thanks, Victor, for moving this to the list. I find having a discussion
in the tracker really painful.)

>> POSIX OS
>> --------
>>
>> The default behaviour should be to use unicode and raise an error if
>> conversion to unicode fails. It should also be possible to use bytes using
>> bytes arguments and optional arguments (for getcwd).
>>
>>  - listdir(unicode) -> unicode and raise an error on invalid filename

I know I keep flip-flopping on this one, but the more I think about it
the more I believe it is better to drop those names than to raise an
exception. Otherwise a "naive" program that happens to use
os.listdir() can be rendered completely useless by a single non-UTF-8
filename. Consider the use of os.listdir() by the glob module. If I am
globbing for *.py, why should the presence of a file named b'\xff'
cause it to fail?

Robust programs using os.listdir() should use the bytes->bytes version.

>>  - listdir(bytes) -> bytes
>>  - getcwd() -> unicode
>>  - getcwd(bytes=True) -> bytes
>>  - open(): accept bytes or unicode
>>
>> os.path.*() should accept operations on bytes filenames, but maybe not on
>> bytes+unicode arguments. os.path.join('directory', b'filename'): raise an
>> error (or use *implicit* conversion to bytes)?

(Yeah, it should be all bytes or all strings.)

On Mon, Sep 29, 2008 at 9:45 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
> This approach (changing all path-handling functions to accept either bytes
> or string, but not both) is doomed in my eyes. First, there are lots of them,
> second, they are not only in os.path but in many modules and also in user
> code, and third, I see no clean way of implementing them in the specified way.
> (Just try to do it with os.path.join as an example; I couldn't find the
> good way to write it, only the bad and the ugly...)

It doesn't have to be supported for all operations -- just enough to
be able to access all the system calls and do the most basic pathname
manipulations (split and join -- almost everything else can be built
out of those).

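As a rough illustration of the two ideas above (dropping undecodable
names rather than raising, and insisting on all bytes or all strings),
here is a minimal sketch. The helper names decode_names and
join_same_type are hypothetical, not proposed APIs, and the join only
handles POSIX-style '/' separators.

```python
def decode_names(raw_names, encoding="utf-8"):
    """Hypothetical helper: decode bytes filenames to str, silently
    dropping any name that is not valid in `encoding`."""
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Drop the name instead of failing the whole listing.
            continue
    return decoded


def join_same_type(head, tail):
    """Hypothetical helper: join two path components that must be
    both str or both bytes (simplified, POSIX separator only)."""
    if not isinstance(head, (str, bytes)) or type(head) is not type(tail):
        raise TypeError("path components must be all str or all bytes")
    sep = "/" if isinstance(head, str) else b"/"
    if tail.startswith(sep):
        # An absolute second component replaces the first.
        return tail
    if not head or head.endswith(sep):
        return head + tail
    return head + sep + tail


# A listing that contains one name that is not valid UTF-8.
names = decode_names([b"a.py", b"\xff", b"b.txt"])
```

With this, a glob for *.py over `names` never sees b'\xff', while a
program that needs every entry would keep working on the raw bytes
list instead.
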
> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
> encoding (if it were UTF-8 otherwise), despite possible surprises when a
> such-encoded filename escapes from Python.

I'm having a hard time finding info about UTF-8b. Does anyone have a
decent link?

I noticed that OSX takes yet another approach. I believe it insists on
valid UTF-8 filenames. It may even require some normalization, but I
don't know if the kernel enforces this. I tried to create a file named
b'\xff' and it came out as %ff. Then "rm %ff" worked. So I think it
may be replacing all bad UTF-8 sequences with their % encoding.

The "set filesystem encoding to be Latin-1" approach has a certain
charm as well, but clearly would be a mistake on OSX, and probably on
other systems too (whenever the user doesn't think in Latin-1).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
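
To make the trade-off between the "modified UTF-8" and Latin-1
approaches concrete, here is a minimal sketch. It assumes a Python 3
that provides the surrogateescape codec error handler, which is used
here only as a stand-in for the UTF-8b round-trip idea; nothing in this
thread specifies it. The filename is made up for the example.

```python
# Hypothetical filename whose bytes are not valid UTF-8.
raw = b"caf\xff.py"

# UTF-8b-style decoding: the undecodable byte 0xFF becomes the lone
# surrogate U+DCFF, so the original bytes can be recovered exactly.
text = raw.decode("utf-8", errors="surrogateescape")
restored = text.encode("utf-8", errors="surrogateescape")


def escapes_cleanly(name):
    """Return True if `name` can be encoded as strict UTF-8, i.e. it is
    safe to hand to code that knows nothing about the escape scheme."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# Latin-1 also round-trips every byte string...
latin = raw.decode("latin-1")
latin_restored = latin.encode("latin-1")

# ...but it misreads names the user wrote as UTF-8: the two UTF-8 bytes
# of "é" come out as two unrelated Latin-1 characters.
misread = "é".encode("utf-8").decode("latin-1")
```

Both schemes get the bytes back unchanged. The difference is that the
escaped name fails loudly when it "escapes from Python" into a strict
encoder, whereas the Latin-1 name encodes fine but shows the wrong
characters to anyone who doesn't think in Latin-1.
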