Ulrich Eckhardt wrote:
> On Wednesday 10 December 2008, Adam Olsen wrote:
>> On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt <[EMAIL PROTECTED]> wrote:
>>> On Tuesday 09 December 2008, Adam Olsen wrote:
>>>> The only thing separating this from a bikeshed discussion is that a
>>>> bikeshed has many equally good solutions, while we have no good
>>>> solutions. Instead we're trying to find the least-bad one. The
>>>> unicode/bytes separation is pretty close to that. Adding a warning
>>>> gets even closer. Adding magic makes it worse.
>>> Well, I see two cases:
>>> 1. Converting from an uncertain representation to a known one.
>>> 2. Converting from a known representation to a known one.
>> Not quite:
>> 1. Using a garbage file name locally (within a single process, not
>> talking to any libs)
>> 2. Using a unicode filename everywhere (libs, saved to config files,
>> displayed to the user, etc.)
>
> I think there is some misunderstanding. I was referring to conversions and
> whether it is good to perform them implicitly. For that, I saw the above two
> cases.
>
>> On linux the bytes/unicode separation is perfect for this. You decide
>> which approach you're using and use it consistently. If you mess up
>> (mixing bytes and unicode) you'll consistently get an error.
>>
>> We currently don't follow this model on windows, so a garbage file
>> name gets passed around as if it was unicode, but fails when passed to
>> a lib, saved to a config file, is displayed to a user, etc.
>
> I'm not sure I agree with this. Facts I know are:
> 1. On POSIX systems, there is no reliable encoding for filenames while the
> system APIs use char/byte strings.
> 2. On MS Windows, the encoding for filenames is Unicode/UTF-16.
>
> Returning Unicode strings from readdir() is wrong because it can't handle the
> case 1 above.
> Returning byte strings is wrong because it can't handle case 2
> above because it gives you useless roundtrips from UTF-16 to either UTF-8 or,
> worst case, to the locale-dependent MBCS. Returning something different
> depending on the system is also broken because that would make Python code
> that uses this function and assumes a certain type unportable.
>
> Note that this doesn't get much better if you provide a separate readdirb()
> API or one that simply returns a byte string or Unicode string depending on
> its argument. It just shifts the brokenness from readdir() to the code that
> uses it, unless this code makes a distinction between the target systems.
> Since way too many programmers are not aware of the problem, they will not
> handle these systems differently, so code will become non-portable.
>
> What I'd just like some feedback on is the approach to return a distinct type
> (neither a byte string nor a Unicode string) from readdir(). In order to use
> this, a programmer will have to convert it explicitly, otherwise e.g.
> printing it will just produce <env_string at 0x01234567>. This will
> immediately bump each programmer with their heads on the issue of unknown
> encodings and they will have to make the application-specific choice whether
> an approximation of the filename, an exception or ignoring the file is the
> right choice. Also, it presents the options for doing this conversion in a
> single class, which I personally find much better than providing overloads
> for hundreds of functions.
>
>
> Sorry for ranting, but I'm a bit confused and desperate, because either I'm
> unable to explain what I mean or I'm really not understanding something that
> everybody else here seems to agree upon. I just know that using a distinct
> path type has helped me in C++ in the past, and I don't see why it shouldn't
> in Python.
>

Seems to me this just threatens to add to the confusion.
If you know what your filesystem produces, you can take the appropriate
action to convert it into a type that makes sense to the user. If you
don't, then at least if you have the string in its bytes form you can
re-present it to the filesystem to manipulate the file.

What are we supposed to do with the "special type"?

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
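
For readers following the thread, the distinct-type idea quoted above might
be sketched roughly as follows. This is only an illustration under stated
assumptions: the class name EnvString and the names raw, decode_strict and
approximate are hypothetical and come from neither the thread nor Python's
standard library. The wrapper keeps the original bytes, which speaks to the
point about re-presenting a name to the filesystem, and it makes the caller
choose explicitly between an exception and an approximation.

```python
class EnvString:
    """Hypothetical opaque wrapper for a filename of unknown encoding."""

    def __init__(self, raw: bytes):
        # Keep the exact bytes the operating system handed back.
        self._raw = bytes(raw)

    def __repr__(self) -> str:
        # Deliberately uninformative: printing without converting first
        # shows only a placeholder, as described in the quoted proposal.
        return "<env_string at 0x%08x>" % id(self)

    @property
    def raw(self) -> bytes:
        # The untouched bytes, suitable for passing back to the filesystem.
        return self._raw

    def decode_strict(self, encoding: str = "utf-8") -> str:
        # Option 1: raise UnicodeDecodeError if the name does not decode.
        return self._raw.decode(encoding, errors="strict")

    def approximate(self, encoding: str = "utf-8") -> str:
        # Option 2: accept a lossy approximation; undecodable bytes
        # become U+FFFD and cannot be recovered from the result.
        return self._raw.decode(encoding, errors="replace")


# Example: a Latin-1 encoded name encountered on a UTF-8 system.
name = EnvString(b"caf\xe9.txt")
print(name)                       # placeholder only, not the filename
print(name.approximate("utf-8"))  # lossy, but safe to display
```

Ignoring the file, the third choice mentioned in the proposal, would amount
to catching UnicodeDecodeError around decode_strict() and skipping the entry.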