Adam Olsen wrote:
> On Tue, Sep 30, 2008 at 3:43 PM, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>> Of the suggestions I've seen so far, I like Marcin's Mono-inspired
>> NULL-escape codec idea the best. Since these strings all come from parts
>> of the environment where NULLs are not permitted, a simple "'\0' in
>> text" check will immediately identify any strings where decoding failed
>> (for applications which care about the difference and want to try to do
>> better), while applications which don't care will receive perfectly
>> valid Python strings that can be passed around and manipulated as if the
>> decoding error never happened.
>
> It avoids the technical problems, but it's still magical behaviour
> that users have to learn, whereas bytes/unicode polymorphism uses the
> distinctions you should already know about.
>
> There's also a problem of how to turn it on. I'm against
> automatically Python changing the filesystem encoding, no matter how
> well intentioned. Better to let the app do that, which is easy and
> could be done for all apps (not just python!) if someone defined a
> libc encoding of "null-escaped UTF-8".
>
> On the whole I'm only -0 on it (compared to -1 for UTF-8b).

For the decoding side, you wouldn't need to do it as a codec - you could
do it as a 'nullescape' error handler (since NULLs can't be present in
the byte sequences being decoded, there is no need to worry about
escaping anything when decoding is successful).

Converting those NULL-escaped strings back into something the filesystem
can understand would obviously need a custom codec, but some kind of
application-level handling of bad filenames is going to be needed no
matter how we deal with bad encoding on the input side.

That said, I don't think this is something we (or, more to the point,
Guido) need to make a decision on right now - for 3.0, having
bytes-level APIs that can see everything, and Unicode APIs that ignore
badly encoded filenames, is worth trying. If that proves inadequate, we
can revisit the idea of some kind of implicit escaping mechanism in the
Unicode APIs for 3.1, when there is more time for a proper PEP.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
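
To make the 'nullescape' error handler idea above a bit more concrete,
here is a rough sketch of how it might look. The escape format (a NUL
followed by a character whose code point equals the bad byte) and the
helper names decode_filename/encode_filename are assumptions made up for
illustration; nothing in the thread pins down the actual scheme, and this
is not Mono's implementation. Only codecs.register_error and the
start/end/object attributes of UnicodeDecodeError are standard.

```python
import codecs


def nullescape(exc):
    """Decode error handler: turn each undecodable byte into NUL + chr(byte).

    The escape format is an assumption for this sketch, not a settled design.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Resume decoding right after the bytes that could not be decoded.
    return "".join("\0" + chr(b) for b in bad), exc.end


codecs.register_error("nullescape", nullescape)


def decode_filename(raw, encoding="utf-8"):
    """Hypothetical helper: decode a filename without ever failing."""
    return raw.decode(encoding, "nullescape")


def encode_filename(text, encoding="utf-8"):
    """Hypothetical inverse: this is the 'custom codec' half of the idea.

    Assumes every NUL in text was produced by the handler above, so it is
    always followed by a character in the range 0-255.
    """
    out = bytearray()
    chars = iter(text)
    for ch in chars:
        if ch == "\0":
            out.append(ord(next(chars)))  # restore the original raw byte
        else:
            out += ch.encode(encoding)
    return bytes(out)


raw = b"caf\xe9.txt"            # Latin-1 bytes, not valid UTF-8
name = decode_filename(raw)
needs_attention = "\0" in name  # the cheap check for a failed decode
restored = encode_filename(name)
```

Applications that don't care can pass name around as an ordinary string,
while those that do can test for '\0' and, under the assumptions above,
get the original bytes back via encode_filename.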