On Tue, Sep 30, 2008 at 9:20 AM, James Y Knight <[EMAIL PROTECTED]> wrote:
>
> On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote:
>
>>> Except...that one over there. That's the whole point of UTF-8b:
>>> correctly encoded names get decoded correctly and readably, and the
>>> other cases get decoded into something unique that cannot possibly
>>> conflict.
>>
>> Sure. But there are lots of other operations besides encoding and
>> decoding that we do with filenames. How do you display a filename?
>> How about concatenating them to make paths? What do you do when you
>> want to mix a filename with other, well-formed strings? If you keep
>> the filenames internally in UTF-8b, you're going to need what amounts
>> to a whole string API for dealing with them, aren't you? If you're
>> not doing that, how is UTF-8b represented?
>
> No, you keep the filenames internally in a PyUnicode object. All that stuff
> *works* in Python today, with a UTF-8b decoded string.
>
> Displaying a filename is encoding it into some other encoding. Like this:
> >>> '\x90\x90'.decode('utf-8b')
> u'\udc90\udc90'
> >>> u'\udc90\udc90'.encode('utf-8')
> '\xed\xb2\x90\xed\xb2\x90'
>
> So, that seems to work okay. Maybe I should try to display that in a web
> browser. Shows up as 2 "unknown character" glyphs. Perfect.
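
For anyone who wants to try the quoted session on Python 3: the 'utf-8b' codec it uses is not in the standard library. The sketch below assumes the 'surrogateescape' error handler (PEP 383) is a reasonable stand-in, since it also maps each undecodable byte to a lone surrogate in the U+DC80..U+DCFF range. The variable names are made up for illustration. Note that on Python 3 the strict 'utf-8' codec refuses to encode those surrogates, so 'surrogatepass' is used to reproduce the bytes shown above.

```python
# Python 3 sketch. ASSUMPTION: 'surrogateescape' stands in for the
# 'utf-8b' codec used in the quoted session; it is not the same codec.
raw = b'\x90\x90'  # two bytes that are not valid UTF-8

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
name = raw.decode('utf-8', 'surrogateescape')
print(ascii(name))

# Ordinary string operations work on the decoded name.
path = 'dir/' + name + '.txt'

# Encoding with the same handler restores the original bytes exactly.
round_trip = path.encode('utf-8', 'surrogateescape')

# 'surrogatepass' emits the surrogates as 3-byte sequences, which is
# what the quoted session's .encode('utf-8') printed.
passed = name.encode('utf-8', 'surrogatepass')

# The strict codec rejects lone surrogates on Python 3.
try:
    name.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The round trip is lossless only when the same handler is used in both directions. Any consumer that insists on strictly valid UTF-8 will still reject the decoded string.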

Well, browsers are of course the epitome of lenient parsing. Try
incorporating one of these things into an XML file and see if a
standards-conforming XML product likes it.

> If you want to mix a filename with other strings, you append them together,
> or use os.path, same as always. You don't need any new string API.
>
> Since from what I've tried, things seem to work, I'd really like to know
> what precisely does fail from the opponents of utf-8b.

Another problem I have with UTF-8b is its lack of standardization.

> And again: if utf-8b isn't acceptable, because it does break things in some
> unknown-to-me way, I really can't imagine anything working but just going
> back to byte-string access as the only API. It's really not okay for the
> "obvious" APIs to be totally broken by unexpected input. Think os.getcwd(),
> sys.argv, os.environ. You can't just ignore bad files and call it done.

Actually, that is what you *have* to do with the filesystem-as-a-black-box
model. Filesystems reserve the right to fail occasionally, and there's
nothing you can do to prevent it. It would be unacceptable if the entire
disk stopped working because it had one bad block (unless the bad block is
in some kind of master table), so you just have to deal with it. You can't
wish the problems away by insisting on a perfect abstraction.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
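
One way to read the "just deal with it" position above is to treat an undecodable filename as one more filesystem failure and report it, not paper over it. The sketch below is only an illustration under stated assumptions: the helper names `split_names` and `list_decodable` are hypothetical, and UTF-8 is assumed as the expected filename encoding. It relies on the fact that `os.listdir` returns `bytes` names when given a `bytes` path.

```python
import os


def split_names(raw_names, encoding='utf-8'):
    """Hypothetical helper: separate names that decode strictly from
    names that do not. Returns (decoded_names, undecodable_raw_names)."""
    good, bad = [], []
    for raw in raw_names:
        try:
            good.append(raw.decode(encoding))  # strict decoding
        except UnicodeDecodeError:
            bad.append(raw)  # keep the raw bytes so the caller can report them
    return good, bad


def list_decodable(directory=b'.'):
    """Hypothetical helper: list a directory by bytes path and split
    the resulting names with split_names()."""
    return split_names(os.listdir(directory))


good, bad = split_names([b'ok.txt', b'\x90\x90', 'caf\u00e9'.encode('utf-8')])
```

Here `good` holds the names a program can safely treat as text, and `bad` holds the raw bytes it has to skip or handle through a bytes-only API.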