On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote:

>> Except...that one over there. That's the whole point of UTF-8b:
>> correctly encoded names get decoded correctly and readably, and the
>> other cases get decoded into something unique that cannot possibly
>> conflict.
>
> Sure.  But there are lots of other operations besides encoding and
> decoding that we do with filenames.  How do you display a filename?
> How about concatenating them to make paths?  What do you do when you
> want to mix a filename with other, well-formed strings?  If you keep
> the filenames internally in UTF-8b, you're going to need what amounts
> to a whole string API for dealing with them, aren't you?  If you're
> not doing that, how is UTF-8b represented?

No, you keep the filenames internally in a PyUnicode object. All that stuff *works* in Python today with a UTF-8b-decoded string.

Displaying a filename just means encoding it into some other encoding, like this:
>>> '\x90\x90'.decode('utf-8b')
u'\udc90\udc90'
>>> u'\udc90\udc90'.encode('utf-8')
'\xed\xb2\x90\xed\xb2\x90'

So, that seems to work okay. Maybe I should try to display that in a web browser. It shows up as two "unknown character" glyphs. Perfect.
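For anyone who wants to try the same round trip on Python 3, here is a rough sketch. It assumes the "surrogateescape" error handler (added later, in Python 3.1) as a stand-in for the 'utf-8b' codec used above; the variable names are only illustrative.

```python
# Sketch only: "surrogateescape" stands in for the 'utf-8b' codec (an assumption).
raw = b'\x90\x90'                      # bytes that are not valid UTF-8

# Decode: each undecodable byte becomes a lone low surrogate (U+DC80..U+DCFF).
name = raw.decode('utf-8', 'surrogateescape')

# Encode with the same handler to get the original bytes back, unchanged.
round_trip = name.encode('utf-8', 'surrogateescape')

# "surrogatepass" writes the surrogates out as three-byte sequences,
# which is what the Python 2 session above produced.
as_utf8 = name.encode('utf-8', 'surrogatepass')

# For display, escape whatever cannot be shown instead of failing.
shown = name.encode('ascii', 'backslashreplace')

print(repr(name), round_trip, as_utf8, shown)
```

The point is that the decoded name is an ordinary str, and only the final encoding step has to decide what to do with the odd characters.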

If you want to mix a filename with other strings, you append them together, or use os.path, same as always. You don't need any new string API.
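A small sketch of what I mean, again assuming Python 3's "surrogateescape" handler in place of a real utf-8b codec. The helper names decode_name and encode_name, and the directory and file names, are made up for illustration; posixpath is used only so the separator is the same on every platform.

```python
import posixpath

def decode_name(raw: bytes) -> str:
    """Hypothetical helper: bytes from the OS -> str, keeping bad bytes."""
    return raw.decode('utf-8', 'surrogateescape')

def encode_name(name: str) -> bytes:
    """Hypothetical helper: str -> the exact bytes the OS gave us."""
    return name.encode('utf-8', 'surrogateescape')

bad = decode_name(b'\x90\x90.txt')        # a name that is not valid UTF-8

# Ordinary string operations: concatenation and path joining.
label = 'File: ' + bad
path = posixpath.join('/tmp/docs', bad)

# The joined path still encodes back to the exact original bytes.
path_bytes = encode_name(path)
print(path_bytes)

# The directory and base name split apart as usual.
directory, base = posixpath.split(path)
```

Nothing here is specific to filenames: the escaped characters just ride along through the normal string operations.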

Since everything I've tried seems to work, I'd really like to hear from the opponents of UTF-8b what precisely fails.

And again: if UTF-8b isn't acceptable because it breaks things in some way unknown to me, I really can't imagine anything working other than going back to byte-string access as the only API. It's really not okay for the "obvious" APIs to be totally broken by unexpected input. Think os.getcwd(), sys.argv, os.environ. You can't just ignore bad files and call it done.

James
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000