Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

James Y Knight Wed, 01 Oct 2008 15:15:05 -0700


On Oct 1, 2008, at 3:03 PM, Glenn Linderman wrote:

On approximately 10/1/2008 11:30 AM, came the following characters from the keyboard of James Y Knight:
BTW, Windows will cheerfully let you create and access files with "garbage surrogates" in it.
Try it yourself:

import os
open(u"\ud8fd", 'w').close()
os.listdir(u'.')
But Windows doesn't have the problem of non-Unicode sequences needing to be translated to something else in the first place. So this is mostly irrelevant to the problem at hand.

Well...either you consider lone surrogates as valid Unicode sequences, or else Windows *does* have the problem of non-Unicode sequences needing to be translated to something else.

Currently, the answer is that lone surrogates are treated as valid Unicode, and allowed into Python via the windows file APIs. Thus, filename strings in Python are going to have lone surrogates, anyways, on Windows.

Therefore, any external library which freaks out upon seeing a lone surrogate is already going to be broken for some filenames on Windows. So, it seems to me, converting invalid UTF-8 sequences into lone surrogates for Unix doesn't actually add any new form of brokenness. So why not just do that?
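
To make the idea concrete, here is a minimal Python 3 sketch of mapping undecodable bytes to lone surrogates. It relies on the `surrogateescape` error handler, which was added to Python later (3.1, PEP 383) and is not something this message specifies; the filename and the helper names `has_lone_surrogate` and `strict_utf8_ok` are made up for illustration. It also shows the "freaks out" case: a strict UTF-8 encoder rejects the resulting string.

```python
# Hypothetical on-disk name: Latin-1 bytes, which are not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Each undecodable byte 0xNN becomes the lone low surrogate U+DCNN.
text_name = raw_name.decode("utf-8", errors="surrogateescape")


def has_lone_surrogate(s):
    """Return True if s contains any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def strict_utf8_ok(s):
    """Mimic a strict consumer: can s be encoded as well-formed UTF-8?"""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False
```

Under these assumptions `text_name` carries the byte 0xE9 as U+DCE9, and `strict_utf8_ok` fails for it in the same way it fails for a Windows-style name such as `"\ud8fd"`.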

So, I'm back to favoring the lone surrogate plan over the U+0000 plan. But either one seems better than the alternatives.
The original byte string must be preserved for use in actually opening files.


Or reversibly transformed.
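
A rough sketch of what "reversibly transformed" could look like in Python 3, again assuming the later `surrogateescape` error handler as a stand-in for the scheme under discussion. The function names and sample filenames are hypothetical.

```python
def bytes_to_text(raw: bytes) -> str:
    """Decode a filename; invalid bytes are smuggled through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_bytes(text: str) -> bytes:
    """Inverse of bytes_to_text: turn escaped surrogates back into raw bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# Sample names: plain ASCII, valid UTF-8, invalid UTF-8, and every nonzero byte.
samples = [
    b"plain.txt",
    "caf\u00e9.txt".encode("utf-8"),
    b"caf\xe9.txt",
    bytes(range(1, 256)),
]

# True if every byte string survives the trip through str unchanged.
round_trips = all(text_to_bytes(bytes_to_text(s)) == s for s in samples)
```

The point of the sketch is only that the original bytes can be recovered from the string form, so the byte string does not have to be carried alongside it.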

How it is displayed is another question. Doing something that works for both Unicode display and access to the file is basically impossible in all cases. Providing an encapsulation of the byte string that has display methods, together with new methods to transform the file path, and use parts of it to create other file paths, is the solution I described earlier.

This sounds like a fine solution. And it would work just as well with a UTF-8b base API as with a dual string/byte string base API. The only difference is what the default behavior for people who don't use your new fancy API is. In the UTF-8b case, most things would work, even with invalidly-encoded filenames.
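
For illustration only, an encapsulation along those lines might look something like the sketch below. The class `FileName` and its methods are hypothetical and are not the API proposed in this thread; it assumes POSIX-style paths and a Python new enough to support `os.fspath` (3.6+).

```python
import os
import posixpath


class FileName:
    """Hypothetical wrapper that keeps the raw bytes and adds display/join helpers."""

    def __init__(self, raw: bytes):
        self.raw = raw  # exact bytes, used for actually opening the file

    def display(self) -> str:
        # Lossy, for humans only: undecodable bytes are shown as U+FFFD.
        return self.raw.decode("utf-8", errors="replace")

    def join(self, *parts: "FileName") -> "FileName":
        # Build a new path from raw bytes (POSIX separator assumed).
        return FileName(posixpath.join(self.raw, *(p.raw for p in parts)))

    def __fspath__(self) -> bytes:
        # Lets os.fspath() and open() accept the wrapper directly.
        return self.raw


name = FileName(b"caf\xe9.txt")
path = FileName(b"dir").join(name)
```

Here `name.display()` is safe to print but cannot be used to reopen the file, while `os.fspath(path)` hands back the untouched bytes.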


James