Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

Adam Olsen Wed, 01 Oct 2008 15:42:13 -0700

On Wed, Oct 1, 2008 at 4:14 PM, James Y Knight <[EMAIL PROTECTED]> wrote:
> On Oct 1, 2008, at 3:03 PM, Glenn Linderman wrote:
>> On approximately 10/1/2008 11:30 AM, came the following characters from
>> the keyboard of James Y Knight:
>>>
>>> BTW, Windows will cheerfully let you create and access files with
>>> "garbage surrogates" in it.
>>> Try it yourself:
>>>
>>> open(u"\ud8fd", 'w').close()
>>> os.listdir(u'.')
>>
>> But Windows doesn't have the problem of non-Unicode sequences needing to
>> be translated to something else in the first place.  So this is mostly
>> irrelevant to the problem at hand.
>
>
> Well...either you consider lone surrogates as valid Unicode sequences, or
> else Windows *does* have the problem of non-Unicode sequences needing to be
> translated to something else.
>
> Currently, the answer is that lone surrogates are treated as valid Unicode,
> and allowed into Python via the windows file APIs. Thus, filename strings in
> Python are going to have lone surrogates, anyways, on Windows.


We allow lone surrogates into our unicode objects, but they aren't
valid Unicode.  They'll fail for any APIs that expect only valid
Unicode.
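
As a rough illustration of that distinction, here is a minimal sketch
assuming Python 3 semantics, where the strict UTF-8 codec refuses
surrogates. The helper name is_valid_unicode is hypothetical, made up
for this example:

```python
# Sketch, assuming Python 3: a str object can hold a lone surrogate,
# but a strict codec rejects it as soon as the text has to leave Python.
lone = "\ud8fd"  # the same code unit used in the Windows example quoted above


def is_valid_unicode(text):
    """Hypothetical helper: True if text survives strict UTF-8 encoding."""
    try:
        text.encode("utf-8")  # strict error handling is the default
    except UnicodeEncodeError:
        return False
    return True


# The object itself is created without complaint...
assert len(lone) == 1
# ...but it is not valid Unicode for any API that insists on strict encoding.
assert not is_valid_unicode(lone)
assert is_valid_unicode("caf\u00e9")
```

The lone surrogate only becomes a problem at the boundary, which is
where an API expecting valid Unicode would have to reject it.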


> Therefore, any external library which freaks out upon seeing a lone
> surrogate is already going to be broken for some filenames on Windows. So,
> it seems to me, converting invalid UTF-8 sequences into lone surrogates for
> Unix doesn't actually add any new form of brokenness. So why not just do
> that?
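
To make the quoted proposal concrete, here is a small sketch of what
mapping invalid UTF-8 bytes to lone surrogates could look like. It
assumes the "surrogateescape" error handler found in later Python 3
releases, which did not exist when this was written, so treat it as one
possible realisation of the idea rather than what was being proposed:

```python
# Sketch, assuming a later Python 3 with the "surrogateescape" handler.
raw = b"caf\xe9.txt"  # Latin-1 bytes; 0xE9 followed by "." is not valid UTF-8

# Each undecodable byte 0xNN is mapped to the lone low surrogate U+DCNN.
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9.txt"

# The mapping is reversible, so the original bytes can be recovered.
assert name.encode("utf-8", "surrogateescape") == raw

# A strict consumer still refuses the resulting string.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
assert not strict_ok
```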

I see it the opposite way: lone surrogates on Windows should be rejected
by the unicode APIs, just as we want to do for invalid UTF-8 on Linux.

But since the same rationale for having a "raw" API applies, maybe the
Windows byte APIs should expose raw UTF-16, rather than letting it be
translated?
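
A rough sketch of what "raw UTF-16" would mean for the filename from
the quoted example, assuming Python 3.4 or later, where the
"surrogatepass" error handler works with the UTF-16 codecs. No such
byte API is implied to exist; this only shows the byte-level
representation:

```python
# Sketch, assuming Python 3.4+: the raw UTF-16 code units of a filename
# that contains a lone surrogate.
name = "\ud8fd"

# "surrogatepass" emits the code unit as-is instead of raising.
raw = name.encode("utf-16-le", "surrogatepass")
assert raw == b"\xfd\xd8"

# A strict decoder rejects those bytes, so they are not valid UTF-16 text...
try:
    raw.decode("utf-16-le")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
assert not strict_ok

# ...but the bytes themselves round-trip losslessly.
assert raw.decode("utf-16-le", "surrogatepass") == name
```

A bytes-level view like this keeps the invalid data out of the text
type while still letting callers reach every file.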


-- 
Adam Olsen, aka Rhamphoryncus