On approximately 4/25/2009 5:22 AM, came the following characters from the keyboard of Martin v. Löwis:
The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining whether a
particular file name str was obtained from a str API, or was funny-
decoded from a bytes API... and thus, there is no means of reliably
ascertaining whether a particular filename str should be passed to a
str API, or funny-encoded back to bytes.

Why is it necessary that you are able to make this distinction?


It is necessary that programs (not me) can make the distinction, so that they know whether or not to do the funny-encoding. If a name is funny-decoded when it is obtained from a directory listing, it needs to be funny-encoded again in order to open the file.
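
To make the round trip concrete, here is a minimal sketch. It does not use the PEP's own scheme. As a stand-in it uses the `surrogateescape` error handler from the Python standard library, and the file name is made up for illustration.

```python
# Stand-in for the "funny" codec: the stdlib surrogateescape error handler.
# The file name below is a hypothetical example, not taken from the thread.
raw_name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# What a directory listing would hand back as str (funny-decoded).
listed = raw_name.decode("utf-8", errors="surrogateescape")

# To open the file, the str has to be funny-encoded back to the same bytes.
reopened = listed.encode("utf-8", errors="surrogateescape")

# Encoding without the funny step fails instead of reproducing the bytes.
try:
    listed.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```

The program has to know that `listed` came from the funny-decoding path in order to pick the second encode call rather than the strict one.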


Picking a character (I don't find U+F01xx in the
Unicode standard, so I don't know what it is)

It's a private use area. It will never carry an official character
assignment.


I know that U+F0000 - U+FFFFD is a private use area. I don't find a definition of U+F01xx to know what the notation means. Are you picking a particular character within the private use area, or a particular range, or what?


As I realized in the email-sig, in talking about decoding corrupted
headers, there is only one way to guarantee this... to encode _all_
character sequences, from _all_ interfaces.  Basically it requires
reserving an escape character (I'll use ? in these examples -- yes, an
ASCII question mark -- happens to be illegal in Windows filenames so
all the better on that platform, but the specific character doesn't
matter... avoiding / \ and . is probably good, though).
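
A rough sketch of such an escape scheme follows. The concrete rules are assumptions made for illustration, not something spelled out above: a literal `?` is doubled, an undecodable byte becomes `?` plus two hex digits, and the function names `escape_str`, `escape_bytes` and `unescape_to_bytes` are hypothetical.

```python
ESC = "?"  # reserved escape character, as in the examples above


def escape_str(name):
    """Name from a str API: only the escape character itself is escaped."""
    return name.replace(ESC, ESC + ESC)


def escape_bytes(raw):
    """Name from a bytes API: escape literal ESC and every undecodable byte."""
    out = []
    # surrogateescape is used only as a convenient way to find bad bytes.
    for ch in raw.decode("utf-8", errors="surrogateescape"):
        code = ord(ch)
        if ch == ESC:
            out.append(ESC + ESC)
        elif 0xDC80 <= code <= 0xDCFF:
            out.append("%s%02X" % (ESC, code - 0xDC00))  # assumed ?XX form
        else:
            out.append(ch)
    return "".join(out)


def unescape_to_bytes(name):
    """Invert either escape function; malformed input raises ValueError."""
    out = bytearray()
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != ESC:
            out += ch.encode("utf-8")
            i += 1
        elif name[i + 1:i + 2] == ESC:
            out += ESC.encode("utf-8")
            i += 2
        else:
            out.append(int(name[i + 1:i + 3], 16))
            i += 3
    return bytes(out)
```

Because every interface escapes the reserved character, a name that merely looks like an escape sequence (for example `a?FFb` from a str API) and a name that really contains an undecodable byte end up as different strings.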

I think you'll have to write an alternative PEP if you want to see
something like this implemented throughout Python.


I'm certainly not experienced enough in Python development processes or internals to attempt such, as yet. But somewhere in 25 years of programming, I picked up the knowledge that if you want to have a 1-to-1 reversible mapping, you have to avoid data puns, mappings of two different data values into a single data value. Your PEP, as first written, didn't seem to do that... since there are two interfaces from which to obtain data values, one performing a mapping from bytes to "funny invalid" Unicode, and the other performing no mapping, but accepting any sort of Unicode, possibly including "funny invalid" Unicode, the possibility of data puns seems to exist. I may be misunderstanding something about the use cases that prevent these two sources of "funny invalid" Unicode from ever coexisting, but if so, perhaps you could point it out, or clarify the PEP. I'll try to reread it again... could you post a URL to the most up-to-date version of the PEP, since I haven't seen such appear here, and the version I found via a Google search seems to be the original?
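
To illustrate the kind of data pun I mean, here is a small sketch. It assumes one possible reading of the U+F01xx notation, namely that an undecodable byte 0xNN is mapped to U+F0100 + 0xNN; that reading, the handler name `pua_escape_sketch`, and the helper `funny_decode` are all assumptions for the example, not taken from the PEP.

```python
import codecs

PUA_BASE = 0xF0100  # assumption: byte 0xNN is mapped to U+F01NN


def _pua_escape(exc):
    """Decode error handler: map each undecodable byte into the PUA range."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua_escape_sketch", _pua_escape)


def funny_decode(raw):
    return raw.decode("utf-8", errors="pua_escape_sketch")


# Source 1: a bytes API returns an undecodable byte, which is funny-decoded.
from_bytes_api = funny_decode(b"\xff")

# Source 2: a str API returns a name that really contains U+F01FF.
from_str_api = "\U000F01FF"

# Two different on-disk names, one str value: a data pun.
valid_utf8_name = "\U000F01FF".encode("utf-8")
pun = funny_decode(b"\xff") == funny_decode(valid_utf8_name)
```

Given only the resulting str, there is no way to tell which of the two byte sequences it should be encoded back to.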


--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev