On approximately 4/27/2009 12:55 AM, came the following characters from
the keyboard of Cameron Simpson:
On 26Apr2009 23:39, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
[...snip...]
There are still issues regarding how Windows and POSIX programs that are
sharing cross-mounted file systems might communicate file names between
each other, which is not at all clear from the PEP. If this is an
insoluble or unaddressed issue, it should be stated. (It is probably
insoluble, because there are multiple ways that cross-mounted file
systems might translate names; but if such translation rules exist, can
we learn something from the rules the mounting systems use, so as to be
compatible with (at least one of) them?)
I'd say that's out of scope. A Windows filesystem mounted on a UNIX host
should probably be mounted with a mapping to translate the Windows
Unicode names into whatever the sysadmin deems the locally most apt
byte encoding. But sys.getfilesystemencoding() is based on the current user's
locale settings, which need not be the same.
And if it were, what would it do with files that can't be encoded with
the locally most apt byte encoding? That's where we might learn
something about what behaviors are deemed acceptable. Would such files
be inaccessible? Accessible with mangled names? or what?
And for a Unix filesystem mounted on a Windows host? Or accessed via
some network connection?
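The two outcomes asked about above (inaccessible versus accessible with
a mangled name) can be sketched in Python. This is only an illustration
of the trade-off, not a description of what any particular mount driver
does; the file name and the choice of latin-1 as the "locally most apt"
encoding are assumptions made for the example.

```python
# Hypothetical Unicode name as stored on the Windows side.
name = "r\u00e9sum\u00e9_\u4e2d\u6587.txt"

def translate_strict(name, encoding):
    """Return the encoded name, or None if it cannot be represented."""
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return None  # the file would effectively be inaccessible

def translate_mangled(name, encoding):
    """Return an encoded name, substituting '?' for unencodable characters."""
    return name.encode(encoding, errors="replace")

strict_result = translate_strict(name, "latin-1")
mangled_result = translate_mangled(name, "latin-1")
print(strict_result)
print(mangled_result)
```

Note that the mangled form is lossy: two different original names can
map to the same byte string, so it cannot be reversed.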
Your change to avoid using PUA characters, together with the rule
suggested by MRAB in another branch of this thread, of treating
half-surrogates as invalid byte sequences, may avoid the data puns I'm
concerned about.
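A rough sketch of why that rule avoids the pun, using the
"surrogateescape" and "surrogatepass" error handlers found in current
Python 3. Whether those handlers match the exact proposal discussed in
this thread is an assumption; the byte strings are made up for the
example.

```python
# Latin-1 bytes that are not valid UTF-8: the 0xE9 byte cannot be decoded.
raw = b"caf\xe9.txt"
decoded = raw.decode("utf-8", errors="surrogateescape")
# The undecodable byte is carried as the half-surrogate U+DCE9.
roundtrip = decoded.encode("utf-8", errors="surrogateescape")

# A potential pun: bytes that spell out U+DCE9 in UTF-8 form.
pun = "\udce9".encode("utf-8", errors="surrogatepass")
pun_decoded = pun.decode("utf-8", errors="surrogateescape")
# Because encoded half-surrogates are treated as invalid input, each of
# the three bytes is escaped separately instead of becoming U+DCE9.
pun_roundtrip = pun_decoded.encode("utf-8", errors="surrogateescape")

print(ascii(decoded), roundtrip == raw)
print(ascii(pun_decoded), pun_roundtrip == pun)
```

Both byte strings survive the round trip, and they decode to different
strings, so the two file names stay distinct.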
It is not clear how half-surrogate characters would be displayed when
the user prints or displays such a file name string. It would seem that
programs that display file names to users might still have issues with
such names; an escaping mechanism that uses displayable characters would
have an advantage there.
Wouldn't any escaping mechanism that uses displayable characters
require visually mangling occurrences of those characters that
legitimately occur in the original?
Yes. The ? I suggested is a visible character that is illegal in
Windows file names, so no valid Windows file name would be visually
mangled. It is also a character that should be avoided in
POSIX names because:
1) it is known to be illegal on Windows, and thus non-portable
2) it is hard to write globs that match ? without allowing matches of
other characters as well
3) it must be quoted to specify it on a command line
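Points 2 and 3 can be illustrated with the standard fnmatch and shlex
modules. This is a small sketch with made-up file names; it shows that a
naive pattern over-matches and that a literal ? needs bracket quoting in
a glob and shell quoting on a command line.

```python
import fnmatch
import shlex

# Hypothetical directory listing.
names = ["a?b", "axb", "a.b"]

# A bare ? in a pattern matches any single character, not just "?".
naive = [n for n in names if fnmatch.fnmatchcase(n, "a?b")]

# Matching only a literal ? requires wrapping it in brackets.
literal = [n for n in names if fnmatch.fnmatchcase(n, "a[?]b")]

# On a POSIX shell command line the name has to be quoted.
quoted = shlex.quote("a?b")

print(naive)
print(literal)
print(quoted)
```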
That said, someone provided a case where it is "easy" to get ? in POSIX
file names. The remaining question is whether that is a reasonable use
case, a frequent use case, or a stupid use case; and whether the
resulting visible mangling is more or less understandable and disruptive
than using half-surrogates which are:
1) invalid Unicode
2) non-displayable
3) indistinguishable using normal non-displayable character substitution
rules
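The display problem with half-surrogates can be seen in a short Python
sketch. It assumes the "surrogateescape" handler of current Python 3 as
the decoding scheme and uses "backslashreplace" as one possible way to
get a displayable form; neither is claimed to be what the PEP
prescribes, and the file name is invented.

```python
# A name containing a byte that is not valid UTF-8, decoded so that the
# bad byte becomes a half-surrogate (U+DCE9).
name = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")

# A strict encoder, such as the one typically behind a terminal or a
# log file, refuses the half-surrogate.
try:
    name.encode("utf-8")
    displayable = True
except UnicodeEncodeError:
    displayable = False

# One workaround: render the half-surrogate as a visible escape.
shown = name.encode("ascii", errors="backslashreplace").decode("ascii")

print(displayable)
print(shown)
```

The escaped form is readable, but it is produced only at display time;
the string itself still carries the non-displayable half-surrogate.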
--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking