On Sep 30, 2008, at 10:06 PM, [EMAIL PROTECTED] wrote:
> However, Martin, I can promise you that I will _never_ ask for any
> convenience functions related to bytes as a result of this
> decision. I want bytes to come back from filesystem APIs because I
> intend to have a wrapper layer which knows two things about the
> file: the bytes (which are needed to talk to POSIX filesystem APIs)
> and the characters (which are computed from those bytes, can be
> safely renormalized, displayed to users, etc). On Windows this
> filesystem wrapper will necessarily behave differently, and will not
> use bytes for anything. Any formatting beyond joining path segments
> together and possibly splitting extensions off will be done on
> character strings, not byte strings.
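
A minimal sketch of the kind of wrapper described above, for the POSIX case only. The class name FsName, its attributes, and the choice of errors="replace" for the display form are all assumptions made for illustration; none of them come from the original message.

```python
class FsName:
    """Hypothetical wrapper: keeps the raw bytes, derives display text."""

    def __init__(self, raw: bytes, encoding: str = "utf-8"):
        self.raw = raw            # exact bytes, safe to hand back to POSIX APIs
        self.encoding = encoding  # assumed encoding, used only for display

    @property
    def text(self) -> str:
        # Lossy on purpose: undecodable bytes become U+FFFD in the display form.
        return self.raw.decode(self.encoding, errors="replace")

    def join(self, *segments: "FsName") -> "FsName":
        # Joining is done on the bytes, so nothing is lost along the way.
        parts = [self.raw] + [s.raw for s in segments]
        return FsName(b"/".join(parts), self.encoding)
```

The bytes stay authoritative and the character form is derived from them, so a name that cannot be decoded can still be opened, joined, and passed back to the filesystem.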
Can you clarify which proposal you are supporting for Python?

1) Two sets of APIs, one returning unicode strings and one returning
bytestrings. (Subpoint: what does the unicode-returning API do when
it cannot decode the bytestring into unicode? Raise an exception?
Pretend the argument/envvar/file didn't exist?)

or

2) All APIs return bytestrings only. Converting to unicode is
considered lossy, and would have to be done by applications, for
display purposes only.
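
To make the subpoint under (1) concrete, here is a rough sketch of two of the behaviours a unicode-returning API could choose when a name does not decode. The function names and the sample data are hypothetical, and UTF-8 merely stands in for whatever the locale encoding happens to be.

```python
# Sample directory listing as raw bytes; the second name is Latin-1,
# which is not valid UTF-8.
raw_names = [b"readme.txt", b"caf\xe9.txt"]

def decode_strict(names, encoding="utf-8"):
    # Behaviour A: raise as soon as one name cannot be decoded.
    return [n.decode(encoding) for n in names]

def decode_skipping(names, encoding="utf-8"):
    # Behaviour B: pretend undecodable entries do not exist.
    decoded = []
    for n in names:
        try:
            decoded.append(n.decode(encoding))
        except UnicodeDecodeError:
            continue
    return decoded
```

With behaviour A a single bad name makes the whole call fail; with behaviour B the file silently disappears from the caller's view. Neither lets the caller actually reach the file.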
I really don't understand the reasoning for (1). It seems to me that
most software (probably including all of the Python stdlib) would
continue to use the unicode string API. Switching all of the Python
stdlib to use the bytestring APIs instead would certainly be a large
undertaking, and would have all sorts of ripple-on API changes (e.g.
__file__). So I can only imagine that if you're proposing (1), you're
doing so without the intention of suggesting that Python be converted
to use it.
And so, of course, that doesn't really fix things (such as getcwd
failing if your cwd is a path that is undecodable in the current
locale or, as things currently stand, Python refusing to even start).
If you're proposing (2), it's at least as large an undertaking as (1)
+ converting Python to use the optional bytestring APIs. But at least
it avoids exposing an API that people ought not use, and does make it
obvious what still needs to be fixed: the unfixed code simply won't
run at all.
> The proposal of using U+0000 seems like it would have been almost
> the same from such a wrapper's perspective, except (A) people using
> the filesystem APIs without the benefit of such a wrapper would have
> been even more screwed
I'm not sure what your "more screwed" is comparing against: current
py3k behavior (i.e., decoding to Unicode in the locale's specified
encoding)? I don't see how you can really be more screwed than that:
not only can't you send your filename to display in a Gtk+ button, you
can't access it at all, even staying within Python.
> and (B) there are a few nasty corner-cases when dealing with
> surrogate (i.e. invalid, in UTF-8) code points which I'm not quite
> sure what it would have done with.
The lone-surrogate-pair proposal was a totally different proposal than
the U+0000 one.
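
For reference, a small sketch of why surrogate code points are awkward in UTF-8. It reflects the strict "utf-8" codec in current CPython 3, which is an assumption about the reader's interpreter and not a statement about either proposal.

```python
# A lone surrogate code point cannot be encoded by the strict UTF-8 codec.
lone = "\ud800"
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# The byte sequence that would represent U+D800 is rejected when decoding.
try:
    b"\xed\xa0\x80".decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
```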
James
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000