On Sep 28, 2008, at 7:21 PM, Gregory P. Smith wrote:
On Sun, Sep 28, 2008 at 2:13 PM, "Martin v. Löwis"
<[EMAIL PROTECTED]> wrote:
"broken" systems will always exist. Code to deal with them must be
possible to write in Python 3.0.
Python 3.0 will have bugs. This might just be one of them. I can agree
that Python 3.x will need to support that somehow, but perhaps not 3.0.
Regards,
Martin
Agreed. At this point I think we just need to get 3.0 out there and
be willing to fix flaws like this for 3.1 or in some cases for 3.0.1.
This problem sure would be "practically" solved simply by switching
the way the filesystem encoding is selected. You'll note that if you
want things to Just Work for a backup tool with today's Py3k, all you
need to do is switch the filesystem encoding to iso-8859-1. In that
encoding, every byte string decodes to a unique Unicode string and
encodes back to the same bytes, so there's no problem with any
possible filename.
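To make the round-trip property concrete, here is a minimal sketch. The
filename is a made-up example, and the snippet only shows the codec
behaviour, not how the interpreter would apply it to real filesystem calls.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding as ISO-8859-1 never fails: byte N maps to code point U+00NN.
text = all_bytes.decode("iso-8859-1")

# Encoding back yields exactly the original bytes.
round_tripped = text.encode("iso-8859-1")

# A hypothetical filename containing a byte that is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Strict UTF-8 decoding rejects it...
try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ...while ISO-8859-1 accepts it and can reproduce the bytes later.
latin1_name = raw_name.decode("iso-8859-1")
restored = latin1_name.encode("iso-8859-1")
```

The decoded name may not be what the file's creator intended to display,
but a backup tool can still open, copy, and restore the file, because the
original bytes are recoverable.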
With that in mind, here's my proposal:
a) Whenever ASCII would be selected as a filesystem encoding, use
iso-8859-1 instead.
b) Whenever UTF-8 would be selected as a filesystem encoding, use
UTF-8b [1] instead.
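The selection rule in (a) and (b) could be sketched roughly as follows.
The function name pick_filesystem_encoding is made up for illustration,
and "utf-8b" is a hypothetical codec name: no such codec is registered in
the standard library, so the sketch only returns the name and does not
try to use it.

```python
import codecs


def pick_filesystem_encoding(detected):
    """Return the encoding to use in place of the detected one.

    Hypothetical helper illustrating the proposal; not an existing API.
    """
    try:
        # Normalize aliases such as "US-ASCII" or "UTF8".
        canonical = codecs.lookup(detected).name
    except LookupError:
        # Unknown encoding name: leave the decision unchanged.
        return detected

    if canonical == "ascii":
        # (a) ASCII is widened to ISO-8859-1, which accepts every byte.
        return "iso-8859-1"
    if canonical == "utf-8":
        # (b) UTF-8 is replaced by UTF-8b ("utf-8b" is an assumed name).
        return "utf-8b"
    # Any other locale encoding is used as detected.
    return detected
```

Since ISO-8859-1 agrees with ASCII on all bytes below 0x80, rule (a)
changes nothing for filenames that really are ASCII.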
It's clearly not a 100% perfect solution, but it completely solves the
issue for users with the most popular filesystem encodings: ASCII,
iso-8859-1, and UTF-8. IMO, that's good enough to just leave things
there.
But even if it's deemed not good enough, and the byte-string level
file access APIs are all implemented, I *still* think doing the above
is a good idea. It makes Unicode string file/environment/argv access
work in a huge majority of cases: a) Windows always, b) Mac OS X
always, c) ASCII locale always, d) ISO-8859-1 locale always, e) UTF-8
locale always, f) other locales when the filenames really are encoded
in their locale.
It will make users happy, and it's simple enough to implement for
Python 3.0.
James
[1] UTF-8b has a similar property to 8859-1, in that all byte strings
can be successfully round-tripped. It's not currently implemented in
the Python core, but it's a pretty trivial encoding, and an
implementation is available under the BSD license; see below.
Background:
http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html
Blog post:
http://bsittler.livejournal.com/10381.html
Implementation for python:
http://hyperreal.org/~est/libutf8b/
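For a feel of how the UTF-8b round trip behaves, here is a sketch. It
does not use the libutf8b implementation linked above. Instead it assumes
a later Python (3.1 or newer), where PEP 383 added the surrogateescape
error handler based on the same idea: bytes that are not valid UTF-8 are
decoded to lone surrogates in the range U+DC80..U+DCFF and turned back
into the original bytes on encoding. The filename is a made-up example.

```python
# A hypothetical filename with two bytes that are not valid UTF-8.
raw_name = b"caf\xe9-\xff.txt"

# Undecodable bytes become lone surrogates: 0xE9 -> U+DCE9, 0xFF -> U+DCFF.
text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes.
restored = text.encode("utf-8", errors="surrogateescape")

# Valid UTF-8 input is decoded exactly as plain UTF-8 would decode it.
valid = "caf\u00e9".encode("utf-8")
plain = valid.decode("utf-8", errors="surrogateescape")
```

Unlike the iso-8859-1 trick, correctly encoded UTF-8 filenames still
decode to the intended characters; only the invalid bytes are escaped.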
James
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000