On Sep 28, 2008, at 7:21 PM, Gregory P. Smith wrote:

On Sun, Sep 28, 2008 at 2:13 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
"broken" systems will always exist.  Code to deal with them must be
possible to write in python 3.0.

Python 3.0 will have bugs. This might just be one of them. I can agree that Python 3.x will need to support that somehow, but perhaps not 3.0.

Regards,
Martin

Agreed.  At this point I think we just need to get 3.0 out there and
be willing to fix flaws like this for 3.1 or in some cases for 3.0.1.

This problem sure would be "practically" solved simply by switching the way the filesystem encoding is selected. You'll note that if you want things to Just Work for a backup tool with today's Py3k, all you need to do is switch the filesystem encoding to iso-8859-1. In that encoding, every byte string has exactly one associated Unicode string, so there's no problem with any possible filename.
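To illustrate the round-trip property, here is a minimal sketch using only the standard codecs. The variable names are made up for the example; it just checks that all 256 byte values survive a decode/encode cycle through iso-8859-1, while the same bytes are rejected by a strict UTF-8 decode.

```python
# Every one of the 256 byte values maps to exactly one code point in
# ISO-8859-1, so decoding can never fail and encoding restores the bytes.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("iso-8859-1")
round_tripped = as_text.encode("iso-8859-1")

# The same bytes are not valid UTF-8, so a strict UTF-8 decode fails.
try:
    all_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(round_tripped == all_bytes, len(set(as_text)), utf8_ok)
```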

With that in mind, here's my proposal:
a) Whenever ASCII would be selected as a filesystem encoding, use iso-8859-1 instead.
b) Whenever UTF-8 would be selected as a filesystem encoding, use UTF-8b [1] instead.
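
The two rules could be sketched roughly as follows. This is only an illustration: `choose_fs_encoding` is a hypothetical helper, not an existing Python API, and "utf-8b" is returned as a plain label because no codec of that name ships with Python.

```python
import codecs

def choose_fs_encoding(detected):
    """Hypothetical helper: apply the two substitutions from the proposal."""
    try:
        # Normalize aliases such as "US-ASCII" or "utf8" to canonical names.
        canonical = codecs.lookup(detected).name
    except LookupError:
        return detected  # unknown codec name: leave the choice untouched
    if canonical == "ascii":
        return "iso-8859-1"
    if canonical == "utf-8":
        return "utf-8b"  # label only; assumes such a codec gets registered
    return detected  # any other locale encoding is kept as-is
```

For example, `choose_fs_encoding("US-ASCII")` gives `"iso-8859-1"`, while an encoding such as `"koi8-r"` passes through unchanged.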

It's clearly not a 100% perfect solution, but it completely solves the issue for users with the most popular filesystem encodings: ASCII, iso-8859-1, and UTF-8. IMO, that's good enough to just leave things there.

But even if it's deemed not good enough, and the byte-string level file access APIs are all implemented, I *still* think doing the above is a good idea. It makes Unicode string file/environment/argv access work in a huge majority of cases: a) Windows always, b) Mac OS X always, c) ASCII locale always, d) ISO-8859-1 locale always, e) UTF-8 locale always, f) other locales when the filenames really are encoded in their locale.

It will make users happy, and it's simple enough to implement for Python 3.0.


James

[1] UTF-8b has a similar property to 8859-1, in that all byte strings can be successfully round-tripped. It's not currently implemented in Python core, but it's a pretty trivial encoding, and an implementation is available under the BSD license; see below.

Background:
http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html

Blog post:
http://bsittler.livejournal.com/10381.html

Implementation for python:
http://hyperreal.org/~est/libutf8b/
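
As a rough illustration of the round-trip idea behind UTF-8b, the sketch below uses the `surrogateescape` error handler found in later Python releases (3.1 and up). This is a stand-in chosen for the example, not the libutf8b implementation linked above: it maps each undecodable byte to a lone surrogate in the range U+DC80 to U+DCFF and restores the byte on encoding. The filename is invented.

```python
# A filename containing bytes that are not valid UTF-8.
raw = b"caf\xe9-\xff.txt"

# Undecodable bytes become lone surrogates instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
back = name.encode("utf-8", "surrogateescape")

# Well-formed UTF-8 is unaffected by the handler.
valid = "caf\u00e9".encode("utf-8")
valid_text = valid.decode("utf-8", "surrogateescape")

print(back == raw, valid_text == "caf\u00e9")
```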

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000