On 9/15/06, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
On Friday, 15 September 2006 at 10:48 -0700, Josiah Carlson wrote:
> This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4:

You could replace "latin-1" with "one-byte system encoding chosen at
interpreter startup depending on locale".
There are lots of 8-bit encodings other than iso-8859-1.
(for example, my current locale uses iso-8859-15)

The algorithm for choosing the one-byte encoding could be:
- if the current locale uses a one-byte encoding, use that encoding
- otherwise, if the current locale's language has a popular one-byte encoding
(for many languages this would mean iso-8859-<X>), use that encoding
- otherwise, no one-byte encoding
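
A rough sketch of the three steps above, not anything that exists in the
interpreter. The function name choose_one_byte_encoding, both lookup tables,
and the choice to pass the locale as a plain string such as "ru_RU.KOI8-R"
are assumptions made up for illustration; only codecs.lookup is a real
standard-library call, used here to normalise encoding names.

```python
import codecs


def _canonical(name):
    """Return the codec's canonical name, or None if Python does not know it."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# Hypothetical, deliberately incomplete list of one-byte encodings.
ONE_BYTE = {
    _canonical(n)
    for n in ("iso-8859-1", "iso-8859-5", "iso-8859-15", "koi8-r", "cp1251")
}

# Hypothetical table: language code -> "popular" one-byte encoding.
POPULAR_BY_LANGUAGE = {
    "ru": "koi8-r",
    "fr": "iso-8859-15",
    "de": "iso-8859-15",
}


def choose_one_byte_encoding(locale_name):
    """Pick a one-byte encoding for a locale string like 'fr_FR.ISO-8859-15'."""
    # Drop a modifier such as '@euro', then split language from encoding.
    base = locale_name.partition("@")[0]
    lang_part, _, enc_part = base.partition(".")

    # Step 1: the locale itself already uses a one-byte encoding.
    enc = _canonical(enc_part) if enc_part else None
    if enc is not None and enc in ONE_BYTE:
        return enc

    # Step 2: fall back to a popular one-byte encoding for the language.
    language = lang_part.split("_")[0].lower()
    fallback = POPULAR_BY_LANGUAGE.get(language)
    if fallback is not None:
        return _canonical(fallback)

    # Step 3: no one-byte encoding.
    return None
```

With these tables, "fr_FR.ISO-8859-15" keeps its own encoding, "ru_RU.UTF-8"
falls back to KOI8-R, and "ja_JP.UTF-8" gets no one-byte encoding at all.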

This would ensure that, for example, Russian text on a system configured
with a Russian locale does not always end up using two bytes per
character internally.
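
To put numbers on the size difference, here is a small check using Python's
standard codecs. The sample string is an arbitrary choice, and "utf-16-le"
stands in for a two-byte (UCS-2 style) internal representation, which is an
assumption about how such storage would look.

```python
# Arbitrary sample: 9 Cyrillic letters plus a comma and a space (11 characters).
text = "Привет, мир"

koi8 = text.encode("koi8-r")     # one byte per character
ucs2 = text.encode("utf-16-le")  # two bytes per character, no byte-order mark
utf8 = text.encode("utf-8")      # two bytes per Cyrillic letter, one per ASCII

for label, data in (("koi8-r", koi8), ("utf-16-le", ucs2), ("utf-8", utf8)):
    print(label, len(data), "bytes")
```

For this 11-character string that comes to 11, 22, and 20 bytes respectively.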

I do not believe that this extra complexity will be valuable in the long term, because most Europeans will switch to UTF-8 locales over the next five years. The current situation makes no sense. Think about it from the end user's point of view:

"You can use KOI8-R/ISO-8859-? or UTF-8.

Pro for KOI8-R:

1. Text files will use 0.8% instead of 1% of your hard disk space.
2. Backwards compatibility

Pro for UTF-8:

1. Better compatibility with new software
2. Easier to share files across geographic boundaries
3. Ability to encode characters from other character sets
4. Access to characters like smart quotes, wingdings, fractions and so forth.
"

The result seems obvious to me: fixed 8-bit encodings are a terrible idea and need to just go away. Let's not build them into Python's core on the basis of a minor and fleeting performance improvement.

 Paul Prescod

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000