Adam Olsen wrote:
On 2/14/06, Just van Rossum <[EMAIL PROTECTED]> wrote:
  
+1 for two functions.

My choice would be open() for binary and opentext() for text. I don't
find that backwards at all: the text function is going to be more
different from the current open() function than the binary function
would be since in many ways the str type is closer to bytes than to
unicode.

Maybe it's even better to use opentext() AND openbinary(), and deprecate
plain open(). We could even introduce them at the same time as bytes()
(and leave the open() deprecation for 3.0).
    

Thus providing us with a transition period, even with warnings on use
of the old function.
  
[snip..]

I personally like the move towards all-unicode strings; any text whose encoding you don't know is basically 'random binary data'. This works fine so long as you are in control of the text source. *However*, it leaves the following problem:

The current situation (treating byte sequences as text and assuming they are encoded in some ASCII superset) *works* (albeit with many breakages), simply because this assumption is usually correct.

Forcing the programmer to be aware of encodings also pushes the same requirement onto the user (who is often the source of the text in question).

Currently you can read a text file and process it, making sure that any changes or requirements involve only ASCII characters. It therefore doesn't matter which 8-bit ASCII-superset encoding the original uses. If you force the programmer to specify the encoding in order to read the file, they have to pass that requirement on to their user, who is even less likely to be encoding-aware than the programmer.
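To make the point concrete, here is a rough sketch of that kind of encoding-agnostic processing. It is written against a bytes type with a replace() method (Python 3 style syntax, which is an assumption on my part), and the helper name replace_ascii_token is purely hypothetical. It only illustrates that an ASCII-only edit gives the same result whichever ASCII-superset encoding the file happens to use:

```python
def replace_ascii_token(data: bytes, old: str, new: str) -> bytes:
    """Replace an ASCII-only token in raw bytes without ever decoding them.

    Hypothetical helper, for illustration only.
    """
    # encode("ascii") raises UnicodeEncodeError if a token is not pure ASCII,
    # which is exactly the restriction that keeps the edit encoding-agnostic.
    return data.replace(old.encode("ascii"), new.encode("ascii"))


original_text = "name = caf\u00e9\n"  # contains one non-ASCII character

# The same edit is applied to the raw bytes of three different
# ASCII-superset encodings, without telling the helper which one is in use.
for encoding in ("latin-1", "cp1252", "utf-8"):
    raw = original_text.encode(encoding)
    edited = replace_ascii_token(raw, "name", "title")
    # Decoding afterwards (only to check the result) shows the non-ASCII
    # character survived untouched.
    assert edited.decode(encoding) == "title = caf\u00e9\n"
```

This holds only for encodings in which bytes below 0x80 always mean ASCII; it would not be safe for something like UTF-16.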

What this means is that for simple programs, where the programmer doesn't want to worry about encodings or can't force the user to be aware of them, they will read the file in as bytes. Modules implementing all the 'string methods' for bytes will quickly and inevitably be created. New programmers will gravitate to these and the old mess will continue, but with a more awkward hybrid than before. (String manipulation of byte sequences will no longer be a core part of the language, and so will be harder to use.)

I'm not sure what we can do to obviate this, of course... but is this change actually going to improve the situation or make it worse?

All the best,

Michael Foord
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev