On 6/2/2014 7:10 AM, Robin Becker wrote:
there seems to be an implicit assumption in python land that encoded
strings are the norm.
I don't know why you say that. To have a stream of bytes interpreted as
characters, open in text mode and give the encoding. Otherwise, open in
binary mode and apply whatever encoding you want. Image programs like
Pil or Pillow assume that bytes have image encodings. Same idea.
> On virtually every computer I encounter that assumption is wrong.
Except for the std streams (see below), it is also not part of Python.
I will just point out that bytes are given meaning by encoding meaning
into them. Unicode attempts to reduce the hundreds of text encodings to
just a few, and mostly to just one for external storage and transmission.
In python I would have preferred for bytes to remain the default io
Do you really think that defaulting the open mode to 'rb' rather than
'rt' would be a better choice for newbies?
mechanism, at least that would allow me to decide if I need any decoding.
Assuming that 'rb' is actually needed more than 'rt' for you in
particular, is it really such a burden to give a mode more often than not?
As the cat example
showed these extra assumptions are sometimes really in the way.
This example is *only* about the *pre-opened* stdxyz streams. Python
uses these to read characters from the keyboard and print characters to
the screen in input, print, and the interactive interpreter. So they are
open in text mode (which wraps binary read and write). The developers,
knowing that people can and do write batch mode programs that avoid
input and print, gave a documented way to convert the streams back to
binary. (See the sys doc.)
The issue Armin ran into is this. He write a library module that makes
sure the streams are binary. Someone else does the same. A program
imports both modules, in either order. The conversion method referenced
above raises an exception if one attempt to convert an already converted
stream. Much of the extra code Armin published detects whether the steam
is already binary or needs conversion.
The obvious solution is to enhance the conversion method so that one may
say 'convert is needed, otherwise just pass'.
Terry Jan Reedy