There seems to be an implicit assumption in Python land that encoded strings are
the norm. On virtually every computer I encounter, that assumption is wrong. The
vast majority of bytes on most computers are not something that can be easily
printed out for humans to read. I suppose some clever Pythonista can figure out
an encoding to read my .o/.so etc. files, but they are practically meaningless
to a Unicode program today. The same goes for most image formats and media files.
Browsers routinely encounter mis-encoded or un-encoded pages.
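To make that concrete, here's a minimal sketch showing that typical binary data simply isn't valid UTF-8 (the ELF-magic-style blob is my own illustrative stand-in, not the contents of any real .so file):

```python
# A made-up binary blob, loosely shaped like the start of an ELF file:
# a few header bytes followed by raw high bytes that are invalid UTF-8.
blob = b"\x7fELF\x02\x01\x01\x00" + bytes(range(128, 140))

try:
    blob.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

print(decodable)  # False: these bytes were never meant to be text
```

The point isn't that you *can't* find some 8-bit encoding that maps every byte to a character; it's that doing so produces mojibake, not meaning.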
I probably should have mentioned it, but in my case it's not even Python
(it's Java). It's exactly the same principle: an assumption was made that has
become entrenched due to the fear of breakage. If people had been forced to
think about encodings up front, it wouldn't have become an issue, which was
the point I was trying to make.
In Java, it's much worse. At least with Python you can perform string-like
operations on bytes. In Java you have to convert it to characters before
you can really do anything with it, so people just use the default encoding
all the time - especially if they want the convenience of line-by-line
reading using BufferedReader ...
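For what it's worth, the Python side of that contrast looks like this: a small sketch (the record format here is made up) of doing str-style parsing directly on bytes, without ever choosing an encoding:

```python
# Python 3 bytes objects support most str-style methods directly
# (split, startswith, strip, ...), so byte data can be parsed
# without committing to any character encoding first.
record = b"host:127.0.0.1;port:8080"

fields = dict(item.split(b":", 1) for item in record.split(b";"))

print(fields[b"port"])  # b'8080' -- still bytes, no decode needed
```

In Java there's no equivalent of this on `byte[]`; the convenient APIs live on `String`, which is exactly why people reach for the default charset.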
In Python I would have preferred bytes to remain the default I/O mechanism;
at least that would let me decide whether I need any decoding. As the cat
example showed, these extra assumptions sometimes really get in the way.
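A byte-for-byte cat in that spirit might be sketched like this: it never decodes, so binary files and mis-encoded text pass through untouched (the optional `out` parameter is my own addition, not part of any cat discussed earlier):

```python
import sys

def cat(paths, out=None):
    # Copy raw bytes straight through: no decode/encode round-trip,
    # so arbitrary binary content survives unmodified.
    out = out if out is not None else sys.stdout.buffer
    for path in paths:
        with open(path, "rb") as f:  # binary mode: bytes in, bytes out
            while chunk := f.read(64 * 1024):
                out.write(chunk)

# Usage: cat(sys.argv[1:])
```

Compare that with text-mode `open(path)`, which quietly applies the locale's preferred encoding and can raise `UnicodeDecodeError` on input that cat should have no opinion about.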