I probably should have mentioned it, but in my case it's not even Python
(Java). It's exactly the same principle - an assumption was made that has
become entrenched due to the fear of breakage. If they'd been forced to
think about encodings up-front, it wouldn't have been an issue, which was
the point I was trying to make.

There seems to be an implicit assumption in Python land that encoded strings are the norm. On virtually every computer I encounter, that assumption is wrong. The vast majority of bytes on most computers are not something that can easily be printed out for humans to read. I suppose some clever Pythonista can figure out an encoding to read my .o / .so etc. files, but they are practically meaningless to a Unicode program today. The same goes for most image formats and media files. Browsers routinely encounter mis-encoded or unencoded pages.
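To make that concrete, here is a minimal sketch in Python 3. The byte values are made up for illustration (an ELF-style magic number followed by bytes that are invalid in UTF-8), and try_decode is a hypothetical helper, not anything from the standard library.

```python
# Illustrative bytes only: ELF magic plus a few bytes that are not valid UTF-8.
blob = bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00, 0xFF, 0xFE])

def try_decode(data, encoding="utf-8"):
    """Return the decoded text, or None if the bytes are not valid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

as_utf8 = try_decode(blob)          # fails: 0xFF cannot appear in UTF-8
as_latin1 = blob.decode("latin-1")  # never fails, but the result is not meaningful text
```

Latin-1 maps every byte to a code point, so it always "works", which is exactly why a successful decode says nothing about whether the data was text in the first place.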

In Java, it's much worse. At least with Python you can perform string-like
operations on bytes. In Java you have to convert it to characters before
you can really do anything with it, so people just use the default encoding
all the time - especially if they want the convenience of line-by-line
reading using BufferedReader ...
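For comparison, here is a rough Python 3 sketch of what string-like operations on bytes look like, including line-by-line reading with no decoding at all. The sample data and the parse_pairs helper are invented for illustration.

```python
import io

# Made-up input: the first value is Latin-1 encoded, so it is not valid UTF-8.
raw = b"name=caf\xe9\nsize=42\n"

def parse_pairs(stream):
    """Split key=value lines from a binary stream without ever decoding them."""
    pairs = {}
    for line in stream:  # binary streams iterate line by line, split on b"\n"
        key, sep, value = line.rstrip(b"\n").partition(b"=")
        if sep:
            pairs[key] = value
    return pairs

pairs = parse_pairs(io.BytesIO(raw))
```

Nothing here had to guess an encoding; whoever consumes the values can decode them later, if and when the encoding is actually known.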

In Python I would have preferred bytes to remain the default I/O mechanism; at least that would allow me to decide whether I need any decoding.

As the cat example showed, these extra assumptions are sometimes really in the way.
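The cat example itself is not reproduced in this message, so the following is only a sketch of what a byte-transparent version might look like in Python 3, assuming binary streams on both ends. The function name and the sample payload are my own.

```python
import io
import shutil

def cat(src, dst):
    """Copy src to dst byte for byte; both must be binary streams."""
    shutil.copyfileobj(src, dst)

# Demonstration with in-memory streams and a payload that is not valid UTF-8.
payload = b"\x7fELF\xff\xfe\x00"
out = io.BytesIO()
cat(io.BytesIO(payload), out)
```

Run as a script, the same function could be called as cat(sys.stdin.buffer, sys.stdout.buffer), which bypasses the text layer on the standard streams.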
Robin Becker

