On Mon, 02 Jun 2014 12:10:48 +0100, Robin Becker wrote:

> there seems to be an implicit assumption in python land that encoded
> strings are the norm. On virtually every computer I encounter that
> assumption is wrong. The vast majority of bytes in most computers is not
> something that can be easily printed out for humans to read. I suppose
> some clever pythonista can figure out an encoding to read my .o / .so
> etc files, but they are practically meaningless to a unicode program
> today. Same goes for most image formats and media files. Browsers
> routinely encounter mis/un-encoded pages.

If you include image, video and sound files, you are probably correct that most content of files is binary. Outside of those three kinds of files, I would expect that *by far* the single largest kind of file is text. Some text is wrapped in a binary layer, e.g. .doc, .odt, etc., but an awful lot of it is good old human-readable text, including web pages (HTML) and XML.

Every programming language I know of defaults to opening files in text mode rather than binary mode. There may be exceptions, but reading and writing text is ubiquitous while writing .o and .so files is not.

> In python I would have preferred for bytes to remain the default io
> mechanism, at least that would allow me to decide if I need any
> decoding.

That implies that you're opening files in binary mode by default. It also implies that even something as trivial as writing the string "Hello World" to a file (stdout is a file) is impossible until you've learned about encodings and know which encoding you need. I really don't think that's a good plan for any language, but especially not for a language like Python, which is intended for beginners as well as experts.

The Python 2 approach, where stdout is binary but tries really hard to pretend to be a superset of ASCII, is simply broken. It works well for trivial examples, while breaking in surprising and hard-to-diagnose ways in others. It violates the Zen (errors should never pass silently unless explicitly silenced): instead of raising an error, it fails silently and gives mojibake:

[steve@ando ~]$ python2.7 -c "import sys; sys.stdout.write(u'ñβж\n')"
ñβж

Changing to print doesn't help:

[steve@ando ~]$ python2.7 -c "print u'ñβж'"
ñβж

Python 3 works correctly, whether you use print or sys.stdout:

[steve@ando ~]$ python3.3 -c "import sys; sys.stdout.write(u'ñβж\n')"
ñβж

(although I haven't tested it on Windows).

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
-- 
https://mail.python.org/mailman/listinfo/python-list
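
To make the text-mode versus binary-mode point above concrete, here is a minimal Python 3 sketch. It is only an illustration, not part of the original session: the scratch file name `hello.txt` and the variable names are hypothetical, and UTF-8 is an assumed encoding chosen for the example.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

# Text mode: the file object encodes str to bytes on our behalf.
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello World ñβж\n")

# Binary mode: the caller must choose an encoding before writing.
binary_error = None
with open(path, "ab") as f:
    try:
        f.write("Hello World\n")  # a str is rejected in binary mode
    except TypeError as exc:
        binary_error = exc
    f.write("Hello World\n".encode("utf-8"))  # explicit encoding works

# Reading in binary mode returns bytes; decoding is an explicit step.
with open(path, "rb") as f:
    raw = f.read()
text = raw.decode("utf-8")
```

With binary I/O the `.encode()` and `.decode()` calls cannot be skipped, which is the hurdle a beginner would hit before being able to write "Hello World" to a file.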