On Wed, Jun 4, 2014 at 2:34 AM, Steven D'Aprano
> Outside of those three kinds of files, I would expect that *by far* the
> single largest kind of file is text. Some text is wrapped in a binary
> layer, e.g. .doc, .odt, etc. but an awful lot of it is good old human
> readable text, including web pages (html) and XML.

In terms of file I/O in Python, text wrapped in a binary layer has to
be treated as binary, not text. There's no difference between a JPEG
file that has some textual EXIF information and an ODT file that's a
whole lot of zipped up text; both of them have to be read as binary,
then unpacked according to the container's specs, and then the text
portion decoded according to an encoding like UTF-8.

But you're quite right that a large proportion of files out there
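Here is a minimal sketch of those three steps using the standard library's zipfile module. It assumes a ZIP-based container like ODT, whose body text lives in a member named content.xml and is encoded as UTF-8. The helper name read_member_text and the sample XML are made up for illustration. A real ODT would still need XML parsing afterwards to get at the plain prose.

```python
import io
import zipfile


def read_member_text(container: bytes, member: str, encoding: str = "utf-8") -> str:
    """Pull one member out of a ZIP container and decode it as text.

    Hypothetical helper for illustration only.
    """
    # Step 1: the container is binary, so we wrap raw bytes, never a str.
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        # Step 2: unpack according to the container's format (ZIP here).
        raw = zf.read(member)  # still bytes at this point
    # Step 3: only now decode the text portion with its encoding.
    return raw.decode(encoding)


# Build a tiny stand-in "document" in memory so the example is self-contained.
# The member name mirrors ODT's layout; the XML body is invented.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("content.xml", "<text>Hello, wörld</text>".encode("utf-8"))
data = buf.getvalue()

print(read_member_text(data, "content.xml"))
```

Opening the same file with open(path, "r") would try to decode the compressed bytes as text directly, which is exactly the mistake the binary-first approach avoids.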
really are text.