On Wed, Jun 4, 2014 at 2:34 AM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> Outside of those three kinds of files, I would expect that *by far* the
> single largest kind of file is text. Some text is wrapped in a binary
> layer, e.g. .doc, .odt, etc. but an awful lot of it is good old human
> readable text, including web pages (html) and XML.

In terms of file I/O in Python, text wrapped in a binary layer has to
be treated as binary, not text. There's no difference between a JPEG
file that has some textual EXIF information and an ODT file that's a
whole lot of zipped up text; both of them have to be read as binary,
then unpacked according to the container's specs, and then the text
portion decoded according to an encoding like UTF-8.
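
As a rough sketch of those three steps (read as binary, unpack, decode), assuming a ZIP-based container along the lines of ODT: the helper name read_text_member and the tiny stand-in archive built here are purely illustrative, and a real ODT file carries more members than this one.

```python
import io
import zipfile


def read_text_member(binary_file, member="content.xml", encoding="utf-8"):
    """Unpack one member of a ZIP-based container and decode it to str.

    binary_file must be a seekable file object opened in binary mode,
    e.g. open(path, "rb") for a file on disk.
    """
    with zipfile.ZipFile(binary_file) as archive:
        raw = archive.read(member)  # still bytes at this point
    return raw.decode(encoding)     # only now does it become text


# Build a small stand-in container in memory (hypothetical content,
# not a complete ODT document).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("content.xml", "<text>na\u00efve caf\u00e9</text>".encode("utf-8"))
container_bytes = buffer.getvalue()

# The container itself is opaque binary; the text lives one layer down.
text = read_text_member(io.BytesIO(container_bytes))
print(text)
```

Opening the container in text mode would be wrong here: the compressed bytes are not valid text in any encoding, so the decode step has to wait until after the unpacking.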

But you're quite right that a large proportion of files out there
really are text.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
