Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

Henning von Bargen Sun, 10 Jan 2010 03:11:48 -0800

If Python should support BOM when reading text files,
it should also be able to *write* such files.


An encoding="BOM" argument wouldn't help here, because
it does not actually specify which encoding to use:
UTF-8, UTF-16-LE or what?
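
To illustrate the ambiguity when writing, here is a small sketch using only
the BOM constants from the standard codecs module. The variable names are
made up for this example. The same text produces different bytes depending
on which UTF encoding is chosen, so "BOM" alone does not tell the writer
what to emit.

```python
import codecs

text = "A"

# Each candidate is "BOM + encoded text" for one possible UTF encoding.
candidates = {
    "utf-8": codecs.BOM_UTF8 + text.encode("utf-8"),
    "utf-16-le": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "utf-16-be": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
}

for name, data in candidates.items():
    print(name, data)
```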

That would be a point against encoding="BOM" and
in favour of an additional keyword argument, "use_bom" or
whatever, with the following values:

None:  default (old) behaviour: don't handle BOM at all

True:  reading: expect BOM (raising an exception if it's
                missing). The encoding argument must be None
                or it must match the encoding implied by the
                BOM.
       writing: write a BOM. The encoding argument must be
                one of the UTF encodings.

False: reading: If a BOM is present, use it to determine the
                file encoding. The encoding argument must
                be None or it must match the encoding implied by
                the BOM. (*)
                Otherwise, use the encoding argument to determine
                the encoding.
       writing: do not write a BOM. Use the encoding argument.

(*) This is a question of taste. I think some people would prefer
    a fourth value "AUTO" instead, or to swap the behaviour of
    None and False.
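
The three values could be sketched roughly as follows. This is only an
illustration of the proposal, not an existing API: read_text, write_text,
detect_bom and the use_bom parameter are hypothetical names. Two further
assumptions: an encoding of "utf-16" or "utf-32" without byte order counts
as matching a BOM of the same family, and writing with use_bom=True requires
an encoding with explicit byte order.

```python
import codecs

# BOM byte sequences and the codec each one implies. UTF-32 comes first
# because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(raw):
    """Return (encoding, bom_length) for the leading bytes, or (None, 0)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0


def _matches(encoding, detected):
    # Assumption: "utf-16" also matches "utf-16-le" and "utf-16-be".
    name = codecs.lookup(encoding).name
    return name == detected or detected.startswith(name + "-")


def read_text(path, encoding=None, use_bom=None):
    """Hypothetical reader following the use_bom values described above."""
    if use_bom is None:
        # Old behaviour: no BOM handling at all.
        with open(path, encoding=encoding) as f:
            return f.read()
    with open(path, "rb") as f:
        data = f.read()
    detected, skip = detect_bom(data)
    if detected is None:
        if use_bom:
            raise ValueError("expected a BOM but found none")
        if encoding is None:
            raise ValueError("no BOM present and no encoding given")
        return data.decode(encoding)
    if encoding is not None and not _matches(encoding, detected):
        raise ValueError("encoding %r contradicts BOM (%s)" % (encoding, detected))
    return data[skip:].decode(detected)


def write_text(path, text, encoding=None, use_bom=None):
    """Hypothetical writer following the use_bom values described above."""
    if not use_bom:
        # None and False both mean: do not add a BOM, use the encoding argument.
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        return
    if encoding is None:
        raise ValueError("use_bom=True needs an explicit UTF encoding")
    name = codecs.lookup(encoding).name
    bom_for = {codec: bom for bom, codec in _BOMS}
    if name not in bom_for:
        raise ValueError("%r is not a UTF encoding with explicit byte order" % encoding)
    with open(path, "wb") as f:
        f.write(bom_for[name] + text.encode(name))
```

For example, write_text(path, "abc", encoding="utf-16-le", use_bom=True)
followed by read_text(path, use_bom=True) would give back "abc", while
read_text on a file without a BOM and use_bom=True would raise ValueError.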

Henning

P.S. To make things worse, I have sometimes seen XML files with a
UTF-8 BOM, but an XML encoding declaration of "iso-8859-1".
For such files, whatever you guess will be wrong anyway...
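
A rough sketch of how such contradictory files could at least be detected.
The function name xml_encoding_conflict and the simplified regular
expression are assumptions made for this example, and only the UTF-8 BOM
case mentioned above is handled. It cannot tell which of the two claims is
right, only that they disagree.

```python
import codecs
import re

# Simplified pattern for the encoding pseudo-attribute of an XML declaration.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def xml_encoding_conflict(raw):
    """Return (bom_encoding, declared_encoding) if a UTF-8 BOM contradicts
    the XML encoding declaration, otherwise None."""
    if not raw.startswith(codecs.BOM_UTF8):
        return None
    match = _DECL.match(raw[len(codecs.BOM_UTF8):])
    if match is None:
        return None
    declared = match.group(1).decode("ascii")
    try:
        if codecs.lookup(declared).name == "utf-8":
            return None
    except LookupError:
        pass  # unknown codec name: report it as a conflict
    return ("utf-8", declared)
```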
