[issue7651] Python3: guess text file charset using the BOM

STINNER Victor Thu, 07 Jan 2010 15:20:09 -0800

STINNER Victor <[email protected]> added the comment:

open_bom.patch is a proof of concept, and it only works in read mode. The idea
is to delay choosing the encoding and creating the decoder until just after
the first read_chunk() call.
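The delayed-decoder idea can be illustrated outside of the io module. This is a minimal sketch, not the patch itself: the helper name `read_text_lazy_decoder`, its `default` and `chunk_size` parameters, and the BOM table are assumptions for illustration. It reads the first chunk as raw bytes, looks for a BOM, and only then creates an incremental decoder with `codecs.getincrementaldecoder()`.

```python
import codecs
import io

# BOM signatures, checked in order. The UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE), so UTF-32 must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def read_text_lazy_decoder(raw, default="utf-8", chunk_size=8192):
    """Hypothetical helper: decode a binary stream, choosing the codec
    only after the first chunk has been read."""
    first = raw.read(chunk_size)
    encoding = default
    for bom, name in _BOMS:
        if first.startswith(bom):
            encoding = name
            first = first[len(bom):]  # skip the BOM itself
            break
    # The decoder is created only now, once the encoding is known.
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(first)]
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:
            break
        parts.append(decoder.decode(chunk))
    # Flush any bytes still buffered in the decoder.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


if __name__ == "__main__":
    data = codecs.BOM_UTF8 + "abc\nd\xe9f\n".encode("utf-8")
    print(repr(read_text_lazy_decoder(io.BytesIO(data))))  # 'abc\ndéf\n'
```

Because the decoder is incremental, a multi-byte character split across two chunks is still decoded correctly.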


The patch changes the default behaviour of open(): if the file starts with a
BOM, the BOM is used to choose the encoding and is then skipped. Example:
-------------
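from _pyio import open  # the pure-Python io implementation

# Write a file that starts with a UTF-8 BOM
with open('test.txt', 'w', encoding='utf-8-sig') as fp:
    print("abc", file=fp)
    print("d\xe9f", file=fp)

# Read it back without specifying an encoding
with open('test.txt', 'r') as fp:
    print("open().read(): {!r}".format(fp.read()))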
-------------

With a UTF-8 locale encoding, unpatched Python displays '\ufeffabc\ndéf\n'
(the BOM is returned as the character U+FEFF), whereas patched Python displays
'abc\ndéf\n'.
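Without the patch, a similar result can be obtained by peeking at the first bytes before opening the file in text mode. This is a minimal sketch assuming a hypothetical helper `open_guess_bom()` (not part of Python): it maps each BOM to a codec that strips the BOM while decoding (`utf-8-sig`, `utf-16`, `utf-32`) and falls back to an assumed default of UTF-8.

```python
import codecs

# Map each BOM to a codec that consumes the BOM when decoding.
# UTF-32 BOMs are listed first: BOM_UTF32_LE starts with BOM_UTF16_LE.
_BOM_CODECS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def open_guess_bom(path, default="utf-8"):
    """Hypothetical helper: open `path` for reading text, choosing the
    encoding from its BOM if present, else using `default`."""
    with open(path, "rb") as raw:
        head = raw.read(4)  # the longest BOM is 4 bytes (UTF-32)
    encoding = default
    for bom, name in _BOM_CODECS:
        if head.startswith(bom):
            encoding = name
            break
    return open(path, "r", encoding=encoding)


if __name__ == "__main__":
    with open("test.txt", "w", encoding="utf-8-sig") as fp:
        print("abc", file=fp)
        print("d\xe9f", file=fp)

    with open_guess_bom("test.txt") as fp:
        print("open_guess_bom().read(): {!r}".format(fp.read()))
```

Unlike the patch, this opens the file twice, so it only suits regular files, not pipes or sockets.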

----------
keywords: +patch
Added file: http://bugs.python.org/file15782/open_bom.patch

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue7651>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
