I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy talk. And for the other two, perhaps it would make more sense to have a separate encoding-guessing function that takes a binary stream and returns a text stream wrapping it with the proper encoding?
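A minimal sketch of what such a separate encoding-guessing function might look like, assuming it only sniffs a leading BOM and otherwise falls back to a caller-supplied default. The names `guess_encoding_from_bom` and `open_guessing_encoding` are hypothetical and not part of the standard library; only the `codecs` BOM constants, `io.BufferedReader.peek()` and `io.TextIOWrapper` are existing APIs.

```python
import codecs
import io

# BOM signatures, longest first: the UTF-32-LE BOM starts with the same
# two bytes as the UTF-16-LE BOM, so it has to be tested before it.
# Caveat: a UTF-16-LE file whose first character is U+0000 is
# indistinguishable from UTF-32-LE under this check.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(head, default="utf-8"):
    """Return a codec name based on the first bytes of a file (hypothetical helper)."""
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return default


def open_guessing_encoding(binary_stream, default="utf-8"):
    """Wrap a readable binary stream in a text stream (hypothetical helper).

    The chosen codecs ("utf-16", "utf-32", "utf-8-sig") consume the BOM
    themselves, so it never shows up in the decoded text.
    """
    # peek() looks at the leading bytes without advancing the position.
    if not hasattr(binary_stream, "peek"):
        binary_stream = io.BufferedReader(binary_stream)
    head = binary_stream.peek(4)[:4]
    encoding = guess_encoding_from_bom(head, default)
    return io.TextIOWrapper(binary_stream, encoding=encoding)
```

Under these assumptions a caller would write something like `open_guessing_encoding(open(path, "rb"))` and leave plain `open()` unchanged, which keeps the guessing explicit and opt-in.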
--Guido

On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner <victor.stin...@haypocalc.com> wrote:
> Hi,
>
> Builtin open() function is unable to open an UTF-16/32 file starting with a
> BOM if the encoding is not specified (raise an unicode error). For an UTF-8
> file starting with a BOM, read()/readline() returns also the BOM whereas the
> BOM should be "ignored".
>
> See recent issues related to reading an UTF-8 text file including a BOM: #7185
> (csv) and #7519 (ConfigParser). Such file can be opened in unicode mode with
> the UTF-8-SIG encoding, but it's possible to do better.
>
> I propose to improve open() (TextIOWrapper) by using the BOM to choose the
> right encoding. I think that only files opened in read only mode should
> support this new feature. *Read* the BOM in a *write* only file would cause
> unexpected behaviours.
>
> Since my proposition changes the result TextIOWrapper.read()/readline() for
> files starting with a BOM, we might introduce an option to open() to enable
> the new behaviour. But is it really needed to keep the backward compatibility?
>
> I wrote a proof of concept attached to the issue #7651. My patch only changes
> the behaviour of TextIOWrapper for reading files starting with a BOM. It
> doesn't work yet if a seek() is used before the first read.
>
> --
> Victor Stinner
> http://www.haypocalc.com/
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

--
--Guido van Rossum (python.org/~guido)