> On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner
> <victor.stin...@haypocalc.com> wrote:
>> Hi,
>>
>> Builtin open() function is unable to open an UTF-16/32 file starting with a
>> BOM if the encoding is not specified (raise an unicode error). For an UTF-8
>> file starting with a BOM, read()/readline() returns also the BOM whereas the
>> BOM should be "ignored".
>> [...]
>
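For reference, here is a minimal sketch of the BOM handling being discussed. It is written for Python 3 and uses in-memory bytes instead of real files (both are my assumptions, as is the helper name `decode_stream`). It only shows how the individual codecs treat a leading BOM, not which encoding `open` picks by default:

```python
import codecs
import io


def decode_stream(raw, encoding):
    """Decode bytes through a text wrapper, much as a text-mode file would."""
    return io.TextIOWrapper(io.BytesIO(raw), encoding=encoding).read()


# Sample payloads: the same text, each prefixed with the matching BOM.
utf8_with_bom = codecs.BOM_UTF8 + "hello".encode("utf-8")
utf16_with_bom = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")

# Plain 'utf-8' keeps the BOM as a leading U+FEFF character.
plain_utf8 = decode_stream(utf8_with_bom, "utf-8")
# 'utf-8-sig' drops a leading BOM if one is present.
sig_utf8 = decode_stream(utf8_with_bom, "utf-8-sig")
# An explicit byte order ('utf-16-le') also keeps the BOM as U+FEFF.
explicit_le = decode_stream(utf16_with_bom, "utf-16-le")
# The generic 'utf-16' codec uses the BOM to pick the byte order and drops it.
generic_utf16 = decode_stream(utf16_with_bom, "utf-16")

for name, value in [
    ("utf-8", plain_utf8),
    ("utf-8-sig", sig_utf8),
    ("utf-16-le", explicit_le),
    ("utf-16", generic_utf16),
]:
    print(name, repr(value))
```

So whether the BOM shows up in the decoded text depends on the codec name that is passed in, even when the bytes are identical.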
I had similar issues too (please read below ;o) ...

On Thu, Jan 7, 2010 at 7:52 PM, Guido van Rossum <gu...@python.org> wrote:
> I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
> talk. And for the other two, perhaps it would make more sense to have
> a separate encoding-guessing function that takes a binary stream and
> returns a text stream wrapping it with the proper encoding?
>

About guessing the encoding: I ran into this issue while I was developing
a Trac plugin. What I was doing was as follows:

  - I guessed the MIME type + charset encoding using the Trac MIME API
    (it was a CSV file encoded using UTF-16)
  - I read the file using `open`
  - Then wrapped the file using `codecs.EncodedFile`
  - Then used `csv.reader`

... and I still got the BOM in the first value of the first row of the
CSV file.

{{{
#!python
>>> from codecs import EncodedFile
>>> mimetype
'utf-16-le'
>>> ef = EncodedFile(f, 'utf-8', mimetype)
}}}

IMO I am +1 for leaving `open` just like it is and using the `codecs`
module to deal with encodings, but I am strongly -1 on returning the BOM
when using `EncodedFile` (mainly because the encoding is explicitly
supplied ;o)

> --Guido
>

CMIIW anyway ...

--
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/
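P.S. To make the "separate encoding-guessing function" idea a bit more concrete, here is a rough sketch of what such a helper could look like. Everything in it is an assumption on my side: the name `open_text_guessing_bom`, the BOM table, the UTF-8 fallback, the requirement that the binary stream is seekable, and the use of Python 3's `io.TextIOWrapper`. It is not an existing stdlib function, just an illustration:

```python
import codecs
import csv
import io

# Longer BOMs come first: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def open_text_guessing_bom(binary_stream, default="utf-8", newline=None):
    """Hypothetical helper: wrap a seekable binary stream in a text stream.

    The encoding is chosen from a leading BOM, which is then skipped.
    Without a BOM, the (assumed) default encoding is used.
    """
    start = binary_stream.tell()
    head = binary_stream.read(4)
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            # Position the stream just past the BOM.
            binary_stream.seek(start + len(bom))
            break
    else:
        # No BOM found: rewind and fall back to the default encoding.
        encoding = default
        binary_stream.seek(start)
    return io.TextIOWrapper(binary_stream, encoding=encoding, newline=newline)


# Usage: a small UTF-16 CSV payload with a BOM, held in memory.
data = codecs.BOM_UTF16_LE + "name,value\r\nspam,1\r\n".encode("utf-16-le")
text = open_text_guessing_bom(io.BytesIO(data), newline="")
rows = list(csv.reader(text))
print(rows)
```

With something along these lines the first cell of the first row comes back without the BOM, which is what I expected from `EncodedFile` in the first place.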