Daniel Blanchard added the comment:

Thanks for straightening me out there! I had not noticed this in the Unicode 
FAQ before:

> Where the data has an associated type, such as a field in a database, a BOM 
> is unnecessary. In particular, if a text data stream is marked as UTF-16BE, 
> UTF-16LE, UTF-32BE or UTF-32LE, a BOM is neither necessary nor permitted. Any 
> U+FEFF would be interpreted as a ZWNBSP.

Anyway, the thing that brought this up is that in chardet we detect the 
encodings of files for people, and we've been returning UTF-16BE or UTF-16LE 
when we detect a BOM at the front of the file. We recently learned that when 
people decode with those codecs, the BOM is not stripped and shows up as a 
leading U+FEFF in the decoded text. It seems the correct behavior in our case 
is to just return UTF-16 whenever a BOM is present.
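
A minimal sketch of the difference, assuming Python 3's standard codecs. The 
sample bytes are made up for illustration and are not taken from chardet:

```python
import codecs

# Illustrative input: a UTF-16 little-endian BOM followed by "hi".
data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

# The endian-specific codec treats the BOM as an ordinary character,
# so U+FEFF stays at the front of the decoded string.
as_le = data.decode("utf-16-le")

# The generic codec consumes the BOM and uses it to pick the byte order.
as_generic = data.decode("utf-16")

print(repr(as_le))
print(repr(as_generic))
```

So for a file that starts with a BOM, reporting UTF-16 lets the decoder strip 
the BOM, whereas reporting UTF-16LE leaves it in the text.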

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25325>
_______________________________________