Martin v. Löwis wrote:
> Nicholas Bastin wrote:
>
>>It would be nice if you could optionally specify that the codec would
>>assume UTF-16BE if no BOM was present, and not raise UnicodeError in
>>that case, which would preserve the current behaviour as well as allow
>>users' to ask for behaviour which conforms to the standard.
>
>
> Alternatively, the UTF-16BE codec could support the BOM, and do
> UTF-16LE if the "other" BOM is found.

That would violate the Unicode standard - the BOM character
for UTF-16-LE and -BE must be interpreted as ZWNBSP.

> This would also support your usecase, and in a better way. The
> Unicode assertion that UTF-16 is BE by default is void these
> days - there is *always* a higher layer protocol, and it more
> often than not specifies (perhaps not in English words, but
> only in the source code of the generator) that the default should
> by LE.

I've checked the various versions of the Unicode standard
docs: it seems that the quote you have was silently introduced
between 3.0 and 4.0.

Python currently uses version 3.2.0 of the standard and I don't
think enough people are aware of the change in the standard to
make a case for dropping the exception that is raised when the
UTF-16 codec finds a stream without a BOM.

By the time we switch to 4.1 or later, we can then make the
change in the native UTF-16 codec as you requested.

Personally, I think that the Unicode consortium should not
have introduced a default for the UTF-16 encoding byte
order. Using big endian as default in a world where most
Unicode data is created on little endian machines is not
very realistic either.

Note that the UTF-16 codec starts reading data in
the machine's native byte order and then learns a possibly
different byte order by looking for BOMs.

Writing a codec which implements the 4.0 behavior is easy,
though.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 07 2005)
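
For illustration, a minimal sketch of a decoder with the 4.0 behaviour
discussed above: a leading BOM selects the byte order, and data without
a BOM is taken as big endian instead of raising an error. The function
name decode_utf16_default_be is made up for this example and is not
part of the codecs module. It is written for Python 3 and only handles
complete byte strings, not incremental streams.

```python
import codecs


def decode_utf16_default_be(data: bytes) -> str:
    """Decode UTF-16 data, assuming big endian when no BOM is present.

    Hypothetical helper for illustration only; not a stdlib function.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        # Big endian BOM (FE FF): strip it and decode as UTF-16-BE.
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        # Little endian BOM (FF FE): strip it and decode as UTF-16-LE.
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # No BOM: fall back to big endian rather than raising an error.
    return data.decode("utf-16-be")


if __name__ == "__main__":
    print(decode_utf16_default_be(b"\x00a\x00b"))          # no BOM
    print(decode_utf16_default_be(b"\xff\xfea\x00b\x00"))  # LE BOM
    print(decode_utf16_default_be(b"\xfe\xff\x00a\x00b"))  # BE BOM
```

By contrast, the plain utf-16-be codec does not treat a leading U+FEFF
as a byte order mark: b"\xfe\xff\x00a".decode("utf-16-be") keeps it in
the result as ZWNBSP, which is the behaviour the standard requires for
the -BE and -LE variants.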