Glenn Linderman wrote:
On approximately 1/8/2010 3:59 PM, came the following characters from
the keyboard of Victor Stinner:
Hi,
Thanks for all the answers! I will try to sum up all ideas here.
One concern I have with this implementation of encoding="BOM" is that if
there is no BOM it assumes UTF-8. That is probably a good assumption in
some circumstances, but not in others.
* It is not required that UTF-16LE, UTF-16BE, UTF-32LE, or UTF-32BE
encoded files include a BOM. It is only required that UTF-16 and UTF-32
(cases where the endianness is unspecified) contain a BOM. Hence,
someone might expect UTF-16LE (or any other format that doesn't require
a BOM) rather than UTF-8 as the default, but be willing to accept
any BOM-discriminated format.
* Potentially, this could be expanded beyond the various Unicode
encodings... one could envision that a program whose data files
historically were in any particular national language locale, could want
to be enhanced to accept Unicode, and could declare that they will accept
any BOM-discriminated format, but want to default, in the absence of a
BOM, to the original national language locale that they historically
accepted. That would provide a migration path for their old data files.
So the point is, that it might be nice to have
"BOM-otherEncodingForDefault" for each other encoding that Python
supports. Not sure that is the right API, but I think it is expressive
enough to handle the cases above. Whether the cases solve actual
problems or not, I couldn't say, but they seem like reasonable cases.
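To make the "BOM, else some other default" idea concrete, here is a rough sketch of how it could be done today without any new codec name. The helper names sniff_encoding and open_bom_or_default are made up for illustration and are not part of any proposal in this thread; the BOM constants and codec names come from the standard codecs module.

```python
import codecs

# BOM table, UTF-32 first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the longer signatures must be tested first.
# Caveat: a UTF-16-LE file whose first character is U+0000 would be
# misread as UTF-32-LE by this simple check.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(path, default):
    """Return a BOM-consuming codec name if the file starts with a BOM,
    otherwise the caller-supplied default (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def open_bom_or_default(path, default, mode="r"):
    """Open a text file, letting a BOM override the default encoding."""
    return open(path, mode, encoding=sniff_encoding(path, default))
```

A program migrating from, say, Latin-1 data files would call open_bom_or_default(filename, "latin-1"): old files without a BOM keep working, and new files with a BOM are decoded as whichever Unicode encoding the BOM announces.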
It would, of course, be nicest if OS metadata had been invented way back
when, for all OSes, such that all text files were flagged with their
encoding... then languages could just read the encoding and do the right
thing! But we live in the real world, instead.
What about listing the possible encodings? It would try each in turn
until it found one whose BOM matched, or one that uses no BOM:
my_file = open(filename, 'r', encoding='UTF-8-sig|UTF-16|UTF-8')
or is that taking it too far?
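For what it's worth, the selection rule could look something like the sketch below. It only illustrates the idea: open() does not accept a "|"-separated encoding today, and pick_encoding is a hypothetical name. I am also assuming that an entry with no BOM of its own (plain UTF-8, Latin-1, ...) always matches, so it acts as the fallback and anything listed after it is never reached.

```python
import codecs

# BOMs accepted by each BOM-consuming codec, keyed by normalized codec name.
_CODEC_BOMS = {
    "utf-8-sig": (codecs.BOM_UTF8,),
    "utf-16": (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE),
    "utf-32": (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE),
}


def pick_encoding(head, spec):
    """Pick the first encoding in a '|'-separated spec that fits `head`,
    the first few bytes of the file (hypothetical helper)."""
    for raw in spec.split("|"):
        # codecs.lookup() normalizes spelling, e.g. 'UTF-8-sig' -> 'utf-8-sig'.
        name = codecs.lookup(raw.strip()).name
        boms = _CODEC_BOMS.get(name)
        if boms is None:
            # This encoding has no BOM, so it is accepted unconditionally.
            return name
        if head.startswith(boms):
            return name
    raise LookupError("no encoding in %r matches the file" % spec)


def open_any(filename, spec, mode="r"):
    """Open a text file using the first matching encoding from spec."""
    with open(filename, "rb") as f:
        head = f.read(4)
    return open(filename, mode, encoding=pick_encoding(head, spec))
```

With this, open_any(filename, 'UTF-8-sig|UTF-16|UTF-8') behaves like the example above. Order matters: since the UTF-32-LE BOM starts with the UTF-16-LE BOM, UTF-32 would have to be listed before UTF-16 if both are wanted.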
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev