Michael Foord writes:

 > When reading text files the presence of the UTF-8 signature *almost
 > invariably* means a UTF-8 encoding. Honouring this will almost always
 > be better than using the wrong encoding. Of course there are caveats,
 > but it will be a substantial improvement.
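For concreteness, honouring the signature amounts to roughly the
following sketch (the helper name and the latin-1 fallback are my own,
not part of any proposal; the real stdlib spelling of the signature-
stripping codec is 'utf-8-sig'):

    import codecs

    def open_sniffed(path, fallback='latin-1'):
        # Peek at the first three bytes; the UTF-8 signature is EF BB BF.
        with open(path, 'rb') as f:
            head = f.read(3)
        if head.startswith(codecs.BOM_UTF8):
            # 'utf-8-sig' decodes UTF-8 and strips the signature.
            return open(path, encoding='utf-8-sig')
        return open(path, encoding=fallback)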
Sure, that would be better than using the wrong encoding *if* the only
thing that matters is getting the input codec right. But it's not clear
that it's an improvement from the naive programmer's point of view,
which needs to take into account the behavior of the whole application.
Is it an improvement if it "seems to work" in testing, and then munges
something important to the boss because she has a correspondent who
uses UTF-8, not UTF-8-signature? Maybe it's better if it screws up
almost all the time, so that the problem is detected early!

 > Unless you keep the information about the original encoding along with
 > the decoded string, changing the (default) output encoding depending
 > on the input is simply not possible - and so not really relevant.

That's throwing the baby out with the bathwater. Very few practical
applications that care about the input encoding are going to be willing
to accept an output encoding that doesn't correspond to the input
encoding in an appropriate way. *If* you are going to advocate guessing
about the input encoding, even based on very strong signals like the
UTF-8 signature, then you really have to advocate adding the
infrastructure to ensure that the output encoding is properly set. If
the output encoding is the programmer's problem, then it's purely
pandering to laziness not to ask them to deal with the input encoding
as well.
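One way such infrastructure could look: simply keep the detected codec
alongside the decoded string, so output can be written in a
corresponding encoding. A minimal sketch (the names and the latin-1
fallback are mine):

    import codecs

    def read_text(path):
        # Return the decoded text *and* the codec actually used,
        # so the caller can round-trip it on output.
        with open(path, 'rb') as f:
            raw = f.read()
        if raw.startswith(codecs.BOM_UTF8):
            return raw.decode('utf-8-sig'), 'utf-8-sig'
        return raw.decode('latin-1'), 'latin-1'

    text, enc = read_text('in.txt')
    # Writing with 'utf-8-sig' re-emits the signature, so a correspondent
    # who sent a signature gets one back; others never see one.
    with open('out.txt', 'w', encoding=enc) as f:
        f.write(text)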