On 09/01/2014 00:12, Kristján Valur Jónsson wrote:
Just to avoid confusion, let me state up front that I am very well aware of
encodings and all that, having internationalized one largish app in Python 2.x.
I know the problems that 2.x had with tracking down the source of errors, and I
understand the beautiful concept of encodings on the boundary.
However:
For a lot of data processing and tools, encoding isn't an issue. Either you assume
ASCII, or you're working with something like latin1, a single-byte encoding. This is
because you're working with a text file that _you_ wrote, and you're not assigning any
semantics to the characters. If there is actual "text" in there, it is just
English, not Norwegian or Turkish. A byte read at code 0xfa doesn't mean anything
special; it's just that, a byte with that value. The file system doesn't have any
default encoding. A file on disk is just a file on disk consisting of bytes. There can
never be any wrong encoding, no mojibake.
With Python 2, you can read that file into a string object. You can scan for
your field delimiter, e.g. a comma, split up your string, interpolate some
binary data, and spit it out again, all without ever thinking about encodings.
Even though the file is conceptually encoded in something, if you were to insist on
attaching a particular semantic meaning to every ordinal value, that meaning
would in many cases be irrelevant to the program.
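For the single-byte case described above, one way to get the Python 2 behaviour in Python 3 is to decode with latin-1, which maps each byte value 0x00-0xFF to the code point with the same number. Decoding therefore cannot fail, and encoding restores the original bytes exactly. A minimal sketch; the function name and the sample record are hypothetical:

```python
def swap_first_two_fields(raw: bytes) -> bytes:
    """Swap the first two comma-separated fields of one record (hypothetical task)."""
    # latin-1 maps byte N to code point N, so this decode never raises.
    text = raw.decode("latin-1")
    fields = text.split(",")
    fields[0], fields[1] = fields[1], fields[0]
    # Encoding with latin-1 is the exact inverse of the decode above.
    return ",".join(fields).encode("latin-1")

record = b"caf\xe9,\xfa\xff,42"
print(swap_first_two_fields(record))   # b'\xfa\xff,caf\xe9,42'
```

The program never assigns any meaning to 0xe9 or 0xfa; they simply pass through as the characters with those ordinals.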
I understand that surrogateescape allows you to do this. But it is an awkward
extra step and forces an extra layer of needless semantics onto the guy who
just wants to read a file. Sure, vegetarians and allergy sufferers like to read the
list of ingredients on everything that they eat. But others are just omnivores
and want to be able to eat whatever is on the table, and not worry about what
it is made of.
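For comparison, a sketch of the surrogateescape route mentioned above, assuming the file is nominally UTF-8. Bytes that are not valid UTF-8 are decoded to lone surrogates in the range U+DC80 to U+DCFF and turned back into the same bytes on encoding. open() accepts the same errors="surrogateescape" argument; an in-memory bytes literal is used here only to keep the example self-contained.

```python
raw = b"name,caf\xe9\n"   # a lone 0xE9 byte is not valid UTF-8

# Undecodable bytes become lone surrogates instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))                      # 'name,caf\udce9\n'

# Ordinary str operations still work on the decoded text.
fields = text.rstrip("\n").split(",")

# Encoding with the same error handler restores the original bytes.
restored = text.encode("utf-8", errors="surrogateescape")
assert restored == raw
```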
And yes, you can read the file in binary mode, but then you end up with those
bytes objects which, as we have just found, are tedious to work with.
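A short illustration of the binary-mode friction referred to above, using made-up sample data: str methods need bytes arguments, and indexing a bytes object returns an int rather than a one-byte string. In Python 3 releases before 3.5, bytes also lacked %-formatting, which PEP 461 later added.

```python
raw = b"alpha,beta,gamma"

fields = raw.split(b",")   # splitting works, but the delimiter must be bytes too
first_byte = raw[0]        # indexing yields an int, not a one-byte string
first_char = raw[0:1]      # slicing is needed to get b"a"
print(fields, first_byte, first_char)
# [b'alpha', b'beta', b'gamma'] 97 b'a'

try:
    raw.split(",")         # mixing str and bytes raises TypeError
except TypeError as exc:
    print("TypeError:", exc)
```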
All I can say is that I've been using Python 3 for years and wouldn't
know what a surrogateescape was if you were to hit me around the head
with it. I open my files, I process them, and Python kindly closes them
for me via a context manager. So if you're not bothered about encoding,
where has the "awkward extra step and forces an extra layer of needless
semantics" bit come from?
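The workflow described in the reply looks roughly like the sketch below. The field-counting task, the temporary sample file, and the explicit encoding="utf-8" are all assumptions for illustration; without an encoding argument, open() uses the locale's preferred encoding.

```python
import os
import tempfile

def field_counts(path):
    """Return the number of comma-separated fields on each line of a text file."""
    counts = []
    # The with statement closes the file when the block exits, even on error.
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts.append(len(line.rstrip("\n").split(",")))
    return counts

# Write a small sample file so the sketch is self-contained.
fd, sample = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("a,b,c\nd,e\n")

counts = field_counts(sample)
print(counts)   # [3, 2]
os.remove(sample)
```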
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev