On 09/01/2014 00:12, Kristján Valur Jónsson wrote:
Just to avoid confusion, let me state up front that I am very well aware of 
encodings and all that, having internationalized one largish app in python 2.x. 
 I know the problems that 2.x had with tracking down the source of errors and 
understand the beautiful concept of encodings on the boundary.

However:
For a  lot of data processing and tools, encoding isn't an issue.  Either you assume 
ascii, or you're working with something like latin1.  A single byte encoding.  This is 
because you're working with a text file that _you_ wrote.  And you're not assigning any 
semantics to the characters.  If there is actual "text" in there it is just 
english, not Norwegian or Turkish. A byte read at code 0xfa doesn't mean anything 
special.  It's just that, a byte with that value.  The file system doesn't have any 
default encoding.  A file on disk is just a file on disk consisting of bytes.  There can 
never be any wrong encoding, no mojibake.

With python 2, you can read that file into a string object.  You can scan for 
your field delimiter, e.g. a comma, split up your string, interpolate some 
binary data, spit it out again.  All without ever thinking about encodings.

Even though the file is conceptually encoded in something, if you insist on 
attaching a particular semantic meaning to every ordinal value, whatever that 
meaning is is in many cases irrelevant to the program.

I understand that surrogateescape allows you to do this.  But it is an awkward 
extra step and forces an extra layer of needles semantics on to that guy that 
just wants to read a file.  Sure, vegetarians and alergics like to read the 
list of ingredients on everything that they eat.  But others are just omnivores 
and want to be able to eat whatever is on the table, and not worry about what 
it is made of.
And yes, you can read the file in binary mode but then you end up with those 
bytes objects that we have just found that are tedious to work with.


All I can say is that I've been using python 3 for years and wouldn't know what a surrogateescape was if you were to hit me around the head with it. I open my files, I process them, and Python kindly closes them for me via a context manager. So if you're not bothered about encoding, where has the "awkward extra step and forces an extra layer of needles semantics" bit come from?

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to