Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

Ethan Furman Sun, 12 Jan 2014 11:38:28 -0800

On 01/12/2014 11:00 AM, Paul Moore wrote:


And yet I still don't follow what you *want*. Unless it's that b'%d' %
(12,) must work and give b'12', and nothing else is acceptable.

Nothing else is ideal. I'll go that route if I have to. I understand that in the real world you go with what works, but in the development stage you fight for the ideal. :)

My reading of Nick's refusal is that %d takes a value which is
semantically a number, converts it into a base-10 representation
(which is semantically a *string*, not a sequence of bytes[1]) and
then *encodes* that string into a series of bytes using the ASCII
encoding. That is *two* semantic transformations, and one (the ASCII
encoding) is *implicit*. Specifically, it's implicit because (a) the
normal reading of %d is "produce the base-10 representation of a
number", and a base-10 representation is a *string*, and (b) because
nowhere has ASCII been mentioned (why not UTF16? that would be
entirely plausible for a wchar-based environment like Windows). And a
core principle of the bytes/text separation in Python 3 is that
encoding should never happen implicitly.
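
For illustration, the two transformations described above can be spelled out separately in Python 3. This is only a sketch: the variable names are made up for the example, and UTF-16-LE stands in for the "wchar-based environment" case.

```python
value = 12

# Step 1: number -> text. The result is a str holding the base-10 digits.
text = '%d' % (value,)

# Step 2: text -> bytes. The encoding is named explicitly here;
# this is the step that b'%d' would have to perform implicitly.
as_ascii = text.encode('ascii')
as_utf16 = text.encode('utf-16-le')

print(text)      # 12
print(as_ascii)  # b'12'
print(as_utf16)  # b'1\x002\x00'
```

The same str yields different byte sequences depending on the encoding chosen, which is why the choice of ASCII in step 2 is a decision rather than a given.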


That could be. And yet the bytes type already has several concessions to ASCII encoding.
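
A few of those concessions, as they can be seen in Python 3 today (an illustrative selection, not an exhaustive list):

```python
data = b'Hello 123'

# The repr shows printable ASCII bytes as characters, not as numbers.
print(bytes([72, 105]))      # b'Hi'

# Case conversion treats the bytes as ASCII letters...
print(data.upper())          # b'HELLO 123'

# ...and leaves bytes outside the ASCII range untouched.
print(b'\xe9'.upper())       # b'\xe9'

# Character-class tests assume ASCII digits.
print(b'123'.isdigit())      # True

# int() accepts bytes containing ASCII digits.
print(int(b'42'))            # 42
```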

By the way, I should point out that I would never have understood
*any* of the ideas involved in this thread before Python 3 forced me
to think about Unicode and the distinction between text and bytes. And
yet, I now find myself, in my (non-Python) work environment, being the
local expert whenever applications screw up text encodings. So I, for
one, am very grateful for Python 3's clear separation of bytes and
text. (And if I sometimes come across as over-dogmatic, I apologise -
put it down to the enthusiasm of the recent convert :-))

No worries. I was forced to learn the difference when I wrote my dbf module for 2.5. Took longer than I'd like to admit to realize that ASCII was an encoding. :/

[1] If you cannot see that there's no essential reason why the base-10
representation '123' should correspond to the bytes b'\x31\x32\x33'
then you are probably not old enough to have started programming on
EBCDIC-based computers :-)

I can see it. :) But bytes already acknowledges an ASCII bias. ;) And even EBCDIC machines speak ASCII when talking telnet.
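
To make the footnote's point concrete, here is a small sketch comparing the ASCII and EBCDIC encodings of the same digits. It assumes Python's cp037 codec (EBCDIC US/Canada) as the EBCDIC variant; other EBCDIC code pages exist.

```python
text = '123'

# Same text, two different byte sequences.
ascii_bytes = text.encode('ascii')
ebcdic_bytes = text.encode('cp037')  # cp037: one EBCDIC code page, chosen for illustration

print(ascii_bytes == b'\x31\x32\x33')   # True
print(ebcdic_bytes == b'\xf1\xf2\xf3')  # True
```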


--
~Ethan~