Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

Ethan Furman Sun, 12 Jan 2014 13:29:23 -0800

On 01/12/2014 12:02 PM, Stephen J. Turnbull wrote:

Georg Brandl writes:

Antoine writes:


. . . if it weren't for your stupid maximalist opposition. . .


Can you please stop throwing personal insults around?  You don't have to
resort to that level.


Ethan's posts (as an example of one general trend in this thread) are
pretty frustrating, you have to admit.


Two points:

1) Are you saying it's okay to be insulting when frustrated? I also find this mega-thread frustrating, but I'm trying very hard not to be insulting.


2) If you are going to use my name, please be certain of the facts [1].  More 
below.

MAL posted straight out that the Python 2 model of text makes it easier for
him to write some programs, so he's all for reintroducing it.  And
that is the whole truth of the matter.  Although I disagree with him,
I appreciate his honesty.

If you have an example of me lying (even if it's just a possibility), please refer to it directly so I can either try to explain the misunderstanding or apologize.

But people keep posting "we don't want Python 2's confounding of text
and binary, we just want bytes with (nearly) all the functionality of
strings [because they are (partially|really) encoded text]".  Some of
them actually use the literal word "text" in their justification!

In only one case did I use the word "text" loosely, and that was when I claimed that Py2 had three text types, and Py3 had two. I was wrong, I apologize. Py3 has one definite text type, str, and, I claim, one half text type in bytes, because bytes itself provides ASCII text processing methods. If you have a better term for the notion of b'ethan'.title() --> b'Ethan' than ASCII-text processing, I'll use that instead. If there are good reasons to not allow further concessions to the ASCII-ness of bytes (and you provide a good one below) then that makes living with the handicap easier. But don't lie to me (as Nick tried to) and say that "In particular, the bytes type is, and always will be, designed for pure binary manipulation" when it has methods like .center().


If I am wrong, and that was not a lie, please explain it to me.
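For readers following along, here is a minimal sketch of the ASCII-oriented bytes methods mentioned above. The variable names are illustrative only; the methods themselves are part of the built-in bytes type. Bytes outside the ASCII letter range pass through unchanged.

```python
# ASCII-oriented methods on the built-in bytes type.
name = b"ethan"

titled = name.title()             # ASCII letters are title-cased
centered = name.center(9, b"*")   # padded on both sides with a 1-byte fill

# Non-ASCII bytes are left alone: only a-z / A-Z are affected.
latin1_e_acute = b"\xe9"
unchanged = latin1_e_acute.upper()

print(titled, centered, unchanged)
```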

That's, well, what would you call it?  Either they know what they're
saying, in which case it's disingenuous at best, or they don't know
what they're saying, in which case it's a proposal based on a clear
misunderstanding of the situation.

I think some of the misunderstanding (which you also seem to suffer from) is that we (or at least I) /ever/ want a unicode string back from bytes interpolation. I don't! If I start with bytes, I want bytes back! And I have a very clear grasp on the difference between str and bytes and what ASCII encoding means; it was a hard and painful lesson for me and I'm not likely to forget it.

To summarize, I use the term text when referring to unicode text (str), and ASCII or ASCII-encoded text when referring to bytes that are to be used in a place that requires ASCII bytes for communication (such as content length or field type). I do /not/ use ASCII to refer to any ol' collection of bytes that happens to look like it might be ASCII-encoded text.
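As an illustration of "bytes in, bytes out", here is a small sketch of building a content-length header. The function names are hypothetical. The %-based version assumes Python 3.5 or later, where PEP 461 added %-formatting for bytes; it was not available when this thread took place. The second version shows the explicit encode step that works without it.

```python
def content_length_header(body: bytes) -> bytes:
    # bytes % int yields bytes (Python 3.5+, per PEP 461); no str is involved.
    return b"Content-Length: %d\r\n" % len(body)


def content_length_header_explicit(body: bytes) -> bytes:
    # Equivalent result without bytes interpolation: format as str, then
    # encode the ASCII digits by hand.
    return b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"


header = content_length_header(b"hello")
print(header)
```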

The problem is not going to go
away just because they *say* they don't want to reintroduce Python 2
text processing.  That is precisely what this proposal is *intended*
to do, whether in the limited form proposed by Antoine or in the much
more extensive form that folks like Ethan want.

What "maximalists" mean is that they promise not to abuse Python 2
text processing when writing Python 3 programs.  This promise is
highly unlikely to be kept for two reasons.  First, they can't make
that promise on behalf of third parties, who for various reasons
certainly will abuse these features to avoid the encoded-text-to-
Unicode-text and vice-versa conversions.


I concede that this is a good reason to not allow % interpolation.  Kinda like 
not allowing sum on strings.

And I don't make promises for other people, and abusing this feature would be a 
bug.

Second, I doubt they
themselves will keep the promise to my satisfaction because their
definition of "text" is ambiguous.

*My* definition is not ambiguous at all. If this particular part of the byte stream is defined to contain ASCII-encoded text, then I can use the bytes text methods to work with it. The only time I would return a bytes object is if it was supposed to be bytes (an image, for example); otherwise I return a bool, an int, a float, a date, or, even, a str.
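A rough sketch of that workflow, under an invented record layout (a 10-byte ASCII name field followed by a 4-byte ASCII number field; neither the layout nor parse_record comes from any real format or library): the ASCII-defined fields are handled with bytes methods, and the caller receives a str and an int rather than bytes.

```python
# Hypothetical fixed-width record: 10-byte ASCII name + 4-byte ASCII number.
record = b"ethan     0042"


def parse_record(rec: bytes):
    # Both fields are *defined* to hold ASCII-encoded text, so the bytes
    # text methods are safe to use on them.
    name_field = rec[:10].rstrip().title()
    count_field = rec[10:14]
    # Hand back proper types, not raw bytes.
    return name_field.decode("ascii"), int(count_field.decode("ascii"))


name, count = parse_record(record)
print(name, count)
```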

When it's convenient for them to
use text-processing operations on bytes, they'll say "oh, yes, these
are conventionally considered text-processing features, but that's
just an accident of the particular configuration of bytes -- yup,
bytes -- I'm processing."

If that particular configuration of bytes is because it's ASCII-encoded text, then sure. To use, for example, bytes.upper on data that wasn't ASCII-encoded text (even if it happened to look like it was) would be the height of stupidity. Please don't include me in such accusations.
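To make the hazard concrete, here is a small sketch in which two bytes that happen to look like ASCII letters are really a packed 16-bit integer. The value 0x6162 is chosen purely for illustration. Calling upper() on the packed bytes silently changes the number.

```python
import struct

# A little-endian unsigned 16-bit integer whose bytes happen to look like
# ASCII letters.
original_value = 0x6162
raw = struct.pack("<H", original_value)

# Treating the binary data as if it were ASCII text corrupts it.
mangled = raw.upper()
corrupted_value = struct.unpack("<H", mangled)[0]

print(raw, mangled, hex(original_value), hex(corrupted_value))
```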

But Nick's important example of web frameworks demonstrates the
problem: unless they convert to text where appropriate, they're just
pushing the problem off on application writers.  Sometimes passing on
data as bytes is appropriate, of course, but the framework authors are
likely to be biased in favor of doing that, and it's not hard to
imagine frameworks ported from Python 2 passing on the problem
wholesale on the grounds that "we returned str in Python 2 which is
bytes in Python 3, and since we were processing bytes the whole time,
we see no reason to change the 'ABI'."  Of course the application
writers thought they were receiving text "in an inconvenient and
ambiguous form".  IMO, with the proposed changes, that is likely to
continue indefinitely, negating some of the gains I expected to
receive from Python 3. :-(


This would be a good reason to reject PEP 460, if that danger was deemed more 
likely than the good it would bring.

Note: there are a lot of high-level frameworks like Django that even
in Python 2 basically went to Unicode everywhere internally.  I don't
deny that.  I think that Python 3 as currently constituted makes it a
lot easier to make an appropriate decision of where to convert, and
should take some of the burden off the high-level frameworks.
Approving this PEP, especially in a maximalist form, will blur the
lines.

I understand your point, but I disagree. When I open a file (in binary mode, obviously, as otherwise I'd get massive corruption) I get back a bunch of bytes. When working with TCP, I get back a bunch of bytes. bytes are /already/ the boundary type. If we have to make a third type for proper boundary processing it's an admission that bytes failed in its role.
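A minimal sketch of the file case, using a temporary file and an arbitrary non-text payload (both chosen only for illustration): reading in binary mode hands back exactly the bytes that were written, as a bytes object.

```python
import os
import tempfile

# Arbitrary binary payload that is not valid text in most encodings.
payload = b"\x89PNG\r\n\x1a\n"

# Write the payload to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Binary mode: no decoding and no newline translation.
    with open(path, "rb") as fh:
        data = fh.read()
finally:
    os.remove(path)

print(type(data), data == payload)
```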



--
~Ethan~

[1] I double-checked all my posts on this topic both here and on Python Ideas 
to make sure.