On 01/13/2014 02:48 AM, Stephen J. Turnbull wrote:
Ethan Furman writes:

The part that you don't seem to acknowledge (sorry if I missed it)
is that there are str-like methods already on bytes.

I haven't expressed myself well, but I don't much care about that.

You don't care that there are str-like methods on bytes? Whether you do or not, they are there, and they impact how people think about bytes and what is (and what should be) allowed.

It's what Knuth would classify as a seminumerical method.

I do not see how that's relevant. What matters is not how we can manipulate the data (everything is reduced to numbers at some point), but what the data represents.

[snip]

*My* definition is not ambiguous at all.  If this particular part
of the byte stream is defined to contain ASCII-encoded text, then I
can use the bytes text methods to work with it.
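
A minimal sketch of that idea, assuming a made-up record layout (one ASCII "key: value" header line, then an opaque binary payload). The name split_record and the sample data are hypothetical, chosen only for illustration:

```python
def split_record(record: bytes):
    """Split a record into its ASCII header fields and its raw payload."""
    # Hypothetical layout: b"<key>: <value>\r\n" followed by arbitrary bytes.
    header, _, payload = record.partition(b"\r\n")
    key, _, value = header.partition(b":")
    # The header is *defined* to be ASCII, so the bytes text methods
    # (strip, lower) are applied to it and never to the payload.
    return key.strip().lower(), value.strip(), payload


record = b"Content-Length: 4\r\n\x00\xff\x10\x80"
key, value, payload = split_record(record)
print(key, value, payload)
```

Nothing in the bytes object says which part is text; that knowledge lives in the code that knows the layout.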

But how is Python supposed to know that?

Python doesn't need to.

... because you know it.  But the ideal of object-oriented programming
(and duck-typing) is that you shouldn't need to; the object should
know how to produce appropriate behavior itself.

The ideal, sure. But if you're stuck with using a list to hold data for your higher-order recursive function, are you going to expect the list data type to "know" which pops and inserts are allowed and which are not? Of course not. And you'd probably build a proper class on top of the list so those things could be checked. Now imagine that the list type didn't offer insert and pop, and you had to use slice replacement -- what a pain that would be!

[snip]

But under your definition, you need to make the decision, or
explicitly code the decision, on the basis of context.

Exactly so.  I even have to do that in Py2.

"Even."  This is exactly where PBP and EIBTI part company, I think.
EIBTI thinks it's a bad idea to pass around bytes that are implicitly
some other type

bytes are /always/ implicitly some other type. They are basically raw data. They are given meaning by how we interpret them.

[snip]

Even though "ethan" is perfectly good ASCII-encoded text (as well as
the integer 435,744,694,638 on a big-endian machine with 5-byte words),
and you have no way of knowing whether it was user data (CP1251), a
metadata keyword (ASCII), or the US national debt in 1967 dollars
(integer) when b'ethan' shows up in a trace?

Context is everything. If b'ethan' shows up in a trace I would have to examine the surrounding code to see how those bytes were being used.
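
A small sketch of the point, using only built-in calls; the variable names are mine, and which reading is "right" is exactly what the surrounding code has to tell you:

```python
raw = b"ethan"

# Three readings of the same five bytes; none is privileged by the object.
as_ascii = raw.decode("ascii")            # text, if the field is ASCII
as_cp1251 = raw.decode("cp1251")          # text, if the field is CP1251
as_integer = int.from_bytes(raw, "big")   # a 40-bit big-endian integer

print(as_ascii, as_cp1251, as_integer)
```

The two decodes agree here only because every byte is below 0x80; the integer reading is the 435,744,694,638 mentioned above.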

And if there were methods that worked directly on a cp1251-encoded
byte stream I would not have any problem using them on
cp1251-encoded text.

I was afraid of that: all of those methods (except the case methods)
will work fine on a cp1251-encoded text.

Really?  Huh.  They wouldn't work fine with the Spanish alphabet.  I should've used that for my example.  :/

And because they only know
that the string is bytes, the case methods will silently corrupt your
"text" as soon as they get a chance.

Inevitably there are methods that will "work" even if given the wrong data type, while others will either corrupt or blow up if not given exactly what they expect. You tell me that some ASCII methods will work okay on cp1251 text, and others will not. So I'm not going to use any of them on cp1251 as that is not what they are intended for.

That bothers me, even if it
doesn't bother you.  Purity again, if you like.  (But you'd take a
safe .upper if you got it for free, no?)

Well, there is no such thing as free. ;) And there already is a safe .upper -- str.upper. And if I didn't know that my bytes were ASCII, but I did know they were text, I wouldn't use ASCII methods, I'd convert to str and work there.
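
A rough sketch of "convert to str and work there", assuming the bytes are known to be cp1251 text. The sample string is made up. In Python 3, bytes.upper() only maps ASCII a-z, so on this data it leaves the Cyrillic letters alone and upper-cases only the ASCII part:

```python
text = "привет, ethan"
raw = text.encode("cp1251")

# bytes.upper() touches only ASCII a-z; bytes >= 0x80 pass through unchanged,
# so the Cyrillic letters stay lower-case.
ascii_upper = raw.upper().decode("cp1251")

# Decoding first lets str.upper() apply the full Unicode case mapping.
safe_upper = raw.decode("cp1251").upper()

print(ascii_upper)
print(safe_upper)
```

If the result has to go back on the wire, safe_upper.encode("cp1251") restores the original encoding.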

--
~Ethan~