On 11.01.2014 14:54, Georg Brandl wrote:
> On 11.01.2014 14:49, Georg Brandl wrote:
>> On 11.01.2014 10:44, Stephen Hansen wrote:
>>
>>> I mean, it's not like the "bytes" type lacks knowledge of the subset of bytes
>>> that happen to be 7-bit ASCII-compatible and can't perform text-ish
>>> operations on them--
>>>
>>>   Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit (Intel)] on win32
>>>   Type "help", "copyright", "credits" or "license" for more information.
>>>   >>> b"stephen hansen".title()
>>>   b'Stephen Hansen'
>>>
>>> How is this not a practical recognition that yes, while bytes are byte
>>> streams and not text, a huge subset of bytes are text-y, and as long as we
>>> maintain the barrier between higher characters and implicit conversion
>>> therein, we're fine?
>>>
>>> I don't see the difference here. There is a very real, practical need to
>>> interpolate bytes. This very real, practical need includes the very real
>>> recognition that converting 12345 to b'12345' is not something weird,
>>> unusual, and subject to the thorny issues of Encodings. It is not violating
>>> the doctrine of separation of powers between Text and Bytes.
>>
>> This. Exactly. Thanks for putting it so nicely, Stephen.
> 
> To elaborate: if the bytes type didn't have all this ASCII-aware functionality
> already, I think we would have (and be using) a dedicated "asciistr" type
> right now.  But it has the functionality, and it's way too late to remove it.

I think we need to step back a little from the purist view
of things and put more emphasis on the "practicality beats
purity" Zen.

I completely agree with Stephen that bytes are in fact often
an encoding of text. If that text is ASCII-compatible, I don't
see any reason why we should not continue to expose the C lib
standard string APIs available for text manipulations on bytes.

We don't have to be pedantic about the bytes/text separation.
It doesn't help in real life.

If you give programmers the choice, they will - most of the time -
do the right thing. If you don't give them the tools, they'll
work around the missing features in a gazillion different
ways, many of which will probably miss a few edge cases.

bytes already has most of the 8-bit string methods from Python 2,
so it doesn't hurt to add some more of the missing features
from Python 2 on top to make life easier for people dealing
with data in multiple or unknown encodings.
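
As a rough illustration of the point (a sketch only; the header bytes and
variable names are made up for the example), the existing ASCII-aware bytes
methods already handle data whose non-ASCII part is in an unknown encoding,
and an integer can be turned into its ASCII digits without touching any
real encoding question:

```python
# Hypothetical header line: ASCII structure, non-ASCII tail in an unknown encoding.
raw = b"content-type: text/html; charset=\xe4\xf6"

# partition/strip/title only look at ASCII bytes; bytes >= 0x80 pass through untouched.
name, _, value = raw.partition(b":")
header_name = name.strip().title()
header_value = value.strip()

# Putting an integer into bytes: its decimal digits are plain ASCII.
length = 12345
length_line = b"Content-Length: " + str(length).encode("ascii")

print(header_name)
print(header_value)
print(length_line)
```

Nothing here decodes the data, so the unknown bytes at the end of the value
are never interpreted.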

BTW: I don't know why so many people keep asking for use cases.
Isn't it obvious that text data without a known (but ASCII-compatible)
encoding, or with multiple different encodings in a single data chunk,
is part of life? Most HTTP packets fall into this category,
many email messages as well. And let's not forget that we don't
live in a perfect world. Broken encodings are everywhere around
you - just have a look at your spam folder for a decent chunk
of example data :-)
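
To make the "multiple encodings in a single chunk" case concrete, here is a
minimal sketch. The message bytes and the helper name split_message are
invented for illustration, and it is not meant as a real HTTP or email
parser. The header block is handled as bytes, and only the part with a
declared charset is decoded:

```python
def split_message(raw: bytes):
    """Split a header/body message on the first blank line, staying in bytes."""
    head, _, body = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n"):
        name, sep, value = line.partition(b":")
        if sep:  # skip lines that contain no colon
            # lower() only affects ASCII letters; other bytes are left alone
            headers[name.strip().lower()] = value.strip()
    return headers, body


# Hypothetical message: Latin-1 byte in the Subject header, UTF-8 body.
raw = (
    b"Subject: caf\xe9\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"caf\xc3\xa9"
)

headers, body = split_message(raw)
text = body.decode("utf-8")  # decode only the part whose encoding is declared
```

Decoding the whole chunk as UTF-8 up front would raise UnicodeDecodeError
because of the stray Latin-1 byte in the Subject line, while the bytes-level
split works regardless.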

-- 
Marc-Andre Lemburg
eGenix.com
