On 11.01.2014 14:54, Georg Brandl wrote:
> On 11.01.2014 14:49, Georg Brandl wrote:
>> On 11.01.2014 10:44, Stephen Hansen wrote:
>>
>>> I mean, its not like the "bytes" type lacks knowledge of the subset of bytes
>>> that happen to be 7-bit ascii-compatible and can't perform text-ish operations
>>> on them--
>>>
>>> Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit
>>> (Intel)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> b"stephen hansen".title()
>>> b'Stephen Hansen'
>>>
>>> How is this not a practical recognition that yes, while bytes are byte streams
>>> and not text, a huge subset of bytes are text-y, and as long as we maintain the
>>> barrier between higher characters and implicit conversion therein, we're fine?
>>>
>>> I don't see the difference here. There is a very real, practical need to
>>> interpolate bytes. This very real, practical need includes the very real
>>> recognition that converting 12345 to b'12345' is not something weird, unusual,
>>> and subject to the thorny issues of Encodings. It is not violating the doctrine
>>> of separation of powers between Text and Bytes.
>>
>> This. Exactly. Thanks for putting it so nicely, Stephen.
>
> To elaborate: if the bytes type didn't have all this ASCII-aware functionality
> already, I think we would have (and be using) a dedicated "asciistr" type right
> now. But it has the functionality, and it's way too late to remove it.
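Stephen's point, that bytes already carry ASCII-aware methods and that turning 12345 into b'12345' involves no real encoding decision, can be illustrated with a minimal sketch. The helpers `ascii_digits` and `header_line` are hypothetical, written only for this illustration; the explicit `str(n).encode("ascii")` step is the manual workaround available on the 3.3 release quoted above, since bytes there offer no %-style interpolation.

```python
# Minimal sketch: bytes already expose ASCII-aware methods.
# They act only on ASCII letters; other byte values pass through unchanged.
name = b"stephen hansen"
print(name.title())           # b'Stephen Hansen'
print(b"caf\xe9".upper())     # b'CAF\xe9' (the 0xE9 byte is left alone)


def ascii_digits(n: int) -> bytes:
    """Hypothetical helper: render an int as ASCII decimal digits."""
    # Decimal digits are plain ASCII, so no real encoding choice is made here.
    return str(n).encode("ascii")


def header_line(name: bytes, value: int) -> bytes:
    """Hypothetical helper: build an HTTP-style header line from bytes parts."""
    return name + b": " + ascii_digits(value) + b"\r\n"


print(ascii_digits(12345))                     # b'12345'
print(header_line(b"Content-Length", 12345))   # b'Content-Length: 12345\r\n'
```

The byte 0xE9 survives `upper()` untouched, which is the "barrier between higher characters" Stephen refers to: only the 7-bit ASCII subset is treated as text.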
I think we need to step back a little from the purist view of things and put more emphasis on the "practicality beats purity" Zen.

I completely agree with Stephen that bytes are in fact often an encoding of text. If that text is ASCII-compatible, I don't see any reason why we should not continue to expose the C lib standard string APIs available for text manipulation on bytes.

We don't have to be pedantic about the bytes/text separation. It doesn't help in real life.

If you give programmers the choice, they will, most of the time, do the right thing. If you don't give them the tools, they'll work around the missing features in a gazillion different ways, many of which will probably miss a few edge cases.

bytes already have most of the 8-bit string methods from Python 2, so it doesn't hurt to add some more of the missing Python 2 features on top to make life easier for people dealing with data in multiple or unknown encodings.

BTW: I don't know why so many people keep asking for use cases. Isn't it obvious that text data in an unknown (but ASCII-compatible) encoding, or with multiple different encodings in a single data chunk, is part of life? Most HTTP packets fall into this category, as do many email messages.

And let's not forget that we don't live in a perfect world. Broken encodings are everywhere around you; just have a look at your spam folder for a decent chunk of example data :-)

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 11 2014)

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
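Marc-Andre's HTTP example can be illustrated with a minimal sketch, assuming a simplified request (no folded continuation lines, no body handling). The function `parse_headers` and the sample request are hypothetical. The header block is split and normalized entirely with bytes methods; only parts known to be ASCII (the header names and the Content-Length digits) are interpreted, and a value of unknown encoding is passed through untouched.

```python
from typing import Dict, Tuple

# Hypothetical sample request; the X-Custom value holds a non-ASCII byte
# whose encoding we do not know.
raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: 12345\r\n"
    b"X-Custom: caf\xe9\r\n"
    b"\r\n"
)


def parse_headers(data: bytes) -> Tuple[bytes, Dict[bytes, bytes]]:
    """Hypothetical helper: split a request head into its request line and headers."""
    head, _, _ = data.partition(b"\r\n\r\n")   # drop everything after the blank line
    request_line, *header_lines = head.split(b"\r\n")
    headers: Dict[bytes, bytes] = {}
    for line in header_lines:
        name, sep, value = line.partition(b":")
        if not sep:
            continue                            # skip malformed lines in this sketch
        # lower() and strip() only touch ASCII bytes, so odd values survive as-is.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


request_line, headers = parse_headers(raw)
method, path, version = request_line.split(b" ")
length = int(headers[b"content-length"])   # int() accepts ASCII digits in bytes
print(method, path, length, headers[b"x-custom"])
```

Nothing here is decoded as a whole; the undecodable `caf\xe9` value never forces an encoding decision, while the ASCII parts are handled with ordinary string-like operations.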