This is all speculation and no hint of implementation at this point ... redirecting this subthread to Python-Ideas. Reply-To set accordingly.
Nick Coghlan writes:

 > Heh, I knew as soon as I sent that message that someone would be able
 > to point out a counter example. I agree that RFC 822 (and
 > case-insensitive ASCII comparison in general) is enough to save
 > lower() and upper() and co, but what about this even further reduced
 > list of text-specific methods:
 >
 >   'capitalize'
 >   'istitle'
 >   'swapcase'
 >   'title'
 >
 > While case-insensitive comparison makes sense for wire level data,
 > where do these methods fit in, even when embedded ASCII text fragments
 > are involved?

Well, 'capitalize' could theoretically be used to "beautify" RFC 822 field names, but realistically, to me they're a litmus test for packages I probably don't want on my system. <0.5 wink>

I don't know if it's worth the effort to deprecate them, though. There is a school of thought (represented on python-dev by Philip Eby and Antoine Pitrou, among others, I would say) that says that text with an implicit encoding is still text if you can figure out what the encoding is, and the syntactically important tokens are invariably ASCII, which often is enough information to do the work. So if you can do some operation without first converting to str, let's save the cycles and the bytes (especially in bit-shoveling applications like WSGI)! I disagree, but "consenting adults" and all that.

It occurs to me that the bit-shoveling applications would generally be sufficiently well-served with a special "codec" that just stuffs the data pointer in a bytes object into the latin1 member of the data pointer union in a PEP 393 Unicode object, and marks the Unicode object as "ascii-compatible", ie, anything ASCII can be manipulated as text, but anything non-ASCII is like a private character that Python doesn't know anything about, and can't do anything useful with, except delete or pass through verbatim (perhaps as a slice). This may be nonsense; I don't know enough about Python internals to be sure.
And it would be a change to PEP 393, since the encoding of the 8-bit representation would no longer be Unicode. I wouldn't blame Martin one bit if he hated the idea in principle! On the other hand, the "Latin-1 can be used to decode any binary content" end-around makes that point moot IMO. This would give a somewhat safer way of doing that.

But if it is feasible and a Pythonic implementation could be devised, that would take much of the wind out of the sails of the "implicitly it's ASCII text" crowd. The whole "it's inefficient in time and space to work with 'str'" argument goes away, leaving them with "it's verbose" as the only reason for not doing the conversion.

I don't know if there would be any use case left for bytes at that point ... but that's clearly a py4k discussion.
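
For readers who haven't seen the Latin-1 end-around mentioned above, here is a minimal sketch of it, and of why something "somewhat safer" might be wanted. The header bytes and the function names are made up for illustration. The second function uses the existing 'surrogateescape' error handler (PEP 383) only as a rough stand-in for the "ascii-compatible" idea; it is not the PEP 393 codec speculated about in this message, and it does not save any cycles or bytes.

```python
def beautify_field_name(raw_header: bytes) -> bytes:
    """Capitalize an RFC 822 field name via the Latin-1 end-around."""
    # Latin-1 maps every byte 0x00-0xFF to the code point of the same
    # value, so this decode cannot fail and the encode below restores
    # any untouched bytes exactly.
    text = raw_header.decode("latin-1")
    name, sep, value = text.partition(":")
    pretty = "-".join(part.capitalize() for part in name.split("-"))
    return (pretty + sep + value).encode("latin-1")


def upper_ascii_only(raw: bytes) -> bytes:
    """Uppercase the ASCII parts, passing other bytes through verbatim."""
    # Assumption: surrogateescape as an approximation of "private
    # characters". Bytes 0x80-0xFF become lone surrogates, which have
    # no case mappings, so upper() leaves them alone.
    text = raw.decode("ascii", errors="surrogateescape")
    return text.upper().encode("ascii", errors="surrogateescape")


raw = b"content-type: text/plain; name=\xe9\xff"
print(beautify_field_name(raw))

# The hazard with plain Latin-1: Python believes 0xE9 is a real letter
# and case-maps it, silently changing the payload byte.
print(b"abc\xe9".decode("latin-1").upper().encode("latin-1"))

# With surrogateescape the unknown byte survives untouched.
print(upper_ascii_only(b"abc\xe9"))
```

The first call works because only the ASCII field name is touched. The second shows the payload byte 0xE9 turning into 0xC9, which is the kind of accident a str that knows it is only "ascii-compatible" would rule out.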