On Tue, Dec 7, 2010 at 12:06 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On Tue, Dec 7, 2010 at 2:46 PM, Alexander Belopolsky
> <alexander.belopol...@gmail.com> wrote:
>> Having all encodings accessible in a str method only promotes a
>> programming style where bytes objects can contain differently encoded
>> strings in different parts of the program. Instead, well-written
>> programs should decode bytes on input, do all processing with str type
>> and decode on output. When strings need to be passed to char* C APIs,
>> they should be encoded in UTF-8. Many C APIs originally designed for
>> ASCII actually produce meaningful results when given UTF-8 bytes.
>> (Supporting such usage was one of the design goals of UTF-8.)
>
> This world sounds nice, but it isn't the one that exists right now.
> Practicality beats purity and all that :)

.. and default encoding being fixed as UTF-8 already goes 99% of the way to that world. As long as I can use encode/decode without an argument, it does not bother me much that they can take one. These methods are also much easier to ignore than the transform/untransform pair simply because there is only one method per class.

transform/untransform have a much larger mental footprint, not only because there are two of them in both str and bytes, but also because both str and bytes have a synonymously named translate method. With 43 non-special methods, the str interface is already huge. The transform() method with a suitable set of codecs could possibly replace things like expandtabs() or swapcase(), but that would be like writing x.transform('exp') and x.untransform('exp') instead of math.exp(x) and math.log(x).
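
To make the point about argument-free encode/decode concrete, here is a rough sketch for Python 3. The sample string and variable names are made up for illustration; only str.encode(), bytes.decode(), math.exp() and math.log() come from the discussion above.

```python
import math

# Illustrative sample text containing non-ASCII characters.
text = "naïve café"

# With no argument, str.encode() and bytes.decode() use UTF-8 in Python 3.
raw = text.encode()
assert raw == text.encode("utf-8")
assert raw.decode() == text

# An explicit codec name is still accepted, but it is easy to ignore.
latin = text.encode("latin-1")
assert latin.decode("latin-1") == text

# Separately named inverse functions: math.log undoes math.exp.
x = 1.5
roundtrip = math.log(math.exp(x))
```

The last two lines are the spelling I would rather keep: the inverse operation has its own name instead of being selected by a string argument.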