Re: [Python-3000] PEP 3138- String representation in Python 3000

Guido van Rossum Wed, 14 May 2008 20:22:43 -0700

On Wed, May 14, 2008 at 8:00 PM, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> This discussion isn't about whether it could be done or not, it's
> about where people expect to find such functionality.  Personally, if
> I can find .encode('euc-jp') on a string object, I would expect to
> find .encode('gzip') on a bytes object, too.


The argument against reusing the same method name is that in 3.0 we
need to keep bytes and str instances separate more carefully than we
did in 2.x. Consider code that gets an encoding passed in as a
variable e. It knows it has a bytes instance b. To decode b from bytes
to str (unicode), it can use s = b.decode(e). It can then treat s as a
string, e.g. write it to a text file or pass it to a text processing
class. If the possibility existed that the result was actually a bytes
instance (e.g. when e == 'gzip' instead of e == 'euc-jp') this would
either cause the code to break subtly in the field, or it would
require the programmer to do an additional type check on s before using
it. (And I know quite a few programmers who would feel obliged to
handle this case.)
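
To make the hazard concrete, here is a minimal sketch using codecs.decode()
from the current standard library, which accepts any registered codec
regardless of what type it returns. The helper name decode_any and the sample
data are illustrative assumptions, not part of the proposal, and "zlib_codec"
stands in for the 'gzip' example because it is a bytes-to-bytes compression
codec that the standard library registers.

```python
import codecs
import zlib


def decode_any(b, e):
    """Hypothetical helper: decode b with whatever codec e names."""
    return codecs.decode(b, e)


text_bytes = "\u65e5\u672c\u8a9e".encode("euc-jp")
packed_bytes = zlib.compress(b"hello")

s1 = decode_any(text_bytes, "euc-jp")        # a str, as the caller expects
s2 = decode_any(packed_bytes, "zlib_codec")  # a bytes object, not a str

# Same call, two different result types: the caller has to check the type
# before treating the result as text.
for s in (s1, s2):
    if isinstance(s, str):
        print("text:", ascii(s))
    else:
        print("not text:", type(s).__name__)
```

The isinstance() check in the loop is exactly the extra work that a
str-only decode() makes unnecessary.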

Of course the possibility always exists that e is not a valid encoding
at all; but that case raises a predictable exception. The same goes for
the case where b can't be decoded using e. Having something be a valid
encoding but return an unusable result is much more problematic.
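
As a small illustration of the two predictable failures, the sketch below
relies on the exceptions that bytes.decode() raises in Python 3: LookupError
for an unknown encoding name and UnicodeDecodeError for bytes that are invalid
in the named encoding. The function name classify and its return labels are
made up for this example.

```python
def classify(b, e):
    """Hypothetical helper: report how decoding b with encoding e turns out."""
    try:
        b.decode(e)
    except LookupError:
        # e does not name a usable text encoding.
        return "bad encoding"
    except UnicodeDecodeError:
        # e is fine, but b is not valid data in that encoding.
        return "bad data"
    return "ok"


print(classify(b"abc", "ascii"))
print(classify(b"abc", "no-such-encoding"))
print(classify(b"\xff", "utf-8"))
```

Both failure modes surface at the decode() call itself, so the caller can
handle them in one place.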

> I think this one is just going to come down to BDFL pronouncement
> about which is more Pythonic, because I don't really see either point
> of view as more "natural".

It's mostly settled. There will be separate methods to transform bytes
to bytes and to transform str to str, and these will use separate
collections of encodings. (Or perhaps some codecs will apply to
multiple cases, e.g. rot13 might apply both for str<->str and for
bytes<->bytes; but I'd expect gzip to apply only for bytes<->bytes.)
There will be metadata on the codecs so that b.decode("gzip") will
raise an exception just as b.transform("utf-8") will. The details
haven't all been sorted out but so far the only names proposed that I
like are transform() and untransform(). I propose that
b.transform("gzip") would compress and b.untransform("gzip") would
uncompress. I'm fine with the str and bytes methods both being called
transform() and untransform() -- this is no different than the current
situation with e.g. lower() and upper().
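
A rough sketch of how the separate collections could behave, written as
free functions because the methods themselves are only a proposal here. The
registry names, the error messages, and the choice of LookupError are
assumptions made for illustration; the sketch uses gzip.compress() and
gzip.decompress() for the bytes case and the standard library's "rot_13" codec
for the str case.

```python
import codecs
import gzip

# Hypothetical registries: one collection per type, as described above.
BYTES_TRANSFORMS = {
    "gzip": (gzip.compress, gzip.decompress),
}
STR_TRANSFORMS = {
    "rot13": (
        lambda s: codecs.encode(s, "rot_13"),
        lambda s: codecs.decode(s, "rot_13"),
    ),
}


def _lookup(obj, name):
    """Return the (forward, backward) pair registered for obj's type."""
    if isinstance(obj, bytes):
        registry = BYTES_TRANSFORMS
    elif isinstance(obj, str):
        registry = STR_TRANSFORMS
    else:
        raise TypeError("expected bytes or str")
    try:
        return registry[name]
    except KeyError:
        kind = type(obj).__name__
        raise LookupError(f"{name!r} is not a {kind} transform") from None


def transform(obj, name):
    """bytes -> bytes or str -> str; never crosses between the two types."""
    forward, _ = _lookup(obj, name)
    return forward(obj)


def untransform(obj, name):
    """Inverse of transform() for the same name and type."""
    _, backward = _lookup(obj, name)
    return backward(obj)


packed = transform(b"spam" * 100, "gzip")
print(untransform(packed, "gzip") == b"spam" * 100)
print(transform("hello", "rot13"))
```

With this layout transform(b"data", "utf-8") fails with LookupError instead
of returning a str, which mirrors the metadata check described above.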

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)