On Wed, May 14, 2008 at 8:00 PM, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> This discussion isn't about whether it could be done or not, it's
> about where people expect to find such functionality. Personally, if
> I can find .encode('euc-jp') on a string object, I would expect to
> find .encode('gzip') on a bytes object, too.

The argument against reusing the same method name is that in 3.0 we need to keep bytes and str instances separate more carefully than we did in 2.x.

Consider code that gets an encoding passed in as a variable e. It knows it has a bytes instance b. To decode b from bytes to str (unicode), it can use s = b.decode(e). It can then treat s as a string, e.g. write it to a text file or pass it to a text processing class. If the possibility existed that the result was actually a bytes instance (e.g. when e == 'gzip' instead of e == 'euc-jp') this would either cause the code to break subtly in the field, or it would require the programmer to do an additional type check on s before using it. (And I know quite a few programmers who would feel obliged to handle this case.)

Of course the possibility always exists that e is not a valid encoding at all; but that case raises a predictable exception. Similarly in the case that b can't be decoded using e. Having something be a valid encoding but return an unusable result is much more problematic.

> I think this one is just going to come down to BDFL pronouncement
> about which is more Pythonic, because I don't really see either point
> of view as more "natural".

It's mostly settled. There will be separate methods to transform bytes to bytes and to transform str to str, and these will use separate collections of encodings. (Or perhaps some codecs will apply to multiple cases, e.g. rot13 might apply both for str<->str and for bytes<->bytes; but I'd expect gzip to apply only for bytes<->bytes.) There will be metadata on the codecs so that b.decode("gzip") will raise an exception just as b.transform("utf-8") will.

The details haven't all been sorted out but so far the only names proposed that I like are transform() and untransform(). I propose that b.transform("gzip") would compress and b.untransform("gzip") would uncompress.

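A rough sketch of the separation described above. It uses free functions rather than methods, since transform() and untransform() are only proposed names at this point. The registry _BYTES_TRANSFORMS and the helper decode_text() are hypothetical names made up for illustration, and the standard gzip module stands in for whatever the real codec would use.

```python
import gzip

# Hypothetical registry of bytes<->bytes transforms. Text encodings such as
# "utf-8" or "euc-jp" are deliberately absent, so they cannot be used here.
_BYTES_TRANSFORMS = {
    "gzip": (gzip.compress, gzip.decompress),
}


def _lookup(name):
    """Return the (forward, backward) pair for a bytes<->bytes transform."""
    try:
        return _BYTES_TRANSFORMS[name]
    except KeyError:
        raise LookupError(f"{name!r} is not a bytes-to-bytes transform") from None


def transform(data: bytes, name: str) -> bytes:
    """Stand-in for the proposed b.transform(name): bytes in, bytes out."""
    forward, _ = _lookup(name)
    return forward(data)


def untransform(data: bytes, name: str) -> bytes:
    """Stand-in for the proposed b.untransform(name): bytes in, bytes out."""
    _, backward = _lookup(name)
    return backward(data)


def decode_text(data: bytes, encoding: str) -> str:
    """bytes -> str only. The caller gets a str or an exception, never bytes."""
    return data.decode(encoding)
```

With this split, decode_text(b, e) needs no type check on its result: a text encoding such as 'euc-jp' yields a str, while e == 'gzip' raises LookupError. Likewise transform(b, "utf-8") raises LookupError instead of returning something unusable.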
I'm fine with the str and bytes methods both being called transform() and untransform() -- this is no different than the current situation with e.g. lower() and upper().

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000