Greg Ewing wrote:
> Stephen J. Turnbull wrote:
>> This discussion isn't about whether it could be done or not, it's
>> about where people expect to find such functionality. Personally, if
>> I can find .encode('euc-jp') on a string object, I would expect to
>> find .encode('gzip') on a bytes object, too.
>
> What I'm not seeing is a clear rationale on where you
> draw the line. Out of all the possible transformations
> between a string and some other kind of data, which
> ones deserve to be available via this rather strange
> and special interface, and why?

This kind of unified interface to binary and character transforms is
incredibly handy in a stacking IO model like the one used in Py3k.
For example, suppose you're using a compressed XML stream to communicate
over a network socket. This approach lets you have generic
'transformation' layers in your IO stack, so you can build up the stack
as something like:

    XMLParserIO('myschema')
    BufferedTextIO('utf-8')
    BytesTransform('gzip')
    RawSocketIO

To change to a different compression mechanism (e.g. bz2), you just
change the codec used by the BytesTransform layer from 'gzip' to 'bz2'.
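
A minimal sketch of the layering idea, assuming two toy classes
(BytesTransform and TextLayer are hypothetical names here, not real Py3k
IO classes) and using the bytes-to-bytes codecs that the standard library
actually registers, 'zlib_codec' and 'bz2_codec', in place of 'gzip':

```python
import codecs


class BytesTransform:
    """Hypothetical layer: applies a named bytes-to-bytes codec."""

    def __init__(self, codec_name):
        self.codec_name = codec_name

    def write(self, data):
        # Transform outgoing bytes (e.g. compress them).
        return codecs.encode(data, self.codec_name)

    def read(self, data):
        # Reverse the transform on incoming bytes.
        return codecs.decode(data, self.codec_name)


class TextLayer:
    """Hypothetical layer: text on top, bytes handed to the layer below."""

    def __init__(self, encoding, lower):
        self.encoding = encoding
        self.lower = lower

    def write(self, text):
        return self.lower.write(text.encode(self.encoding))

    def read(self, raw):
        return self.lower.read(raw).decode(self.encoding)


def make_stack(compression):
    # Only the codec name changes when the compression scheme changes.
    return TextLayer("utf-8", BytesTransform(compression))


payload = "<msg>hello</msg>"
results = {}
for codec_name in ("zlib_codec", "bz2_codec"):
    stack = make_stack(codec_name)
    wire = stack.write(payload)       # bytes as they would go to the socket
    results[codec_name] = stack.read(wire)
```

Swapping compression schemes touches nothing except the string passed to
make_stack(); the text layer above is unaware of the change.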
As for how you choose what to provide as codecs... well, that's a major
reason why the codec registry is extensible. The answer is that any
binary or character transform which is useful to the application
programmer can be accessed via the codec API - the only question will be
whether the application programmer will have to write the codec
themselves, or will find it already provided in the standard library.
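
As a rough illustration of that extensibility, here is how an application
might register its own bytes-to-bytes codec through codecs.register().
The codec name 'xor5a' and the XOR transform are made up for this
example; only the codecs module calls are real:

```python
import codecs


def _xor_encode(data, errors="strict"):
    # Toy transform: XOR every byte with 0x5A. Returns (output, consumed).
    raw = bytes(data)
    return bytes(b ^ 0x5A for b in raw), len(raw)


def _xor_decode(data, errors="strict"):
    # XOR is its own inverse, so decoding repeats the same operation.
    raw = bytes(data)
    return bytes(b ^ 0x5A for b in raw), len(raw)


def _search(name):
    # The registry calls this with a normalised (lower-case) codec name.
    if name == "xor5a":
        return codecs.CodecInfo(
            name="xor5a", encode=_xor_encode, decode=_xor_decode
        )
    return None


codecs.register(_search)

scrambled = codecs.encode(b"hello", "xor5a")
restored = codecs.decode(scrambled, "xor5a")
```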
Cheers,
Nick.
P.S. My original tangential response that didn't actually answer your
question, but may still be useful to some folks:
An actual codec that encodes a character string to a byte sequence, and
decodes a byte sequence back to a character string would be invoked via
the str.encode() and bytes.decode() methods. For example,
mystr.encode('utf-8') to serialise a string using UTF-8,
mybytes.decode('utf-8') to read it back.
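
For instance, a round trip through str.encode() and bytes.decode() (the
sample string is arbitrary):

```python
mystr = "日本語 text"

# Character string -> byte sequence.
mybytes = mystr.encode("utf-8")

# Byte sequence -> character string.
roundtrip = mybytes.decode("utf-8")

# The same text serialised with a different character codec.
euc_bytes = mystr.encode("euc-jp")
```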
A text transform that converts a character string to a different
character string would be invoked via the str.transform() and
str.untransform() methods. For example,
mystr.transform('unicode-escape') to convert Unicode characters to their
\u or \U equivalents, mystr.untransform('unicode-escape') to convert
them back to the actual Unicode characters.
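
str.transform() and str.untransform() are the methods proposed in this
thread, so the sketch below emulates them with two hypothetical helper
functions built on the existing 'unicode_escape' codec. It assumes the
text passed to untransform_text() is the ASCII output of
transform_text():

```python
def transform_text(s, name):
    """Hypothetical stand-in for the proposed str.transform()."""
    if name != "unicode-escape":
        raise LookupError("unknown text transform: " + name)
    # The codec yields ASCII bytes; decode them back into a str.
    return s.encode("unicode_escape").decode("ascii")


def untransform_text(s, name):
    """Hypothetical stand-in for the proposed str.untransform()."""
    if name != "unicode-escape":
        raise LookupError("unknown text transform: " + name)
    # Assumes s is pure ASCII, as produced by transform_text().
    return s.encode("ascii").decode("unicode_escape")


original = "price: \u20ac5"
escaped = transform_text(original, "unicode-escape")
restored_text = untransform_text(escaped, "unicode-escape")
```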
A binary transform that converts a byte sequence to a different byte
sequence would be invoked via the bytes.transform() and
bytes.untransform() methods. For example, mybytes.transform('gzip') to
compress a byte sequence, mybytes.untransform('gzip') to decompress it.
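
bytes.transform() is likewise the proposed spelling. With what the
standard library provides, roughly the same effect can be had from
codecs.encode()/codecs.decode() with a bytes-to-bytes codec, or from the
gzip module when the gzip format itself is wanted. A small sketch, with
arbitrary sample data:

```python
import codecs
import gzip

mybytes = b"spam and eggs " * 200

# Codec-registry route: 'zlib_codec' is a registered bytes-to-bytes codec.
packed = codecs.encode(mybytes, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

# Module route: produces and reads actual gzip-format data.
gz_packed = gzip.compress(mybytes)
gz_unpacked = gzip.decompress(gz_packed)
```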
--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org