Re: [Python-3000] PEP 3138- String representation in Python 3000

Nick Coghlan Thu, 15 May 2008 03:34:52 -0700

Greg Ewing wrote:

Stephen J. Turnbull wrote:

This discussion isn't about whether it could be done or not, it's
about where people expect to find such functionality.  Personally, if
I can find .encode('euc-jp') on a string object, I would expect to
find .encode('gzip') on a bytes object, too.


What I'm not seeing is a clear rationale on where you
draw the line. Out of all the possible transformations
between a string and some other kind of data, which
ones deserve to be available via this rather strange
and special interface, and why?

Where this kind of unified interface to binary and character transforms is incredibly handy is in a stacking IO model like the one used in Py3k. For example, suppose you're using a compressed XML stream to communicate over a network socket. What this approach allows you to do is have generic 'transformation' layers in your IO stack, so you can just build up your IO stack as something like:


XMLParserIO('myschema')
BufferedTextIO('utf-8')
BytesTransform('gzip')
RawSocketIO

To change to a different compression mechanism (e.g. bz2), you just change the codec used by the BytesTransform layer from 'gzip' to 'bz2'.
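
A rough sketch of that layering using only modules that ship with Python 3 today. The layer names in the diagram (XMLParserIO, BytesTransform and so on) are illustrative and are not real classes; here `build_stack`, `round_trip` and `_COMPRESSORS` are hypothetical names, an `io.BytesIO` stands in for the raw socket, and the XML parsing layer is left out:

```python
import bz2
import gzip
import io

# Hypothetical table mapping a compression name to a stdlib file wrapper.
_COMPRESSORS = {
    "gzip": lambda raw, mode: gzip.GzipFile(fileobj=raw, mode=mode),
    "bz2": lambda raw, mode: bz2.BZ2File(raw, mode=mode),
}


def build_stack(raw, compression, mode, encoding="utf-8"):
    """Stack a text layer on a compression layer on a raw byte stream."""
    binary_layer = _COMPRESSORS[compression](raw, mode)
    return io.TextIOWrapper(binary_layer, encoding=encoding)


def round_trip(text, compression):
    """Write text down through the stack, then read it back up."""
    raw = io.BytesIO()  # stand-in for the raw socket
    writer = build_stack(raw, compression, "wb")
    writer.write(text)
    writer.close()  # flushes the compressor; the BytesIO stays open

    raw.seek(0)
    reader = build_stack(raw, compression, "rb")
    try:
        return reader.read()
    finally:
        reader.close()


# Only the name passed to the compression layer changes.
via_gzip = round_trip("<doc>hello</doc>", "gzip")
via_bz2 = round_trip("<doc>hello</doc>", "bz2")
```

The rest of the stack does not need to know which compressor sits in the middle, which is the point of the generic transformation layer.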

As for how you choose what to provide as codecs... well, that's a major reason why the codec registry is extensible. The answer is that any binary or character transform which is useful to the application programmer can be accessed via the codec API - the only question will be whether the application programmer will have to write the codec themselves, or will find it already provided in the standard library.
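
To illustrate the extensibility point, here is a minimal sketch of an application registering its own byte-to-byte transform through `codecs.register()`. The codec name 'xor42' and the helper functions are made up for the example; only `codecs.register`, `codecs.CodecInfo`, `codecs.encode` and `codecs.decode` are real library calls:

```python
import codecs


def _xor_transform(data, errors="strict"):
    """Toy byte-to-byte transform: XOR every byte with 0x2A.

    Codec functions return (output, number_of_input_items_consumed).
    XOR is its own inverse, so the same function serves both directions.
    """
    return bytes(b ^ 0x2A for b in data), len(data)


def _search(name):
    """Codec search function: return CodecInfo for names we handle."""
    if name == "xor42":  # hypothetical codec name
        return codecs.CodecInfo(
            name="xor42", encode=_xor_transform, decode=_xor_transform
        )
    return None  # let other search functions try


codecs.register(_search)

scrambled = codecs.encode(b"hello", "xor42")
restored = codecs.decode(scrambled, "xor42")
```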


Cheers,
Nick.

P.S. My original tangential response that didn't actually answer your question, but may still be useful to some folks:

An actual codec that encodes a character string to a byte sequence, and decodes a byte sequence back to a character string would be invoked via the str.encode() and bytes.decode() methods. For example, mystr.encode('utf-8') to serialise a string using UTF-8, mybytes.decode('utf-8') to read it back.
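
For instance (the sample string is arbitrary):

```python
mystr = "caf\u00e9"                   # four characters, one non-ASCII
mybytes = mystr.encode("utf-8")       # str -> bytes
restored = mybytes.decode("utf-8")    # bytes -> str
```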

A text transform that converts a character string to a different character string would be invoked via the str.transform() and str.untransform() methods. For example, mystr.transform('unicode-escape') to convert unicode characters to their \u or \U equivalents, mystr.untransform('unicode-escape') to convert them back to the actual unicode characters.
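
The str.transform() and str.untransform() methods are the proposal under discussion here and are not methods of str in released Python 3. A rough approximation with what does exist, assuming two hypothetical helper functions, goes through the 'unicode-escape' codec (which maps str to bytes) and then back to str via ASCII:

```python
def transform_unicode_escape(text):
    """Hypothetical stand-in for text.transform('unicode-escape')."""
    # The codec yields ASCII-only bytes, so decoding as ASCII is safe.
    return text.encode("unicode-escape").decode("ascii")


def untransform_unicode_escape(text):
    """Hypothetical stand-in for text.untransform('unicode-escape').

    Assumes the input is pure ASCII, as produced by the helper above.
    """
    return text.encode("ascii").decode("unicode-escape")


escaped = transform_unicode_escape("snow \u2603")
original = untransform_unicode_escape(escaped)
```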

A binary transform that converts a byte sequence to a different byte sequence would be invoked via the bytes.transform() and bytes.untransform() methods. For example, mybytes.transform('gzip') to compress a byte sequence, mybytes.untransform('gzip') to decompress it.
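
Likewise, bytes.transform() and bytes.untransform() are proposed names and not part of released Python 3. A sketch of the same idea with existing calls: the gzip module directly, or the module-level codecs.encode() and codecs.decode() functions with the 'zlib' byte-to-byte codec (to my knowledge the standard codec registry offers 'zlib' and 'bz2' here, but no codec named 'gzip'):

```python
import codecs
import gzip

payload = b"abc" * 100

# Direct use of the gzip module.
gz_packed = gzip.compress(payload)
gz_unpacked = gzip.decompress(gz_packed)

# The same shape of operation through the codec API.
z_packed = codecs.encode(payload, "zlib")
z_unpacked = codecs.decode(z_packed, "zlib")
```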




--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org
