Re: [Python-3000] PEP 3138- String representation in Python 3000

Joel Bender Mon, 19 May 2008 08:53:34 -0700

Stephen J. Turnbull wrote:

But why be verbose *and* ignore the vernacular?


    gzipped = plaintext.transform('gzip')
    plaintext = gzipped.transform('gunzip')

I'm generally resistant to a registry, none of my applications are so general that they would take advantage of a string-key-to-dictionary-to-function-pointer. If they did, they would have to have some pretty severe constraints on what functions can be selected, so I would end up building my own context sensitive dictionary of available functions. I'm in favor of:


    gzipped = plaintext.transform(zlib.compress)
    plaintext = gzipped.transform(zlib.decompress)

So, you may ask, why would that be any better than this...

    gzipped = zlib.compress(plaintext)

...and the answer is that it depends on what you consider the most appropriate design pattern to follow.
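
A minimal sketch of the two spellings side by side. Since bytes objects have no transform() method, this assumes a hypothetical free function transform(data, func) standing in for the proposed method; everything else is the standard zlib module.

```python
import zlib


def transform(data, func):
    """Hypothetical stand-in for the proposed .transform() method.

    It applies the callable to the data and does nothing else.
    """
    return func(data)


plaintext = b"spam and eggs " * 10

# Spelled through transform(), passing the function itself.
gzipped = transform(plaintext, zlib.compress)
roundtrip = transform(gzipped, zlib.decompress)

# Spelled as a direct call.
direct = zlib.compress(plaintext)
```

Both spellings produce the same bytes, so the choice between them is about the design pattern and not about what gets computed.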

I think the style should be EIBTI for "private" protocols, and TOOWDTI
for transforms that wrap well-known libraries.

I've been around socket libraries and protocol encoding/decoding stacks too long I guess, or I'm just jaded, but TOOWDTI is a pipe dream. There's Only One Blessed Way To Do It I can understand and appreciate.

EIBTI trumps TOOWDTI when it has to go through a registry. I would be -1 on this design:


    In module codecs:

        from gzip import compress as _gzip_compress
        ...
        _registry['gzip'] = _gzip_compress

Where there is a great deal of code that enforces TOOWDTI, effectively obfuscating the fact that all you're passing to transform() is nothing more magical than a reference to a function.
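
To make the comparison concrete, here is a rough sketch of what a registry boils down to. The names _registry, register(), transform_by_name() and transform() are all hypothetical, and zlib.compress is used in place of a gzip function so the output is deterministic.

```python
import zlib

# Hypothetical registry: a plain dict from string keys to callables.
_registry = {}


def register(name, func):
    """Record a callable under a string key."""
    _registry[name] = func


def transform_by_name(data, name):
    """Look the callable up by name, then call it."""
    try:
        func = _registry[name]
    except KeyError:
        raise LookupError("unknown transform: %r" % name) from None
    return func(data)


def transform(data, func):
    """Skip the lookup and call the function that was passed in."""
    return func(data)


register('gzip', zlib.compress)

plaintext = b"spam and eggs " * 10
via_registry = transform_by_name(plaintext, 'gzip')
via_function = transform(plaintext, zlib.compress)
```

The registry version adds a lookup and a new failure mode (an unknown name) on top of the same function call.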

This is a non-starter, because you don't know what the representation
of strings is.

If you're working on that kind of application. My applications have to know what the items in the sequence are, or they have to figure it out, but when it comes time to do the transformation, they know.

We could be right-thinking and mandate that in the
.transform() context the string representation is considered
big-endian (and for little-endian platforms the bytes are swabbed
before applying the transformation).


Yuck.

But that would annoy all the Wintel users because string.transform('zip')
would produce gobbledygook when unzipped from the command line.  And
of course assuming a little-endian representation is un-right-thinkable.

It would annoy me because mandating the format of the input is up to the transformation function, not the transform().


    y = x.transform(f)

If there is some endian restriction on f, it should detect it and enforce it, or if it can't, document it. If there is some platform strangeness, it should take that into account.
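
As an illustration only, a transformation function can carry its own input checks so that transform() stays ignorant of them. Both transform() and utf16_be_to_text() below are hypothetical names; the BOM constants come from the standard codecs module.

```python
import codecs


def transform(data, func):
    """Hypothetical stand-in for the proposed .transform() method."""
    return func(data)


def utf16_be_to_text(data):
    """Hypothetical transformation that only accepts big-endian UTF-16.

    The restriction is detected and enforced here, not in transform().
    """
    if not isinstance(data, bytes):
        raise TypeError("expected bytes, got %s" % type(data).__name__)
    if data.startswith(codecs.BOM_UTF16_LE):
        raise ValueError("little-endian input is not supported")
    if data.startswith(codecs.BOM_UTF16_BE):
        # Drop the byte order mark before decoding.
        data = data[len(codecs.BOM_UTF16_BE):]
    return data.decode('utf-16-be')


big_endian = codecs.BOM_UTF16_BE + "spam".encode('utf-16-be')
text = transform(big_endian, utf16_be_to_text)
```

Input without a byte order mark cannot be checked this way, which is the case where f can only document its requirement.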

In this sense string-to-string and byte-to-byte *must* be kept
separate from "true" codecs.

I don't know of any codecs that aren't true. Some may be more popular or common than others, and the more popular ones may be blessed by being presented as easily accessible, just like your gunzip === gzip_to_plaintext.

I think it would be a very bad idea to allow names to be shared
for, say, byte-to-byte and string-to-byte "gzip" for the reason
given above.

I don't agree, only because I've written plenty of functions that can take a variety of different kinds of inputs as a convenience. If zlib.compress can take bytes or strings I would be fine with that, and if I could be more explicit, e.g.,


    gzipped = plainbytes.transform(zlib.compress_bytes)

I would be even happier. What is not available in Python that is in C++, and believe me, I don't miss it all THAT much, is a way to select the appropriate function based on both the input and output. Annotations would have been a way to do it, but there are far too many people who don't like it for very good reasons.
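
For what it's worth, selection on the input type alone can be sketched with functools.singledispatch, which was added to the standard library later (Python 3.4) and so was not an option at the time of this thread. It does not select on the output type, so it covers only half of the C++-style overloading described above. The compress() wrapper and the UTF-8 encoding step are assumptions for the example.

```python
import zlib
from functools import singledispatch


@singledispatch
def compress(data):
    """Fallback for input types with no registered implementation."""
    raise TypeError("cannot compress %s" % type(data).__name__)


@compress.register(bytes)
def _compress_bytes(data):
    # Bytes go to zlib unchanged.
    return zlib.compress(data)


@compress.register(str)
def _compress_str(data):
    # Assumption: text is encoded as UTF-8 before compression.
    return zlib.compress(data.encode('utf-8'))
```

Calling compress("spam") and compress(b"spam") then picks the matching implementation from the type of the argument.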

Whether string-to-string and byte-to-byte need to share a namespace is
another question, but since we already need three (string->byte,
byte->string, byte->byte) that should be forced not to collide, I
don't think that there's that big a loss in requiring that
.transform('pig_latin') (string to string) be spelled differently from
.transform('pig_latin1') (byte to byte assuming ISO 8859/1 data).


I agree, and I don't think there's an advantage to passing string names.

    import piglatin as pig
    piggy = mytext.transform(pig.latin1_encode)

I'm -1 on transform.register('pig_latin1', pig.latin1_encode).

Do you have use cases where byte-to-byte and string-to-string
transformations should share the same name?


Not in the same module.


Joel
