Stephen J. Turnbull wrote:

But why be verbose *and* ignore the vernacular?

    gzipped = plaintext.transform('gzip')
    plaintext = gzipped.transform('gunzip')

I'm generally resistant to a registry; none of my applications are so general that they would take advantage of a string-key-to-dictionary-to-function-pointer lookup. If they did, they would need some pretty severe constraints on which functions can be selected, so I would end up building my own context-sensitive dictionary of available functions. I'm in favor of:

    gzipped = plaintext.transform(zlib.compress)
    plaintext = gzipped.transform(zlib.decompress)

So, you may ask, why would that be any better than this...

    gzipped = zlib.compress(plaintext)

...and the answer is that it depends on what you consider the most appropriate design pattern to follow.

I think the style should be EIBTI for "private" protocols, and TOOWDTI
for transforms that wrap well-known libraries.

I've been around socket libraries and protocol encoding/decoding stacks too long, I guess, or I'm just jaded, but TOOWDTI is a pipe dream. "There's Only One Blessed Way To Do It" I can understand and appreciate.

EIBTI trumps TOOWDTI when it has to go through a registry. I would be -1 on this design:

    In module codecs:

        from gzip import compress as _gzip_compress
        ...
        _registry['gzip'] = _gzip_compress

That is a design where a great deal of code enforces TOOWDTI, effectively obfuscating the fact that all you're passing to transform() is nothing more magical than a reference to a function.
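
A minimal sketch of the two styles side by side, to make the contrast concrete. The names transform(), transform_by_name() and _registry are hypothetical and exist only for illustration (str and bytes have no .transform() method); zlib.compress and zlib.decompress are the only real library calls. The 'gzip' key is just the label used in this thread: zlib.compress emits zlib-format data, not a .gz file.

```python
import zlib

# Hypothetical registry, in the style of the design above.
_registry = {
    'gzip': zlib.compress,
    'gunzip': zlib.decompress,
}

def transform_by_name(data, name):
    """Registry style: the string key hides which function gets called."""
    return _registry[name](data)

def transform(data, func):
    """Explicit style: the caller hands over the function itself."""
    return func(data)

plaintext = b'spam and eggs ' * 10
gzipped = transform(plaintext, zlib.compress)

# Round trip through the explicit style.
assert transform(gzipped, zlib.decompress) == plaintext
# The registry lookup ends up calling exactly the same function.
assert transform_by_name(plaintext, 'gzip') == gzipped
```

In this sketch the registry adds a lookup and a namespace to police, and nothing else.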

This is a non-starter, because you don't know what the representation
of strings is.

That's true if you're working on that kind of application. My applications have to know what the items in the sequence are, or they have to figure it out, but when it comes time to do the transformation, they know.

We could be right-thinking and mandate that in the
.transform() context the string representation is considered
big-endian (and for little-endian platforms the bytes are swabbed
before applying the transformation).

Yuck.

But that would annoy all the Wintel users because string.transform('zip')
would produce gobbledygook when unzipped from the command line.  And
of course assuming a little-endian representation is un-right-thinkable.

It would annoy me because mandating the format of the input is up to the transformation function, not the transform().

    y = x.transform(f)

If there is some endian restriction on f, it should detect it and enforce it, or if it can't, document it. If there is some platform strangeness, it should take that into account.
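
As a rough illustration of f owning its input rules, here is a sketch under stated assumptions: transform() and compress_text_be() are hypothetical names, and the choice of UTF-16 big-endian is arbitrary. The point is only that the function pins the byte order and rejects bad input itself, while transform() stays ignorant of both.

```python
import zlib

def compress_text_be(text):
    """Hypothetical transformation that enforces its own input format."""
    if not isinstance(text, str):
        raise TypeError('compress_text_be() needs str, got %s'
                        % type(text).__name__)
    # The byte order is fixed here, so the result does not depend on
    # the platform's native endianness.
    return zlib.compress(text.encode('utf-16-be'))

def transform(data, func):
    # transform() mandates nothing about the input; it just applies func.
    return func(data)

y = transform('pig latin', compress_text_be)
assert zlib.decompress(y).decode('utf-16-be') == 'pig latin'
```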

In this sense string-to-string and byte-to-byte *must* be kept
separate from "true" codecs.

I don't know of any codecs that aren't true. Some may be more popular or common than others, and the more popular ones may be blessed by being presented as easily accessible, just like your gunzip === gzip_to_plaintext.

I think it would be a very bad idea to allow names to be shared
for, say, byte-to-byte and string-to-byte "gzip" for the reason
given above.

I don't agree, only because I've written plenty of functions that can take a variety of different kinds of inputs as a convenience. If zlib.compress can take bytes or strings I would be fine with that, and if I could be more explicit, e.g.,

    gzipped = plainbytes.transform(zlib.compress_bytes)

I would be even happier. What is not available in Python that is in C++, and believe me, I don't miss it all THAT much, is a way to select the appropriate function based on both the input and output. Annotations would have been a way to do it, but there are far too many people who don't like them, for very good reasons.
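
For what it's worth, selecting on the input type alone can be sketched with functools.singledispatch. That is an assumption relative to this thread: it is in the standard library only from Python 3.4 on, and it does nothing for selecting on the output type. The name compress_any and the choice of UTF-8 for text are hypothetical.

```python
import zlib
from functools import singledispatch

@singledispatch
def compress_any(data):
    # Fallback for input types with no registered implementation.
    raise TypeError('cannot compress %s' % type(data).__name__)

@compress_any.register(bytes)
def _(data):
    return zlib.compress(data)

@compress_any.register(str)
def _(data):
    # Assumption: text is encoded as UTF-8 before compression.
    return zlib.compress(data.encode('utf-8'))

# The convenience form accepts either kind of input.
assert compress_any('spam') == compress_any(b'spam')
```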

Whether string-to-string and byte-to-byte need to share a namespace is
another question, but since we already need three (string->byte,
byte->string, byte->byte) that should be forced not to collide, I
don't think that there's that big a loss in requiring that
.transform('pig_latin') (string to string) be spelled differently from
.transform('pig_latin1') (byte to byte assuming ISO 8859/1 data).

I agree, and I don't think there's an advantage to passing string names.

    import piglatin as pig
    piggy = mytext.transform(pig.latin1_encode)

I'm -1 on transform.register('pig_latin1', pig.latin1_encode).

Do you have use cases where byte-to-byte and string-to-string
transformations should share the same name?

Not in the same module.


Joel
