Marc-Andre Lemburg added the comment:

On 16.11.2013 13:44, Nick Coghlan wrote:
>
> Nick Coghlan added the comment:
>
> Now that I understand Victor's proposal better, I actually agree with it,
> I just think the attribute names need to be "encodes_to" and "decodes_to".
>
> With Victor's proposal, *input* validity checks (including type checks)
> would remain the responsibility of the codec itself. What the new
> attributes would enable is *output* type checks *without having to perform
> the encoding or decoding operation first*. Codecs will be free to leave
> these as None to retain the current behaviour of "try it and see".
>
> The specific field names "input_type" and "output_type" aren't accurate,
> since the acceptable input types for encoding or decoding are likely to be
> more permissive than the specific output type for the other operation.
> Most of the binary codecs, for example, accept any bytes-like object as
> input, but produce bytes objects as output for both encoding and decoding.
> For Unicode encodings, encoding is strictly str -> bytes, but decoding is
> generally the more permissive bytes-like object -> str.
>
> I would still suggest providing the following helper function in the
> codecs module (the name has changed from my earlier suggestion and I now
> suggest implementing it in terms of Victor's suggestion with more
> appropriate field names):
>
>     def is_text_encoding(name):
>         """Returns true if the named encoding is a Unicode text encoding"""
>         info = codecs.lookup(name)
>         return info.encodes_to is bytes and info.decodes_to is str
>
> This approach covers all the current stdlib codecs:
>
> - the text encodings encode to bytes and decode to str
> - the binary transforms encode to bytes and also decode to bytes
> - the lone text transform (rot_13) encodes and decodes to str
>
> This approach also makes it possible for a type inference engine (like
> mypy) to potentially analyse codec use, and could be expanded in 3.5 to
> offer type checked binary and text transform APIs that filtered codecs
> appropriately according to their output types.
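For reference, a rough sketch of the "try it and see" fallback mentioned in
the quoted comment, using only the existing codecs API. The quoted
is_text_encoding helper relies on the proposed encodes_to/decodes_to
attributes, which do not exist yet; this probing variant (the function name
is made up for illustration) instead has to run the codec to discover its
output types:

    import codecs

    def is_text_encoding_by_probing(name):
        """Guess whether *name* is a Unicode text encoding by running it.

        This is the "try it and see" behaviour that the proposed
        encodes_to / decodes_to attributes would make unnecessary.
        """
        try:
            encoded = codecs.encode("a", name)  # text encodings accept str
        except (TypeError, LookupError, UnicodeError):
            return False                        # binary transforms reject str
        if not isinstance(encoded, bytes):
            return False                        # e.g. rot_13 returns str
        return isinstance(codecs.decode(encoded, name), str)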
Nick, you are missing an important point: codecs can have any number of
input/output type combinations, e.g. they may convert bytes -> str and
str -> str (the output type depends on the input type). For this reason,
the simplistic approach of declaring just a single type conversion per
direction will not work. Codecs will have to provide a *mapping* of input
types to output types for each direction (encoding and decoding), either as
a Python mapping or as a list of (input type, output type) tuples.
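As a rough, purely hypothetical sketch of what such per-direction type
mappings could look like (nothing like this exists on codecs.CodecInfo
today; the layout and the "mixed_codec" entry are illustrative assumptions
only), a dict keyed by input type can describe codecs whose output type
depends on their input type, which single encodes_to/decodes_to attributes
cannot:

    # Hypothetical sketch only: these type maps are not part of the codecs
    # module; "mixed_codec" is a made-up name used for illustration.
    HYPOTHETICAL_TYPE_MAPS = {
        # codec name: (encoding type map, decoding type map)
        "utf-8":        ({str: bytes},           {bytes: str}),
        "base64_codec": ({bytes: bytes},         {bytes: bytes}),
        "rot_13":       ({str: str},             {str: str}),
        # A codec whose output type depends on its input type:
        "mixed_codec":  ({bytes: str, str: str}, {str: bytes}),
    }

    def is_text_encoding(name):
        """True if *name* maps str -> bytes when encoding and
        bytes -> str when decoding."""
        encode_map, decode_map = HYPOTHETICAL_TYPE_MAPS.get(name, ({}, {}))
        return encode_map.get(str) is bytes and decode_map.get(bytes) is str

With such a layout, a check along the lines of Nick's is_text_encoding still
works, while codecs with mixed signatures remain describable.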