Marc-Andre Lemburg added the comment:

On 16.11.2013 13:44, Nick Coghlan wrote:
>
> Nick Coghlan added the comment:
>
> Now that I understand Victor's proposal better, I actually agree with it,
> I just think the attribute names need to be "encodes_to" and "decodes_to".
>
> With Victor's proposal, *input* validity checks (including type checks)
> would remain the responsibility of the codec itself. What the new
> attributes would enable is *output* type checks *without having to perform
> the encoding or decoding operation first*. Codecs will be free to leave
> these as None to retain the current behaviour of "try it and see".
>
> The specific field names "input_type" and "output_type" aren't accurate,
> since the acceptable input types for encoding or decoding are likely to be
> more permissive than the specific output type for the other operation.
> Most of the binary codecs, for example, accept any bytes-like object as
> input, but produce bytes objects as output for both encoding and decoding.
> For Unicode encodings, encoding is strictly str -> bytes, but decoding is
> generally the more permissive bytes-like object -> str.
>
> I would still suggest providing the following helper function in the
> codecs module (the name has changed from my earlier suggestion and I now
> suggest implementing it in terms of Victor's suggestion with more
> appropriate field names):
>
>     def is_text_encoding(name):
>         """Returns true if the named encoding is a Unicode text encoding"""
>         info = codecs.lookup(name)
>         return info.encodes_to is bytes and info.decodes_to is str
>
> This approach covers all the current stdlib codecs:
>
> - the text encodings encode to bytes and decode to str
> - the binary transforms encode to bytes and also decode to bytes
> - the lone text transform (rot_13) encodes and decodes to str
>
> This approach also makes it possible for a type inference engine (like
> mypy) to potentially analyse codec use, and could be expanded in 3.5 to
> offer type checked binary and text transform APIs that filtered codecs
> appropriately according to their output types.
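For reference, a rough sketch of the "try it and see" fallback mentioned in
the quoted comment, using only the existing codecs API. The quoted
is_text_encoding helper relies on the proposed encodes_to/decodes_to
attributes, which do not exist yet; this probing variant (the function name
is made up for illustration) instead has to run the codec to discover its
output types:

    import codecs

    def is_text_encoding_by_probing(name):
        """Guess whether *name* is a Unicode text encoding by running it.

        This is the "try it and see" behaviour that the proposed
        encodes_to / decodes_to attributes would make unnecessary.
        """
        try:
            encoded = codecs.encode("a", name)  # text encodings accept str
        except (TypeError, LookupError, UnicodeError):
            return False                        # binary transforms reject str
        if not isinstance(encoded, bytes):
            return False                        # e.g. rot_13 returns str
        return isinstance(codecs.decode(encoded, name), str)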
Nick, you are missing an important point: codecs can have any number of
input/output type combinations, e.g. they may convert bytes -> str and
str -> str (the output type depends on the input type). For this reason,
the simplistic approach of declaring just a single type conversion per
direction will not work. Codecs will have to provide a *mapping* of input
types to output types for each direction (encoding and decoding), either as
a Python mapping or as a list of (input type, output type) tuples.
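As a rough, purely hypothetical sketch of what such per-direction type
mappings could look like (nothing like this exists on codecs.CodecInfo
today; the layout and the "mixed_codec" entry are illustrative assumptions
only), a dict keyed by input type can describe codecs whose output type
depends on their input type, which single encodes_to/decodes_to attributes
cannot:

    # Hypothetical sketch only: these type maps are not part of the codecs
    # module; "mixed_codec" is a made-up name used for illustration.
    HYPOTHETICAL_TYPE_MAPS = {
        # codec name: (encoding type map, decoding type map)
        "utf-8":        ({str: bytes},           {bytes: str}),
        "base64_codec": ({bytes: bytes},         {bytes: bytes}),
        "rot_13":       ({str: str},             {str: str}),
        # A codec whose output type depends on its input type:
        "mixed_codec":  ({bytes: str, str: str}, {str: bytes}),
    }

    def is_text_encoding(name):
        """True if *name* maps str -> bytes when encoding and
        bytes -> str when decoding."""
        encode_map, decode_map = HYPOTHETICAL_TYPE_MAPS.get(name, ({}, {}))
        return encode_map.get(str) is bytes and decode_map.get(bytes) is str

With such a layout, a check along the lines of Nick's is_text_encoding still
works, while codecs with mixed signatures remain describable.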