R. David Murray writes:

 > If most people agree with Antoine I won't fight it, but it seems to me
 > that accepting unicode in the binascii and base64 APIs is a bad
 > idea.

First, I agree with David that this change should have been brought up on python-dev before committing it. The distinctions Python 3 has made between APIs for bytes and those for str are both obviously controversial and genuinely delicate.

Second, if Unicode is to be accepted in these APIs, there is a doc issue (which I haven't checked). It must be made clear that the "printable ASCII" in question is the set represented by the *integers* 33 to 126, *not* the ASCII characters ! to ~. Look-alikes of those characters are present in the Unicode repertoire in many other places (specifically the "full-width ASCII" compatibility character set around U+FF20, but also several Greek and Cyrillic characters, and possibly others).

I'm going to side with Antoine and Nick on these particular changes because in practice (except maybe in the email module :-( ) the BASE-encoded "text" to be decoded is going to be consistently defined by the client as either str or bytes, but not both. The fact that the repr of the encoded text is identical (except for the presence or absence of a leading "b") is very suggestive here.

I do harbor a slight niggle: I think there is more room for confusion here than in Nick's urllib work. However, once we clear up that confusion in *our* minds, I don't think there's much potential for dangerous confusion for API clients. (I agree with Antoine on that point.)

The BASE## decoding APIs in the abstract are "text" to bytes. Pedantically, in Python that suggests a str -> bytes signature, but RFC 4648 doesn't anywhere require a 1-byte representation of ASCII, only that the representation be interpreted as integers in the ASCII coding. However, an RFC-4648-conforming implementation MUST reject any string containing characters not allowed in the representation, so it's actually stricter than requiring ASCII. I see no problem with allowing str-or-bytes -> bytes polymorphism here.
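To make the distinction concrete, here is a minimal sketch, assuming a Python 3 release in which base64.b64decode accepts ASCII-only str as well as bytes (the change under discussion). The helper to_fullwidth is hypothetical and exists only to build look-alike input from the full-width compatibility forms; it is not part of any library.

```python
import base64

# Encode once; the encoder returns bytes.
encoded_bytes = base64.b64encode(b"Hello")
# The same encoded text, held by a client that works in str.
encoded_str = encoded_bytes.decode("ascii")

# str-or-bytes -> bytes: both spellings decode to the same payload.
from_bytes = base64.b64decode(encoded_bytes)
from_str = base64.b64decode(encoded_str)


def to_fullwidth(text):
    """Hypothetical helper: map printable ASCII (33..126) to the
    full-width compatibility forms, which sit 0xFEE0 code points higher."""
    return "".join(
        chr(ord(c) + 0xFEE0) if 0x21 <= ord(c) <= 0x7E else c
        for c in text
    )


# Looks like the encoded text to a human, but none of its code points
# are in the range 33..126.
lookalike = to_fullwidth(encoded_str)

try:
    base64.b64decode(lookalike)
    rejected = False
except ValueError:
    # Non-ASCII str input is refused rather than silently normalized.
    rejected = True
```

The point of the sketch is only that "printable ASCII" is tested on code point values: the full-width string renders much like the original but is refused, while the str and bytes forms of the genuine encoded text are interchangeable on input.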
The remaining issue, to my mind, is that we'd also like bytes -> str-or-bytes polymorphism for symmetry, but this is not Haskell, so we can't have it: Python has no way to choose a function's return type from what the caller expects.

The same is true for binascii, I suppose -- assuming that the module is specified (as the name suggests) to produce and consume only ASCII text as a representation of bytes.