Steven D'Aprano writes:

> base64.b64encode take bytes as input and returns bytes. Some people are
> arguing that this is wrong behaviour, as RFC 3548

That RFC is obsolete: the replacement is RFC 4648. However, the text
is essentially unchanged.

> specifies that Base64 should transform bytes to characters:

Without defining "character" except as a "subset" of ASCII. That
omission is evidently deliberate. Unfortunately the RFC is unclear
whether a subset of the ASCII repertoire of (abstract) characters is
meant, or a subset of the ASCII codes. I believe the latter is meant,
but either way, it does refer to *encoded* characters as the output of
the encoding process:

    > The encoding process represents 24-bit groups of input bits
    > as output strings of 4 encoded characters.

and I see no reason to deny that the bytes output by base64.b64encode
are the octets representing the ASCII codes for the characters of the
BASE64 alphabet.

> Are they misinterpreting the standard?

I think they are. As I understand it, the intention of the standard in
using "character" to denote the code unit is similar to that of RFC
3986: BASE encodings are intended to be printable and recognizable to
humans. If you're using a non-ASCII-superset encoding such as EBCDIC
for text I/O, then you should translate from ASCII to that encoding
for display, and in the (unlikely) case that a human types BASE
encoding from the terminal, the reverse transformation is necessary.

> Has Python got it wrong?

I can't see anything in the RFC that suggests that. And, in the end,
an RFC is not concerned with Python's internal fiddling, but rather
with what goes out over the wire. All of the implementations you
mention will eventually send to the wire octets that are interpreted
as ASCII-encoded characters according to their integer values.

> Is there a good reason for returning bytes?

I suppose practicality over purity: BASE encodings are normally used
on the wire, and so programs need to encode text to appropriately
encoded octets *before* BASE encoding, and then normally immediately
put the BASE-encoded content on the wire.
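For concreteness, here is a minimal sketch of the behaviour under discussion. The sample strings are made up for illustration, and the choice of cp037 as "an EBCDIC encoding" is my assumption; the RFC names no particular codec.

```python
import base64

# One 24-bit group of input (3 octets) becomes 4 encoded characters.
group = base64.b64encode(b"abc")
assert group == b"YWJj"

# Hypothetical text payload: it must be encoded to octets *before*
# BASE64 encoding, because b64encode takes bytes, not str.
text = "na\u00efve caf\u00e9"
raw = text.encode("utf-8")
encoded = base64.b64encode(raw)

# The result is bytes, and every octet is an ASCII code.
assert isinstance(encoded, bytes)
assert all(octet < 128 for octet in encoded)

# Getting a str for display is a trivial, lossless ASCII decode.
as_str = encoded.decode("ascii")
assert as_str.encode("ascii") == encoded

# On a non-ASCII-superset system the display form needs translating;
# cp037 is used here only as an example EBCDIC codec.
ebcdic = as_str.encode("cp037")
assert ebcdic != encoded
assert ebcdic.decode("cp037") == as_str

# Decoding reverses the whole chain.
assert base64.b64decode(encoded).decode("utf-8") == text
```

Note that the str form only appears where something is displayed or translated; the path from text to wire is bytes all the way.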
Why round-trip from UTF-8 bytes to a str in BASE64 representation, and
then do the (trivial) conversion back to bytes? OK, it's not that
expensive, but still...

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev