On Tue, 14 Jun 2016 14:05:19 -0300, "Joao S. O. Bueno" <jsbu...@python.org.br> wrote:
> On 14 June 2016 at 13:32, Toshio Kuratomi <a.bad...@gmail.com> wrote:
> >
> > On Jun 14, 2016 8:32 AM, "Joao S. O. Bueno" <jsbu...@python.org.br> wrote:
> >>
> >> On 14 June 2016 at 12:19, Steven D'Aprano <st...@pearwood.info> wrote:
> >> > Is there
> >> > a good reason for returning bytes?
> >>
> >> What about: it returns 0-255 numeric values for each position in a
> >> stream, with no clue whatsoever to how those values map to text
> >> characters beyond the 32-128 range?
> >>
> >> Maybe base64.decode could take an "encoding" optional parameter - or
> >> there could be a separate "decode_to_text" method that would explicitly
> >> take a text codec name.
> >> Otherwise, no, you simply can't take a bunch of bytes and say they
> >> represent text.
> >>
> > Although it's not explicit, the question seems to be about the output of
> > encoding (and for symmetry, the input of decoding). In both of those cases,
> > valid output will consist only of ascii characters.
> >
> > The input to encoding would have to remain bytes (that's the main purpose of
> > base64... to turn bytes into an ascii string).
> >
>
> Sorry, it is 2016, and I don't think at this point anyone can consider
> an ASCII string as a representative pattern of textual data in any field
> of application.
> Bytes are not text. Bytes with an associated, meaningful, encoding are text.
> I thought this had been through when Python 3 was out.
>
> Unless you are working with COBOL generated data (and intending to keep
> the file format), it does not make sense in any real-world field.
> (supposing your Cobol data is ASCII and not EBCDIC).
The fundamental purpose of the base64 encoding is to take a series of arbitrary bytes and reversibly turn them into another series of bytes in which the eighth bit is not significant. Its utility is for transmitting eight bit bytes over a channel that is not eight bit clean. Before unicode, that meant bytes.

Now that we have unicode in use in lots of places, you can think of unicode as a communications channel that is not eight bit clean. So, we might want to use base64 encoding to transmit arbitrary bytes over a unicode channel. This gives a legitimate reason to want unicode output from a base64 encoder. However, it is equally legitimate in the Python context to say you should be explicit about your intentions by decoding the bytes output of the base64 encoder using the ASCII codec.

This was indeed discussed at length. For a while we didn't even allow unicode input on either side, but we relaxed that. My understanding of Python's current stance on functions that handle both bytes and string is that *either* the function accepts both types and outputs the *same* type as the input, *or* it accepts both types but always outputs *one* type or the other.

You can't have unicode output if you give unicode input to the base64 decoder in the general case. So decode, at least, has to always give bytes output. Likewise, there is small to zero utility for using unicode input to the base64 encoder, since the unicode would have to be ASCII only and there'd be no point in doing the encoding. So, the only thing that makes sense is to follow the "one output type" rule here.

Now, you can argue whether or not it would make sense for the encoder to always produce unicode. However, you then immediately run into the backward compatibility issue: the primary use case of the base64 encoding is to produce *wire ready* bytes. This is what the email package uses it for, for example.
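
To make the behaviour described above concrete, here is a minimal sketch, assuming Python 3's standard base64 module. The payload is made up purely for illustration; the point is only that the encoder returns wire-ready bytes, that getting text out of it is an explicit ASCII decode, and that the decoder returns bytes whether it is given bytes or an ASCII-only str.

```python
import base64

# Arbitrary bytes, including values with the eighth bit set.
# (Illustrative payload only; any bytes object would do.)
payload = bytes([0x00, 0x7F, 0x80, 0xFF]) + "ação".encode("utf-8")

# The encoder takes bytes and returns bytes in which the eighth bit
# is never set, i.e. something safe for a channel that is not 8-bit clean.
wire = base64.b64encode(payload)
assert isinstance(wire, bytes)
assert all(b < 0x80 for b in wire)

# To send the result over a "unicode channel", be explicit about it:
# decode the encoder's bytes output with the ASCII codec.
text = wire.decode("ascii")
assert isinstance(text, str)

# The decoder accepts either bytes or an ASCII-only str, but it follows
# the "one output type" rule and always returns bytes.
from_bytes = base64.b64decode(wire)
from_text = base64.b64decode(text)
assert isinstance(from_bytes, bytes) and isinstance(from_text, bytes)

# The transformation is reversible either way.
assert from_bytes == payload
assert from_text == payload
```

Nothing here goes beyond the standard library; the extra `.decode("ascii")` step is the whole cost of wanting str rather than bytes from the encoder.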
So for backward compatibility reasons, which are consonant with its primary use case, it makes more sense for the encoder to produce bytes than string. If you need to transmit bytes over a unicode channel, you can decode it from ASCII. That is, unicode is the *exceptional* use case here, not the rule.

That might in fact be changing, but for backward compatibility reasons, Python won't change.

And that should answer Steve's original question :)

--David
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev