On Jan 12, 2014, at 6:55 PM, Guido van Rossum <gu...@python.org> wrote:
> The key reason for introducing a separate bytes type in Python 3 is to
> avoid *mixing* bytes and text. This aims to avoid the classic Python 2
> Unicode failure, where str+unicode fails or succeeds based on whether
> str contains non-ASCII characters or not, which means it is easy to
> miss in testing.

+1

>
> But this does not mean the bytes type isn't allowed to have a
> noticeable bias in favor of encodings that are ASCII supersets, even
> if not all bytes objects contain such data (e.g. image data,
> compressed data, binary network packets, and so on).

+1

>
> IMO it's totally fine and consistent if b'%d' % 42 returns b'42' and
> also for b'{}'.format(42) to return b'42'. There are numerous places
> where bytes are already assumed to use an ASCII superset:
>
> - byte literals: b'abc' (it's a syntax error to have a non-ASCII character
>   here)
> - the upper() and lower() methods modify the ASCII letter positions
> - int(b'42') == 42, float(b'3.14') == 3.14

Completely agree.

>
> I looked through the example code I recently wrote for asyncio (which
> uses bytes for all data read or written). There are several places
> where I have to make a clumsy detour via text strings because I need
> to include an ASCII-encoded decimal integer (e.g. the Content-Length
> header) or a hex-encoded one (e.g. for Transfer-Encoding: chunked).
> Those detours aren't needed for parsing because int() accepts bytes
> just fine.
>
> I also note that the behavior of the re module is perfect: if the
> pattern is bytes, it can only match bytes and the extracted data is
> bytes, and ditto for text -- so it supports both types but doesn't
> allow mixing them. The urllib module does this too -- at considerable
> cost in its implementation, but it's the right thing, because there
> really are good cases to be made for treating URLs as text as well as
> for treating them as bytes (as with filenames, command line arguments,
> and environment variables).
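For anyone following along, here is a minimal sketch of the behaviors described in the quoted text above: the ASCII-biased bytes operations, the detour via text for formatting integers, and the re module's refusal to mix types. The helper names content_length_header and chunk_size_line are hypothetical, made up for illustration; they are not taken from the asyncio example code being discussed. The last part assumes Python 3.5 or later, where %-formatting on bytes is available.

```python
import re

# Parsing needs no detour: int() and float() accept ASCII digits in bytes.
assert int(b"42") == 42
assert float(b"3.14") == 3.14

# upper() only touches the ASCII letter positions; other bytes pass through.
assert b"abc\xff".upper() == b"ABC\xff"


def content_length_header(body: bytes) -> bytes:
    """Hypothetical helper: format a decimal integer via a text detour."""
    return b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"


def chunk_size_line(chunk: bytes) -> bytes:
    """Hypothetical helper: hex-encoded size line for a chunked body."""
    return format(len(chunk), "x").encode("ascii") + b"\r\n"


# The re module keeps the two worlds apart: bytes pattern, bytes result.
match = re.match(rb"(\d+)", b"42 bytes")
assert match is not None and match.group(1) == b"42"

# Mixing a bytes pattern with a text subject is rejected with TypeError.
try:
    re.match(rb"\d+", "42")
    mixing_rejected = False
except TypeError:
    mixing_rejected = True

# Assumption: Python 3.5+, where %-formatting works directly on bytes.
direct = b"Content-Length: %d\r\n" % 5
```

With these definitions, content_length_header(b"hello") and the %-formatted `direct` value produce the same bytes, which is the detour the quoted message would like to avoid.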
>
> I'm sad that the json module in Python 3 doesn't support bytes at all,
> but at least it is consistent -- it always produces text in ASCII
> encoding (by default). The same applies to the http module, which IIUC
> adheres to the standard by treating headers as Latin-1.
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/donald%40stufft.io

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA