On Tue, 14 Feb 2006 15:13:25 -0800, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>I'm about to send 6 or 8 replies to various salient messages in the
>PEP 332 revival thread. That's probably a sign that there's still a
>lot to be sorted out. In the meantime, to save you reading through
>all those responses, here's a summary of where I believe I stand.
>Let's continue the discussion in this new thread unless there are
>specific hairs to be split in the other thread that aren't addressed
>below or by later posts.
>
>Non-controversial (or almost):
>
>- we need a new PEP; PEP 332 won't cut it
>
>- no b"..." literal
>
>- bytes objects are mutable
>
>- bytes objects are composed of ints in range(256)
>
>- you can pass any iterable of ints to the bytes constructor, as long
>as they are in range(256)
>
>- longs or anything with an __index__ method should do, too
>
>- when you index a bytes object, you get a plain int
>
>- repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'
>
>Somewhat controversial:
>
>- it's probably too big to attempt to rush this into 2.5
>
>- bytes("abc") == bytes(map(ord, "abc"))
>
>- bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128, 255])
>
>Very controversial:
>
Given that ord/unichr and ord/chr work as encoding-agnostic function pairs
symmetrically mapping between unicode and int or str and int, please consider
the effect of this API as illustrated by how it works with the examples:

 >>> def bytes(arg, encoding=None):
 ...     if isinstance(arg, str):
 ...         if encoding: b = map(ord, arg.decode(encoding))
 ...         else: b = map(ord, arg)
 ...     elif isinstance(arg, unicode):
 ...         if encoding: raise ValueError(
 ...             'Use bytes(%r.encode(%r)) to avoid PY 3000 breakage'%(arg, encoding))
 ...         b = map(ord, arg)
 ...     else:
 ...         b = map(int, arg)
 ...     if sum(1 for x in b if x<0 or x>255) > 0:
 ...         raise ValueError('byte out of range')
 ...     return 'bytes(%r)'%b
 ...
Then

>- bytes("abc", "encoding") == bytes("abc") # ignores the "encoding" argument
(Use the encoding; the only requirement is that all the resulting ord values
be in range(0,256).)
 >>> bytes("abc\xf6", 'latin-1')
 'bytes([97, 98, 99, 246])'
 >>> print unichr(246)
 ö
 >>> bytes("abc\xf6", 'cp437')
 'bytes([97, 98, 99, 247])'
 >>> print unichr(247)
 ÷
>
>- bytes(u"abc") == bytes("abc") # for ASCII at least
 >>> bytes(u"abc")
 'bytes([97, 98, 99])'
>
>- bytes(u"\x80\xff") raises UnicodeError
 >>> bytes(u"\x80\xff")
 'bytes([128, 255])'
>
>- bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
 >>> bytes(u"\x80\xff", "latin-1")
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "<stdin>", line 6, in bytes
 ValueError: Use bytes(u'\x80\xff'.encode('latin-1')) to avoid PY 3000 breakage
 >>> bytes(u'\x80\xff'.encode('latin-1'))
 'bytes([128, 255])'

(If the characters exist in the encoding specified, it will work; otherwise it
raises an exception. Assumes PY 3000 string encode results in bytes, so it
should work there too ;-)

Of course,
 >>> bytes(u'\u1234')
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "<stdin>", line 12, in bytes
 ValueError: byte out of range

and
 >>> bytes([1,2])
 'bytes([1, 2])'
 >>> bytes([1,-1])
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "<stdin>", line 12, in bytes
 ValueError: byte out of range
 >>> bytes([1,256])
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "<stdin>", line 12, in bytes
 ValueError: byte out of range

Interestingly, the internal map(int, arg) over a sequence permits
 >>> bytes(["1", 2, 3L, True, 5.6])
 'bytes([1, 2, 3, 1, 5])'

IOW, any sequence of objects that will convert themselves to int in
range(0,256) will do.
>
>Martin von Loewis's alternative for the "very controversial" set is to
>disallow an encoding argument and (I believe) also to disallow Unicode
>arguments.
>In 3.0 this would leave us with s.encode(<encoding>) as the
>only way to convert a string (which is always unicode) to bytes. The
>problem with this is that there's no code that works in both 2.x and
>3.0.
>
I hope Martin will reconsider, considering ord/unichr as a symmetric pair of
functions mapping 1:1 to unicode (and ignoring the fact that this also happens
to be the latin-1 mapping ;-)

A test class should be easy, except deciding on appropriate methods and how
the type should be defined. It's the same peculiar problem as str, i.e.,
length one would be compatible with int, but not other lengths. How do we do
that?

Regards,
Bengt Richter
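
The prototype in this message is Python 2 only (it relies on the unicode type, long literals, and a list-returning map). For anyone who wants to try the range check on a Python 3 interpreter, a rough sketch might look like the following. The name bytes_repr is hypothetical and not part of any proposal in this thread, the encoding branch is left out, and str here means the Python 3 (always-unicode) string type.

```python
def bytes_repr(arg):
    """Hypothetical Python 3 sketch of the prototype's range check.

    Returns the same 'bytes([...])' string the prototype returns;
    it does not build a real bytes object.
    """
    if isinstance(arg, str):
        # ord() maps each character to its code point, with no encoding involved.
        values = [ord(ch) for ch in arg]
    else:
        # Anything int() accepts is allowed, mirroring map(int, arg) in the prototype.
        values = [int(item) for item in arg]
    # Reject anything outside range(256), as the prototype does.
    if any(v < 0 or v > 255 for v in values):
        raise ValueError('byte out of range')
    return 'bytes(%r)' % values


if __name__ == '__main__':
    print(bytes_repr("abc"))
    print(bytes_repr("\x80\xff"))
    print(bytes_repr(["1", 2, True, 5.6]))
```

As in the prototype, a string containing a code point above 255 (for example "\u1234") or a sequence containing -1 or 256 raises ValueError.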