On Feb 14, 2006, at 4:17 PM, Guido van Rossum wrote:

> On 2/14/06, Bob Ippolito <[EMAIL PROTECTED]> wrote:
>> On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
>>> - we need a new PEP; PEP 332 won't cut it
>>>
>>> - no b"..." literal
>>>
>>> - bytes objects are mutable
>>>
>>> - bytes objects are composed of ints in range(256)
>>>
>>> - you can pass any iterable of ints to the bytes constructor, as long
>>> as they are in range(256)
>>
>> Sounds like array.array('B').
>
> Sure.
>
>> Will the bytes object support the buffer interface?
>
> Do you want them to?
>
> I suppose they should *not* support the *text* part of that API.

I would imagine that it'd be convenient for integrating with existing
extensions... e.g. initializing an array or Numeric array with one.
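For reference, this is roughly what array.array('B') already gives you
today (Python 2.x, typed from memory and untested, so take the exact
output with a grain of salt):

>>> import array
>>> a = array.array('B', [104, 105])   # a list of ints in range(256)
>>> buffer(a)[:]                       # the 2.x buffer interface hands back a str copy
'hi'
>>> array.array('B', buffer(a)[:])     # and you can round-trip via that copy
array('B', [104, 105])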
>> Will it accept objects supporting the buffer interface in the
>> constructor (or a class method)? If so, will it be a copy or a
>> view? Current array.array behavior says copy.
>
> bytes() should always copy -- thanks for asking.

I only really ask because it's worth fully specifying these things.
Copy seems a lot more sensible given the rest of the interpreter and
stdlib (e.g. buffer(x) seems to always return a read-only buffer).

>>> - longs or anything with an __index__ method should do, too
>>>
>>> - when you index a bytes object, you get a plain int
>>
>> When slicing a bytes object, do you get another bytes object or a
>> list? If it's a bytes object, is it a copy or a view? Current
>> array.array behavior says copy.
>
> Another bytes object which is a copy.
>
> (Why would you even think about views here? They are evil.)

I mention views because that's what numpy/Numeric/numarray/etc. do...
It's certainly convenient at times to have that functionality, for
example, to work with only the alpha channel in an RGBA image.
Probably too magical for the bytes type.

>>> import numpy
>>> image = numpy.array(list('RGBARGBARGBA'))
>>> alpha = image[3::4]
>>> alpha
array([A, A, A],
      dtype=(string,1))
>>> alpha[:] = 'X'
>>> image
array([R, G, B, X, R, G, B, X, R, G, B, X],
      dtype=(string,1))

>>> Very controversial:
>>>
>>> - bytes("abc", "encoding") == bytes("abc")  # ignores the "encoding"
>>> argument
>>>
>>> - bytes(u"abc") == bytes("abc")  # for ASCII at least
>>>
>>> - bytes(u"\x80\xff") raises UnicodeError
>>>
>>> - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
>>>
>>> Martin von Loewis's alternative for the "very controversial" set is to
>>> disallow an encoding argument and (I believe) also to disallow Unicode
>>> arguments. In 3.0 this would leave us with s.encode(<encoding>) as the
>>> only way to convert a string (which is always unicode) to bytes. The
>>> problem with this is that there's no code that works in both 2.x and
>>> 3.0.
>>
>> Given a base64 or hex string, how do you get a bytes object out of
>> it? Currently str.decode('base64') and str.decode('hex') are good
>> solutions to this... but you get a str object back.
>
> I don't know -- you can propose an API you like here. base64 is as
> likely to encode text as binary data, so I don't think it's wrong for
> those things to return strings.

That's kinda true, I guess -- but you'd still need an encoding in py3k
to turn base64 into text.

A lot of the current codecs infrastructure doesn't make sense in py3k --
for example, the 'zlib' encoding, which is really a bytes transform, or
'unicode_escape', which is a text transform.

I suppose there aren't too many different ways you'd want to encode or
decode data to binary (beyond the text codecs), so they should probably
just live in a module -- something like the binascii we have now. I do
find the codecs infrastructure to be convenient at times (maybe too
convenient), but since you're not interested in adding functions to
existing types, a module seems like the best approach.

-bob
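P.S. Purely as an illustration of the kind of module-level API I mean
(these binascii calls exist today and return str; presumably in py3k
they would take and return bytes instead):

>>> import binascii
>>> binascii.unhexlify('68656c6c6f')   # hex string -> raw data
'hello'
>>> binascii.hexlify('hello')          # and back
'68656c6c6f'
>>> binascii.a2b_base64('aGVsbG8=')    # base64 string -> raw data
'hello'
>>> binascii.b2a_base64('hello')       # and back (note the trailing newline)
'aGVsbG8=\n'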