On 2/13/06, Barry Warsaw <[EMAIL PROTECTED]> wrote:
> This makes me think I want an unsigned byte type, which b[0] would
> return. In another thread I think someone mentioned something about
> fixed width integral types, such that you could have an object that
> was guaranteed to be 8-bits wide, 16-bits wide, etc. Maybe you also
> want signed and unsigned versions of each. This may seem like YAGNI
> to many people, but as I've been working on a tightly embedded/
> extended application for the last few years, I've definitely had
> occasions where I wish I could more closely and more directly model
> my C values as Python objects (without using the standard workarounds
> or writing my own C extension types).

So I take it that the specific properties you want to model are the
overflow behavior, right? N-bit unsigned is defined as arithmetic mod
2**N; N-bit signed is a bit more tricky to define but similar. These
never overflow but instead just throw away bits in an exactly
specified manner (2's complement arithmetic). While I personally am
comfortable with writing (x+y) & 0xFFFF (for 16-bit unsigned), I can
see that someone who spends a lot of time doing arithmetic in this
field might want specialized types.

But I'm not sure that that's what the Numeric folks want -- I believe
they're more interested in saving space, not in the mod 2**N
properties. So (here I'm to some extent guessing) they have different
array types whose elements are ints or floats of various widths; I'm
guessing they also have scalars of those widths for consistency or to
guide the creation of new arrays from scalars. I wouldn't be surprised
if, rather than requiring N-bit 2's complement, they would prefer more
flexible control over overflow -- e.g. ignore, warn, error, turn into
NaN, etc.

> But anyway, without hyper-generalizing, it's still worth asking
> whether a bytes type is just a container of byte objects, where the
> contained objects would be distinct, fixed 8-bit unsigned integral
> types.

There's certainly a point to treating bytes as ints; I don't know if
it's more compelling than treating them as unit bytes. But if we
decide that the bytes type contains ints, b[0] should return a plain
int (whose value necessarily is in range(0, 256)), not some new
unsigned-8-bit type. And creating a bytes object from a list of ints
should accept any input values as long as their __index__ value is in
that same range. I.e. bytes([1, 2L]) should be the same as
bytes([1L, 2]); and bytes([-1]) should raise a ValueError.

> > There's also the consideration for APIs that, informally, accept
> > either a string or a sequence of objects.
> > Many of these exist, and
> > they are probably all being converted to support unicode as well as
> > str (if it makes sense at all). Should a bytes object be considered as
> > a sequence of things, or as a single thing, from the POV of these
> > types of APIs? Should we try to standardize how code tests for the
> > difference? (Currently all sorts of shortcuts are being taken, from
> > isinstance(x, (list, tuple)) to isinstance(x, basestring).)
>
> I think bytes objects are very much like string objects today --
> they're the photons of Python since they can act like either
> sequences or scalars, depending on the context. For example, we have
> code that needs to deal with situations where an API can return
> either a scalar or a sequence of those scalars. So we have a utility
> function like this:
>
> def thingiter(obj):
>     try:
>         it = iter(obj)
>     except TypeError:
>         yield obj
>     else:
>         for item in it:
>             yield item
>
> Maybe there's a better way to do this, but the most obvious problem
> is that (for our use cases), this fails for strings because in this
> context we want strings to act like scalars. So we add a little test
> just before the "try:" like "if isinstance(obj, basestring): yield
> obj". But that's yucky.
>
> I don't know what the solution is -- if there /is/ a solution short
> of special case tests like above, but I think the key observation is
> that sometimes you want your string to act like a sequence and
> sometimes you want it to act like a scalar. I suspect bytes objects
> will be the same way.

I agree it's icky, and I'd rather not design APIs like that -- but I
can't help it that others continue to want to use that idiom. I also
agree that most likely we'll want to treat bytes the same as strings
here. But no basestring (bytes are mutable and don't behave like
sequences of characters).

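For readers following along, here is a minimal sketch of the special-cased utility discussed in the quoted message. It is written for modern Python 3, which is an assumption: basestring does not exist there, so str, bytes and bytearray are tested explicitly. The scalar_types parameter is hypothetical and not part of the quoted function. The sketch also adds a return after the early yield, since without it the string would be yielded once and then iterated item by item as well.

```python
def thingiter(obj, scalar_types=(str, bytes, bytearray)):
    """Yield obj itself if it is scalar-like, otherwise yield its items."""
    # Assumption: strings and bytes-like objects should act as scalars
    # in this context, even though they are iterable.
    if isinstance(obj, scalar_types):
        yield obj
        return  # stop here so the object is not also iterated below
    try:
        it = iter(obj)
    except TypeError:
        # Not iterable at all: treat it as a single scalar.
        yield obj
    else:
        # A genuine sequence or iterable: yield each item in turn.
        yield from it


if __name__ == "__main__":
    print(list(thingiter("abc")))
    print(list(thingiter([1, 2, 3])))
    print(list(thingiter(42)))
```

Making the set of scalar types a parameter keeps the special case visible at the call site, but it is still the kind of explicit type test the thread calls yucky; it only centralizes it.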
--
--Guido van Rossum (home page: http://www.python.org/~guido/)