Re: [Python-Dev] an idea for improving struct.unpack api
On Fri, 7 Jan 2005 19:40:18 -0800 (PST), Ilya Sandler [EMAIL PROTECTED] wrote: Eg. I just looked at xdrlib.py code and it seems that almost every invocation of struct._unpack would shrink from 3 lines to 1 line of code (i = self.__pos self.__pos = j = i+4 data = self.__buf[i:j] return struct.unpack('l', data)[0] would become: return struct.unpack('l', self.__buf, self.__pos)[0] ) FWIW, I could read and understand your original code without any problems, whereas in the second version I would completely miss the fact that self.__pos is updated, precisely because mutating arguments are very rare in Python functions. OTOH, Nick's idea of returning a tuple with the new offset might make your example shorter without sacrificing readability: result, newpos = struct.unpack('l', self.__buf, self.__pos) self.__pos = newpos # retained newpos for readability... return result A third possibility - rather than magically adding an additional return value because you supply a position, you could have a where am I? format symbol (say by analogy with the C address of operator). Then you'd say result, newpos = struct.unpack('l', self.__buf, self.__pos) Please be aware, I don't have a need myself for this feature - my interest is as a potential reader of others' code... Paul. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] an idea for improving struct.unpack api
First, let me say two things: (a) A higher-level API can and should be constructed which acts like a (binary) stream but has additional methods for reading and writing values using struct format codes (or, preferably, somewhat higher-level type names, as suggested). Instances of this API should be constructable from a stream or from a buffer (e.g. a string). (b) -1 on Ilya's idea of having a special object that acts as an input-output integer; it is too unpythonic (no matter your objection). [Paul Moore] OTOH, Nick's idea of returning a tuple with the new offset might make your example shorter without sacrificing readability: result, newpos = struct.unpack('l', self.__buf, self.__pos) self.__pos = newpos # retained newpos for readability... return result This is okay, except I don't want to overload this on unpack() -- let's pick a different function name like unpack_at(). A third possibility - rather than magically adding an additional return value because you supply a position, you could have a where am I? format symbol (say by analogy with the C address of operator). Then you'd say result, newpos = struct.unpack('l', self.__buf, self.__pos) Please be aware, I don't have a need myself for this feature - my interest is as a potential reader of others' code... I think that adding more magical format characters is probably not doing the readers of this code a service. I do like the idea of not introducing an extra level of tuple to accommodate the position return value but instead make it the last item in the tuple when using unpack_at(). Then the definition would be: def unpack_at(fmt, buf, pos): size = calcsize(fmt) end = pos + size data = buf[pos:end] if len(data) size: raise struct.error(not enough data for format) # if data is too long that would be a bug in buf[pos:size] and cause an error below ret = unpack(fmt, data) ret = ret + (end,) return ret -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] an idea for improving struct.unpack api
Bob Ippolito [EMAIL PROTECTED] writes: On Jan 6, 2005, at 8:17, Michael Hudson wrote: Ilya Sandler [EMAIL PROTECTED] writes: A problem: The current struct.unpack api works well for unpacking C-structures where everything is usually unpacked at once, but it becomes inconvenient when unpacking binary files where things often have to be unpacked field by field. Then one has to keep track of offsets, slice the strings,call struct.calcsize(), etc... IMO (and E), struct.unpack is the primitive atop which something more sensible is built. I've certainly tried to build that more sensible thing at least once, but haven't ever got the point of believing what I had would be applicable to the general case... maybe it's time to write such a thing for the standard library. This is my ctypes-like attempt at a high-level interface for struct. It works well for me in macholib: http://svn.red-bean.com/bob/py2app/trunk/src/macholib/ptypes.py Unsurprisingly, that's fairly similar to mine :) Cheers, mwh -- If trees could scream, would we be so cavalier about cutting them down? We might, if they screamed all the time, for no good reason. -- Jack Handey ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
RE: [Python-Dev] an idea for improving struct.unpack api
I will try to respond to all comments at once. But first a clarification: -I am not trying to design a high-level API on top of existing struct.unpack and -I am not trying to design a replacement for struct.unpack (If I were to replace struct.unpack(), then I would probably go along the lines of StructReader suggested by Raymond) I view struct module as a low-level (un)packing library on top on which a more complex stuff can be built and I am simply suggesting a way to improve this low level functionality... We could have an optional offset argument for unpack(format, buffer, offset=None) the offset argument is an object which contains a single integer field which gets incremented inside unpack() to point to the next byte. As for passing offset implies the length is calcsize(fmt) sub-concept, I find that slightly more controversial. It's convenient, but somewhat ambiguous; in other cases (e.g. string methods) passing a start/offset and no end/length means to go to the end. I am not sure I agree: in most cases starting offset and no length/end just means: start whatever you are doing at this offset and stop it whenever you are happy.. At least that's the way I was alway thinking about functions like string.find() and friends Suggested struct.unpack() change seems to fit this mental model very well the offset argument is an object which contains a single integer field which gets incremented inside unpack() to point to the next byte. I find this just too magical. Why would it be magical? There is no guessing of user intentions involved. The function simply returns/uses an extra piece of information if the user asks for it. And the function already computes this piece of information.. It's only useful when you're specifically unpacking data bytes that are compactly back to back (no filler e.g. for alignment purposes) Yes, but it's a very common case when dealing with binary files formats. Eg. I just looked at xdrlib.py code and it seems that almost every invocation of struct._unpack would shrink from 3 lines to 1 line of code (i = self.__pos self.__pos = j = i+4 data = self.__buf[i:j] return struct.unpack('l', data)[0] would become: return struct.unpack('l', self.__buf, self.__pos)[0] ) There are probably other places in stdlib which would benefit from this api and stdlib does not deal with binary files that much.. and pays some conceptual price -- introducing a new specialized type to play the role of mutable int but the user does not have to pay anything if he does not need it! The change is backward compatible. (Note that just supporting int offsets would eliminate slicing, but it would not eliminate other annoyances, and it's possible to support both Offset and int args, is it worth the hassle?) and having an argument mutated, which is not usual in Python's library. Actually, it's so common that we simply stop noticing it :-) Eg. when we call a superclass's method: SuperClass.__init__(self) So, while I agree that there is an element of unusualness in the suggested unpack() API, this element seems pretty small to me All in all, I suspect that something like. hdrsize = struct.calcsize(hdr_fmt) itemsize = struct.calcsize(item_fmt) reclen = length_of_each_record rec = binfile.read(reclen) hdr = struct.unpack(hdr_fmt, rec, 0, hdrsize) for offs in itertools.islice(xrange(hdrsize, reclen, itemsize), hdr[0]): item = struct.unpack(item_fmt, rec, offs, itemsize) # process item might be a better compromise I think I again disagree: your example is almost as verbose as the current unpack() api and you still need to call calcsize() explicitly and I don't think there is any chance of gaining any noticeable perfomance benefit. Too little gain to bother with any changes... struct.pack/struct.unpack is already one of my least-favourite parts of the stdlib. Of the modules I use regularly, I pretty much only ever have to go back and re-read the struct (and re) documentation because they just won't fit in my brain. Adding additional complexity to them seems like a net loss to me. Net loss to the end programmer? But if he does not need new functionality he doesnot have to use it! In fact, I started with providing an example of how new api makes client code simpler I'd much rather specify the format as something like a tuple of values - (INT, UINT, INT, STRING) (where INT c are objects defined in the struct module). This also then allows users to specify their own formats if they have a particular need for something I don't disagree, but I think it's orthogonal to offset issue Ilya On Thu, 6 Jan 2005, Raymond Hettinger wrote: [Ilya Sandler] A problem: The current struct.unpack api works well for unpacking C-structures where everything is usually unpacked at once, but it becomes inconvenient when unpacking binary files where things often have to
Re: [Python-Dev] an idea for improving struct.unpack api
How about making offset a standard integer, and change the signature to return tuple when it is used: item, offset = unpack(format, rec, offset) # Partial unpacking Well, it would work well when unpack results are assigned to individual vars: x,y,offset=unpack( ii, rec, offset) but it gets more complicated if you have something like: coords=unpack(10i, rec) How would you pass/return offsets here? As an extra element in coords? coords=unpack(10i, rec, offset) offset=coords.pop() But that would be counterintuitive and somewhat inconvinient.. Ilya On Sat, 8 Jan 2005, Nick Coghlan wrote: Ilya Sandler wrote: item=unpack( , rec, offset) How about making offset a standard integer, and change the signature to return a tuple when it is used: item = unpack(format, rec) # Full unpacking offset = 0 item, offset = unpack(format, rec, offset) # Partial unpacking The second item in the returned tuple being the offset of the first byte after the end of the unpacked item. Cheers, Nick. -- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://boredomandlaziness.skystorm.net ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/ilya%40bluefir.net ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] an idea for improving struct.unpack api
On Jan 6, 2005, at 8:17, Michael Hudson wrote: Ilya Sandler [EMAIL PROTECTED] writes: A problem: The current struct.unpack api works well for unpacking C-structures where everything is usually unpacked at once, but it becomes inconvenient when unpacking binary files where things often have to be unpacked field by field. Then one has to keep track of offsets, slice the strings,call struct.calcsize(), etc... IMO (and E), struct.unpack is the primitive atop which something more sensible is built. I've certainly tried to build that more sensible thing at least once, but haven't ever got the point of believing what I had would be applicable to the general case... maybe it's time to write such a thing for the standard library. This is my ctypes-like attempt at a high-level interface for struct. It works well for me in macholib: http://svn.red-bean.com/bob/py2app/trunk/src/macholib/ptypes.py -bob ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
RE: [Python-Dev] an idea for improving struct.unpack api
[Ilya Sandler] A problem: The current struct.unpack api works well for unpacking C-structures where everything is usually unpacked at once, but it becomes inconvenient when unpacking binary files where things often have to be unpacked field by field. Then one has to keep track of offsets, slice the strings,call struct.calcsize(), etc... Yes. That bites. Eg. with a current api unpacking of a record which consists of a header followed by a variable number of items would go like this hdr_fmt= item_fmt= item_size=calcsize(item_fmt) hdr_size=calcsize(hdr_fmt) hdr=unpack(hdr_fmt, rec[0:hdr_size]) #rec is the record to unpack offset=hdr_size for i in range(hdr[0]): #assume 1st field of header is a counter item=unpack( item_fmt, rec[ offset: offset+item_size]) offset+=item_size which is quite inconvenient... A solution: We could have an optional offset argument for unpack(format, buffer, offset=None) the offset argument is an object which contains a single integer field which gets incremented inside unpack() to point to the next byte. so with a new API the above code could be written as offset=struct.Offset(0) hdr=unpack(, offset) for i in range(hdr[0]): item=unpack( , rec, offset) When an offset argument is provided, unpack() should allow some bytes to be left unpacked at the end of the buffer.. Does this suggestion make sense? Any better ideas? Rather than alter struct.unpack(), I suggest making a separate class that tracks the offset and encapsulates some of the logic that typically surrounds unpacking: r = StructReader(rec) hdr = r('') for item in r.getgroups('', times=rec[0]): . . . It would be especially nice if it handled the more complex case where the next offset is determined in-part by the data being read (see the example in section 11.3 of the tutorial): r = StructReader(open('myfile.zip', 'rb')) for i in range(3): # show the first 3 file headers fields = r.getgroup('LLLHH', offset=14) crc32, comp_size, uncomp_size, filenamesize, extra_size = fields filename = g.getgroup('c', offset=16, times=filenamesize) extra = g.getgroup('c', times=extra_size) r.advance(comp_size) print filename, hex(crc32), comp_size, uncomp_size If you come up with something, I suggest posting it as an ASPN recipe and then announcing it on comp.lang.python. That ought to generate some good feedback based on other people's real world issues with struct.unpack(). Raymond Hettinger ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com