Re: [Python-Dev] an idea for improving struct.unpack api

2005-01-08 Thread Paul Moore
On Fri, 7 Jan 2005 19:40:18 -0800 (PST), Ilya Sandler [EMAIL PROTECTED] wrote:
 Eg. I just looked at xdrlib.py code and it seems that almost every
 invocation of struct._unpack would shrink from 3 lines to 1 line of code
 
 (i = self.__pos
 self.__pos = j = i+4
 data = self.__buf[i:j]
 return struct.unpack('l', data)[0]
 
 would become:
 return struct.unpack('l', self.__buf, self.__pos)[0]
 )

FWIW, I could read and understand your original code without any
problems, whereas in the second version I would completely miss the
fact that self.__pos is updated, precisely because mutating arguments
are very rare in Python functions.
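Python 2.5, released after this thread, added `struct.unpack_from(fmt, buffer, offset)`. It removes the slicing and still keeps the position update explicit, which addresses the readability concern above. Below is a minimal sketch. The class and method names are hypothetical, not xdrlib's API. The format `'>l'` (big-endian, standard 4-byte size) is an assumption, chosen because XDR data is big-endian.

```python
import struct

class XdrLikeReader:
    """Hypothetical reader showing an explicit position update (not xdrlib's API)."""

    def __init__(self, buf):
        self._buf = buf
        self._pos = 0

    def unpack_int(self):
        # '>l' is big-endian with standard sizes, so it always covers 4 bytes.
        # unpack_from raises struct.error if fewer than 4 bytes remain.
        (value,) = struct.unpack_from('>l', self._buf, self._pos)
        self._pos += 4  # the position update stays visible on its own line
        return value
```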

OTOH, Nick's idea of returning a tuple with the new offset might make
your example shorter without sacrificing readability:

result, newpos = struct.unpack('l', self.__buf, self.__pos)
self.__pos = newpos # retained newpos for readability...
return result
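Nick's shape can be sketched in a few lines. `unpack_with_pos` is a hypothetical name, and the sketch relies on `struct.unpack_from` and `struct.calcsize` instead of slicing. The formats in the usage lines are assumptions.

```python
import struct

def unpack_with_pos(fmt, buf, pos):
    """Hypothetical helper: unpack fmt at pos and return (values, new_pos)."""
    values = struct.unpack_from(fmt, buf, pos)
    return values, pos + struct.calcsize(fmt)

buf = struct.pack('>lh', 7, 3)
(n,), pos = unpack_with_pos('>l', buf, 0)    # n == 7, pos == 4
(m,), pos = unpack_with_pos('>h', buf, pos)  # m == 3, pos == 6
```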

A third possibility - rather than magically adding an additional
return value because you supply a position, you could have a "where am
I?" format symbol (say "&" by analogy with the C "address of" operator).
Then you'd say

result, newpos = struct.unpack('l&', self.__buf, self.__pos)
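The "where am I?" symbol could be prototyped as a wrapper around struct. This sketch assumes the symbol is `&` (the C address-of operator) and that it may only appear at the end of a format. `unpack_amp` is a hypothetical name, and `&` is not a real struct format code.

```python
import struct

def unpack_amp(fmt, buf, pos=0):
    """Hypothetical: a trailing '&' in fmt appends the end offset to the result."""
    want_pos = fmt.endswith('&')
    if want_pos:
        fmt = fmt[:-1]  # struct itself does not understand '&'
    values = struct.unpack_from(fmt, buf, pos)
    if want_pos:
        return values + (pos + struct.calcsize(fmt),)
    return values
```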

Please be aware, I don't have a need myself for this feature - my
interest is as a potential reader of others' code...

Paul.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] an idea for improving struct.unpack api

2005-01-08 Thread Guido van Rossum
First, let me say two things:

(a) A higher-level API can and should be constructed which acts like a
(binary) stream but has additional methods for reading and writing
values using struct format codes (or, preferably, somewhat
higher-level type names, as suggested). Instances of this API should
be constructable from a stream or from a buffer (e.g. a string).

(b) -1 on Ilya's idea of having a special object that acts as an
input-output integer; it is too unpythonic (no matter your objection).
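Point (a) might look roughly like the sketch below. `BinaryReader`, `read_struct` and `read_int32` are hypothetical names. The named reader `read_int32` only gestures at the "higher-level type names" idea.

```python
import io
import struct

class BinaryReader:
    """Hypothetical stream-like reader with struct-based read methods."""

    def __init__(self, source):
        # Accept either a bytes-like buffer or an object with a read() method.
        if isinstance(source, (bytes, bytearray, memoryview)):
            source = io.BytesIO(bytes(source))
        self._stream = source

    def read_struct(self, fmt):
        # Read exactly calcsize(fmt) bytes from the stream and unpack them.
        size = struct.calcsize(fmt)
        data = self._stream.read(size)
        if len(data) < size:
            raise EOFError('not enough data for format %r' % fmt)
        return struct.unpack(fmt, data)

    def read_int32(self):
        # A named reader standing in for a higher-level type name.
        return self.read_struct('>i')[0]
```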

[Paul Moore]
 OTOH, Nick's idea of returning a tuple with the new offset might make
 your example shorter without sacrificing readability:
 
 result, newpos = struct.unpack('l', self.__buf, self.__pos)
 self.__pos = newpos # retained newpos for readability...
 return result

This is okay, except I don't want to overload this on unpack() --
let's pick a different function name like unpack_at().

 A third possibility - rather than magically adding an additional
 return value because you supply a position, you could have a "where am
 I?" format symbol (say "&" by analogy with the C "address of" operator).
 Then you'd say
 
 result, newpos = struct.unpack('l&', self.__buf, self.__pos)
 
 Please be aware, I don't have a need myself for this feature - my
 interest is as a potential reader of others' code...

I think that adding more magical format characters is probably not
doing the readers of this code a service.

I do like the idea of not introducing an extra level of tuple to
accommodate the position return value but instead make it the last
item in the tuple when using unpack_at().

Then the definition would be:

import struct
from struct import calcsize, unpack

def unpack_at(fmt, buf, pos):
    size = calcsize(fmt)
    end = pos + size
    data = buf[pos:end]
    if len(data) < size:
        raise struct.error("not enough data for format")
    # if data is too long that would be a bug in buf[pos:end] and
    # cause an error below
    ret = unpack(fmt, data)
    ret = ret + (end,)
    return ret
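Python 2.5 added `struct.unpack_from`, which makes both the slice and the manual length check unnecessary. The sketch below keeps the same "position as last item" convention. The record layout in the usage lines (a count header followed by pairs of shorts) is an assumption.

```python
import struct

def unpack_at(fmt, buf, pos):
    """Unpack fmt at pos and append the end offset as the last item.

    unpack_from raises struct.error itself when fewer than calcsize(fmt)
    bytes remain after pos, so no explicit length check is needed.
    """
    return struct.unpack_from(fmt, buf, pos) + (pos + struct.calcsize(fmt),)

# Usage: a header holding an item count, followed by that many items.
rec = struct.pack('>H', 2) + struct.pack('>hhhh', 5, 6, 7, 8)
count, pos = unpack_at('>H', rec, 0)
items = []
for _ in range(count):
    x, y, pos = unpack_at('>hh', rec, pos)
    items.append((x, y))
```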

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] an idea for improving struct.unpack api

2005-01-07 Thread Michael Hudson
Bob Ippolito [EMAIL PROTECTED] writes:

 On Jan 6, 2005, at 8:17, Michael Hudson wrote:

 Ilya Sandler [EMAIL PROTECTED] writes:

 A problem:

 The current struct.unpack api works well for unpacking C-structures
 where everything is usually unpacked at once, but it becomes
 inconvenient when unpacking binary files where things often have to
 be unpacked field by field. Then one has to keep track of offsets,
 slice the strings, call struct.calcsize(), etc...

 IMO (and E), struct.unpack is the primitive atop which something more
 sensible is built.  I've certainly tried to build that more sensible
 thing at least once, but haven't ever got to the point of believing what
 I had would be applicable to the general case... maybe it's time to
 write such a thing for the standard library.

 This is my ctypes-like attempt at a high-level interface for struct.
 It works well for me in macholib:
 http://svn.red-bean.com/bob/py2app/trunk/src/macholib/ptypes.py

Unsurprisingly, that's fairly similar to mine :)
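Here is one small example of "something more sensible" built atop struct. It is not Michael's or Bob's code, neither of which is reproduced here. Named fields are layered on with `collections.namedtuple` and a precompiled `struct.Struct` (Python 2.5+). The record layout is an assumption for illustration.

```python
import struct
from collections import namedtuple

# Hypothetical record layout: a big-endian header of (magic, version, count).
Header = namedtuple('Header', 'magic version count')
HEADER = struct.Struct('>4sHH')

def read_header(buf, offset=0):
    """Unpack a Header at offset; return (header, offset just past it)."""
    header = Header._make(HEADER.unpack_from(buf, offset))
    return header, offset + HEADER.size
```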

Cheers,
mwh

-- 
  If trees could scream, would we be so cavalier about cutting them
  down? We might, if they screamed all the time, for no good reason.
-- Jack Handey


RE: [Python-Dev] an idea for improving struct.unpack api

2005-01-07 Thread Ilya Sandler
I will try to respond to all comments at once.

But first a clarification:
   - I am not trying to design a high-level API on top of the existing
     struct.unpack, and
   - I am not trying to design a replacement for struct.unpack
     (if I were to replace struct.unpack(), I would probably go along
     the lines of the StructReader suggested by Raymond).

I view the struct module as a low-level (un)packing library on top of
which more complex stuff can be built, and I am simply suggesting a way
to improve this low-level functionality...


  We could have an optional offset argument for
 
  unpack(format, buffer, offset=None)
 
  the offset argument is an object which contains a single integer field
  which gets incremented inside unpack() to point to the next byte.

 As for the "passing offset implies the length is calcsize(fmt)" sub-concept,
 I find that slightly more controversial.  It's convenient,
 but somewhat ambiguous; in other cases (e.g. string methods) passing a
 start/offset and no end/length means to go to the end.

I am not sure I agree: in most cases, a starting offset and no
length/end just means: start whatever you are doing at this offset and
stop whenever you are happy.

At least that's the way I have always thought about functions like
string.find() and friends.

The suggested struct.unpack() change seems to fit this mental model very well.
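This "start at the offset, stop when the format is satisfied" model is what `struct.unpack_from` (Python 2.5+) ended up doing. By contrast, `struct.unpack` requires the buffer length to match the format exactly. Here is a small demonstration with an assumed format and buffer.

```python
import struct

buf = struct.pack('>hh', 1, 2) + b'trailing bytes'

# struct.unpack insists the buffer is exactly calcsize(fmt) bytes long...
try:
    struct.unpack('>hh', buf)
    exact_size_required = False
except struct.error:
    exact_size_required = True

# ...while struct.unpack_from reads calcsize(fmt) bytes at the offset and
# stops, leaving the rest of the buffer alone.
pair = struct.unpack_from('>hh', buf, 0)
```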

 the offset argument is an object which contains a single integer field
 which gets incremented inside unpack() to point to the next byte.

 I find this just too magical.

Why would it be magical? There is no guessing of user intentions involved.
The function simply returns/uses an extra piece of information if the user
asks for it, and the function already computes this piece of
information.


 It's only useful when you're specifically unpacking data bytes that are
  compactly back to back (no filler e.g. for alignment purposes)

Yes, but it's a very common case when dealing with binary files formats.

Eg. I just looked at xdrlib.py code and it seems that almost every
invocation of struct._unpack would shrink from 3 lines to 1 line of code

(i = self.__pos
self.__pos = j = i+4
data = self.__buf[i:j]
return struct.unpack('l', data)[0]

would become:
return struct.unpack('l', self.__buf, self.__pos)[0]
)


There are probably other places in the stdlib which would benefit from this
API, and the stdlib does not even deal with binary files that much.

 and pays some conceptual price -- introducing a new specialized type
 to play the role of mutable int

But the user does not have to pay anything if he does not need it! The
change is backward compatible. (Note that just supporting int offsets
would eliminate slicing, but it would not eliminate the other annoyances;
and while it's possible to support both Offset and int args, is it worth
the hassle?)

 and having an argument mutated, which is not usual in Python's library.

Actually, it's so common that we simply stop noticing it :-)
Eg. when we call a superclass's method:
  SuperClass.__init__(self)

So, while I agree that there is an element of unusualness in the
suggested unpack() API, this element seems pretty small to me.

 All in all, I suspect that something like:
     hdrsize = struct.calcsize(hdr_fmt)
     itemsize = struct.calcsize(item_fmt)
     reclen = length_of_each_record
     rec = binfile.read(reclen)
     hdr = struct.unpack(hdr_fmt, rec, 0, hdrsize)
     for offs in itertools.islice(xrange(hdrsize, reclen, itemsize), hdr[0]):
         item = struct.unpack(item_fmt, rec, offs, itemsize)
         # process item
 might be a better compromise.

I think I again disagree: your example is almost as verbose as the current
unpack() API, you still need to call calcsize() explicitly, and I
don't think there is any chance of gaining a noticeable
performance benefit. Too little gain to bother with any changes...
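For comparison, the explicit-offset loop quoted above maps directly onto `struct.unpack_from` (Python 2.5+). That function takes a buffer and an offset but derives the length from the format, so the extra `itemsize` argument is not needed. The sketch below is written in current Python. The formats are assumptions, and `reclen` is taken as `len(rec)`.

```python
import itertools
import struct

# Assumed layout: a header whose first field is the item count, then items.
hdr_fmt = '>H'
item_fmt = '>hh'

def read_record(rec):
    hdrsize = struct.calcsize(hdr_fmt)
    itemsize = struct.calcsize(item_fmt)
    hdr = struct.unpack_from(hdr_fmt, rec, 0)
    items = []
    # Walk the item offsets, stopping after hdr[0] items.
    for offs in itertools.islice(range(hdrsize, len(rec), itemsize), hdr[0]):
        items.append(struct.unpack_from(item_fmt, rec, offs))
    return hdr, items
```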


 struct.pack/struct.unpack is already one of my least-favourite parts
 of the stdlib.  Of the modules I use regularly, I pretty much only ever
 have to go back and re-read the struct (and re) documentation because
 they just won't fit in my brain. Adding additional complexity to them
 seems like a net loss to me.

Net loss to the end programmer? But if he does not need the new
functionality, he does not have to use it! In fact, I started by providing
an example of how the new API makes client code simpler.


 I'd much rather specify the format as something like a tuple of values -
 (INT, UINT, INT, STRING) (where INT etc. are objects defined in the
 struct module). This also then allows users to specify their own formats
 if they have a particular need for something

I don't disagree, but I think it's orthogonal to the offset issue.
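The tuple-of-types idea can be layered over ordinary format strings. In the sketch below, `INT`, `UINT`, `STRING`, `FieldType` and `build_format` are hypothetical names, since the struct module defines no such objects. `STRING` takes an explicit length because struct strings are fixed-size.

```python
import struct

class FieldType:
    """Hypothetical field type mapping a readable name to a struct code."""
    def __init__(self, name, code):
        self.name = name
        self.code = code

INT = FieldType('INT', 'i')
UINT = FieldType('UINT', 'I')

def STRING(length):
    # Fixed-length byte string; the length must be known up front.
    return FieldType('STRING(%d)' % length, '%ds' % length)

def build_format(fields, byte_order='<'):
    """Join field codes into an ordinary struct format string."""
    return byte_order + ''.join(f.code for f in fields)
```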


Ilya





Re: [Python-Dev] an idea for improving struct.unpack api

2005-01-07 Thread Ilya Sandler
 How about making offset a standard integer, and change the signature to
 return a tuple when it is used:
  item, offset = unpack(format, rec, offset) # Partial unpacking

Well, it would work well when unpack results are assigned to individual
vars:

   x, y, offset = unpack("ii", rec, offset)

but it gets more complicated if you have something like:
   coords = unpack("10i", rec)

How would you pass/return offsets here? As an extra element in coords?
   coords = unpack("10i", rec, offset)
   offset = coords.pop()

But that would be counterintuitive and somewhat inconvenient.
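Extended iterable unpacking (PEP 3132, Python 3, which postdates this thread) removes much of this inconvenience for the "offset as last item" shape. Note also that `unpack` returns a tuple, which has no `pop()` method. `unpack_at` below is a hypothetical helper, not part of struct.

```python
import struct

def unpack_at(fmt, buf, pos):
    """Hypothetical: unpack at pos and append the new offset as the last item."""
    return struct.unpack_from(fmt, buf, pos) + (pos + struct.calcsize(fmt),)

rec = struct.pack('<10i', *range(10))
# Extended unpacking splits off the offset without any pop() call;
# coords becomes a list of the ten integers.
*coords, offset = unpack_at('<10i', rec, 0)
```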

Ilya




On Sat, 8 Jan 2005, Nick Coghlan wrote:

 Ilya Sandler wrote:
  item=unpack( , rec, offset)

 How about making offset a standard integer, and change the signature to
 return a tuple when it is used:

item = unpack(format, rec) # Full unpacking
offset = 0
item, offset = unpack(format, rec, offset) # Partial unpacking

 The second item in the returned tuple being the offset of the first byte after
 the end of the unpacked item.

 Cheers,
 Nick.

 --
 Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
 ---
  http://boredomandlaziness.skystorm.net



Re: [Python-Dev] an idea for improving struct.unpack api

2005-01-06 Thread Bob Ippolito
On Jan 6, 2005, at 8:17, Michael Hudson wrote:
 Ilya Sandler [EMAIL PROTECTED] writes:

 A problem:

 The current struct.unpack api works well for unpacking C-structures
 where everything is usually unpacked at once, but it becomes
 inconvenient when unpacking binary files where things often have to
 be unpacked field by field. Then one has to keep track of offsets,
 slice the strings, call struct.calcsize(), etc...

 IMO (and E), struct.unpack is the primitive atop which something more
 sensible is built.  I've certainly tried to build that more sensible
 thing at least once, but haven't ever got to the point of believing what
 I had would be applicable to the general case... maybe it's time to
 write such a thing for the standard library.

This is my ctypes-like attempt at a high-level interface for struct.
It works well for me in macholib:
http://svn.red-bean.com/bob/py2app/trunk/src/macholib/ptypes.py
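The standard library's own ctypes module offers a similar declarative style. The sketch below is not Bob's ptypes API. It uses `ctypes.BigEndianStructure` and `from_buffer_copy` (Python 2.6+) with a made-up two-field record.

```python
import ctypes

class PairHeader(ctypes.BigEndianStructure):
    """Hypothetical record: two big-endian 32-bit unsigned fields."""
    _fields_ = [
        ('magic', ctypes.c_uint32),
        ('count', ctypes.c_uint32),
    ]

raw = bytes.fromhex('cafebabe00000003')
# Copy the first sizeof(PairHeader) bytes of raw into a new structure.
hdr = PairHeader.from_buffer_copy(raw)
```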

-bob


RE: [Python-Dev] an idea for improving struct.unpack api

2005-01-05 Thread Raymond Hettinger
[Ilya Sandler]
 A problem:
 
 The current struct.unpack api works well for unpacking C-structures
 where everything is usually unpacked at once, but it becomes
 inconvenient when unpacking binary files where things often have to
 be unpacked field by field. Then one has to keep track of offsets,
 slice the strings, call struct.calcsize(), etc...

Yes.  That bites.


 E.g. with the current api, unpacking a record which consists of a
 header followed by a variable number of items would go like this
 
  hdr_fmt=
  item_fmt=
  item_size=calcsize(item_fmt)
  hdr_size=calcsize(hdr_fmt)
  hdr=unpack(hdr_fmt, rec[0:hdr_size]) #rec is the record to unpack
  offset=hdr_size
  for i in range(hdr[0]): #assume 1st field of header is a counter
      item=unpack( item_fmt, rec[ offset: offset+item_size])
      offset+=item_size
 
 which is quite inconvenient...
 
 
 A solution:
 
 We could have an optional offset argument for
 
 unpack(format, buffer, offset=None)
 
 the offset argument is an object which contains a single integer field
 which gets incremented inside unpack() to point to the next byte.
 
 so with a new API the above code could be written as
 
  offset=struct.Offset(0)
  hdr=unpack(, offset)
  for i in range(hdr[0]):
      item=unpack( , rec, offset)
 
 When an offset argument is provided, unpack() should allow some bytes
 to be left unpacked at the end of the buffer.
 
 
 Does this suggestion make sense? Any better ideas?
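Here is a minimal sketch of the proposed semantics, assuming `Offset` simply wraps an integer. The names `Offset` and `unpack_offset` are hypothetical, since struct has neither. The header/item formats in the usage lines are assumptions as well.

```python
import struct

class Offset:
    """Hypothetical mutable offset, as in the proposal (not part of struct)."""
    def __init__(self, value=0):
        self.value = value

def unpack_offset(fmt, buf, offset=None):
    """Hypothetical struct.unpack variant with an optional Offset argument."""
    if offset is None:
        return struct.unpack(fmt, buf)  # unchanged behaviour: exact length
    values = struct.unpack_from(fmt, buf, offset.value)  # trailing bytes allowed
    offset.value += struct.calcsize(fmt)  # advance past the unpacked bytes
    return values

# Usage: a count header followed by that many (short, short) items.
rec = struct.pack('>H', 2) + struct.pack('>hhhh', 1, 2, 3, 4) + b'pad'
offset = Offset(0)
hdr = unpack_offset('>H', rec, offset)
items = [unpack_offset('>hh', rec, offset) for _ in range(hdr[0])]
```

Guido's objection applies to exactly this mutation of `offset` inside the call.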

Rather than alter struct.unpack(), I suggest making a separate class
that tracks the offset and encapsulates some of the logic that typically
surrounds unpacking:

r = StructReader(rec)
hdr = r('')
for item in r.getgroups('', times=rec[0]):
   . . .

It would be especially nice if it handled the more complex case where
the next offset is determined in part by the data being read (see the
example in section 11.3 of the tutorial):

r = StructReader(open('myfile.zip', 'rb'))
for i in range(3):  # show the first 3 file headers
    fields = r.getgroup('LLLHH', offset=14)
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields
    filename = r.getgroup('c', offset=16, times=filenamesize)
    extra = r.getgroup('c', times=extra_size)
    r.advance(comp_size)
    print filename, hex(crc32), comp_size, uncomp_size
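Here is a minimal sketch of such a StructReader over an in-memory bytes buffer. The method names come from the examples above, but their semantics here are assumptions. `offset` is taken to mean bytes skipped before reading, and `times` repeats the format. File objects, as used in the zip example, are not handled.

```python
import struct

class StructReader:
    """Hypothetical offset-tracking reader over an in-memory bytes buffer."""

    def __init__(self, buf):
        self.buf = buf
        self.pos = 0

    def _read(self, fmt):
        # Unpack one instance of fmt at the current position, then advance.
        values = struct.unpack_from(fmt, self.buf, self.pos)
        self.pos += struct.calcsize(fmt)
        return values

    def __call__(self, fmt):
        return self._read(fmt)

    def advance(self, nbytes):
        self.pos += nbytes

    def getgroup(self, fmt, offset=0, times=1):
        # Assumption: skip `offset` bytes, then read fmt `times` times
        # and return all values as one flat tuple.
        self.advance(offset)
        values = ()
        for _ in range(times):
            values += self._read(fmt)
        return values

    def getgroups(self, fmt, times):
        # One tuple per repetition, returned as a list.
        return [self._read(fmt) for _ in range(times)]
```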

If you come up with something, I suggest posting it as an ASPN recipe
and then announcing it on comp.lang.python.  That ought to generate some
good feedback based on other people's real world issues with
struct.unpack().


Raymond Hettinger
