Re: [Tutor] Reading binary files #2

eShopping Mon, 09 Feb 2009 11:42:07 -0800

Hi Bob

Some replies below. One thing I noticed with the "full" file was that I ran into problems when the number of records was 10500, and the file read got misaligned. Presumably 10500 is still within the range of int?
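On the range question: assuming the 10500 is the count field, which the loop unpacks with 'i', that field is a 4-byte signed integer holding values from -2147483648 to 2147483647, so 10500 fits with plenty of room. An overflow is therefore an unlikely cause, and a layout issue is the more likely suspect. A quick check in Python 3 (fits_int32 is a hypothetical helper used only for illustration):

```python
import struct

def fits_int32(n):
    """Return True if n can be packed as a 4-byte signed big-endian int ('>i')."""
    try:
        struct.pack('>i', n)
    except struct.error:
        # struct refuses values outside -2**31 .. 2**31 - 1 for 'i'
        return False
    return True

print(struct.calcsize('>i'))                                       # 4
print(fits_int32(10500), fits_int32(2**31 - 1), fits_int32(2**31))  # True True False
```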


Best regards

Alun


At 17:49 09/02/2009, bob gailer wrote:

etrade.griffi...@dsl.pipex.com wrote:
Hi
Following last week's discussion with Bob Gailer about reading unformatted FORTRAN files, I have attached an example of the file in ASCII format and the equivalent unformatted version.
Thank you. It is good to have real data to work with.
Below is some code that works OK until it gets to a data item that has no additional associated data, at which point it seems to have got 4 bytes ahead of itself.
Thank you. It is good to have real code to work with.
I thought I had trapped this, but it appears not. I think the issue is associated with "newline" characters or the unformatted equivalent.
I think not, but we will see.
I fail to see where the problem is. The data printed below seems to agree with the files you sent. What am I missing?

When I run the program it exits in the middle, but it should run through to the end. The output to the console was:

236 ('\x00\x00\x00\x10', 'DATABEGI', 0, 'MESS', '\x00\x00\x00\x10\x00\x00\x00\x10')
264 ('TIME', ' \x00\x00\x00\x01', 1380270412, '\x00\x00\x00\x10', '\x00\x00\x00\x04\x00\x00\x00\x00')
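Two values in that misaligned output can be checked directly with struct (Python 3 syntax below). The count 1380270412 is just the four ASCII bytes of the type code 'REAL' read as a big-endian int, which confirms the reader was 4 bytes out of step. The recurring field '\x00\x00\x00\x10' decodes to 16, the size of a header payload (8-char name, 4-byte count, 4-char type), so it appears to be a record-length marker of the kind Fortran sequential unformatted files write around each record, not padding or a newline.

```python
import struct

# The bogus count in the misaligned tuple: the bytes 'REAL' read as '>i'.
misread = struct.unpack('>i', b'REAL')[0]
print(misread)                               # 1380270412

# The 4-byte field at each end of a header decodes to 16 ...
marker = struct.unpack('>i', b'\x00\x00\x00\x10')[0]
# ... which matches the header payload: 8-char name + int count + 4-char type.
print(marker, struct.calcsize('>8si4s'))     # 16 16
```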

Here "TIME" is in vals[0] when it should be in vals[1], and so on. I found the problem earlier today and rewrote the main loop as follows (before I saw your helpful coding style comments):


while stop < nrec:

    # extract data structure

    start, stop = stop, stop + struct.calcsize('4s8si4s4s')
    vals = struct.unpack('>4s8si4s4s', data[start:stop])
    items.extend(vals[1:4])
    print stop, vals

    # define format of subsequent data

    nval = int(vals[2])

    if vals[3] == 'INTE':
        fmt_string = '>i'
    elif vals[3] == 'CHAR':
        fmt_string = '>8s'
    elif vals[3] == 'LOGI':
        fmt_string = '>i'
    elif vals[3] == 'REAL':
        fmt_string = '>f'
    elif vals[3] == 'DOUB':
        fmt_string = '>d'
    elif vals[3] == 'MESS':
        fmt_string = '>%ds' % nval
    else:
        print "Unknown data type ... exiting"
        print items[-40:]
        sys.exit(1)  # nonzero status signals an error

    # leading record-length marker (4 bytes)

    if nval > 0:
        start, stop = stop, stop + struct.calcsize('4s')
        vals = struct.unpack('4s', data[start:stop])

    # extract data

    for i in range(0,nval):
        start, stop = stop, stop + struct.calcsize(fmt_string)
        vals = struct.unpack(fmt_string, data[start:stop])
        items.extend(vals)

    # trailing record-length marker (4 bytes)

    if nval > 0:
        start, stop = stop, stop + struct.calcsize('4s')
        vals = struct.unpack('4s', data[start:stop])

Now I get this output

232 ('\x00\x00\x00\x10', 'DATABEGI', 0, 'MESS', '\x00\x00\x00\x10')
256 ('\x00\x00\x00\x10', 'TIME    ', 1, 'REAL', '\x00\x00\x00\x10')

and the script runs to the end
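The fixed loop still assumes that every array with a nonzero count sits in exactly one record, with one marker pair around it. If the writer instead splits long arrays across several records (an assumption here; some programs do write data in fixed-size chunks, and this could be one explanation for the trouble at 10500 items, though the thread does not confirm it), a reader that trusts each marker's length is more robust. A minimal Python 3 sketch, assuming 4-byte big-endian markers as in the sample data; split_records is a hypothetical helper name:

```python
import struct

def split_records(data):
    """Split a sequential unformatted byte string into record payloads.

    Assumes every record is framed by identical 4-byte big-endian
    length markers, as in the sample file. Other compilers or files
    may use different marker sizes or byte orders.
    """
    records = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError('truncated record marker at byte %d' % pos)
        (length,) = struct.unpack_from('>i', data, pos)
        if length < 0:
            raise ValueError('negative record length at byte %d' % pos)
        start = pos + 4
        end = start + length
        if end + 4 > len(data):
            raise ValueError('record at byte %d runs past end of data' % pos)
        # The trailing marker must repeat the leading one; a mismatch
        # means the reader has lost alignment.
        (trailer,) = struct.unpack_from('>i', data, end)
        if trailer != length:
            raise ValueError('marker mismatch at byte %d: %d != %d'
                             % (pos, length, trailer))
        records.append(data[start:end])
        pos = end + 4
    return records

if __name__ == '__main__':
    # A header record (TIME, 1 item, REAL) followed by its one-float data record.
    sample = (struct.pack('>i8si4si', 16, b'TIME    ', 1, b'REAL', 16)
              + struct.pack('>ifi', 4, 1.0, 4))
    for rec in split_records(sample):
        print(len(rec), rec)
```

Header payloads can then be unpacked with '>8si4s', and consecutive data payloads concatenated until the expected number of items has been read.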

FWIW a few observations re coding style and techniques.

1) put the formats in a dictionary before the while loop:
formats = {'INTE': '>i', 'CHAR': '>8s', 'LOGI': '>i', 'REAL': '>f', 'DOUB': '>d', 'MESS': '>d',}
2) retrieve the format in the while loop from the dictionary:
format = formats[vals[3]]


Neat!!

3) condense the 3 infile lines:
data = open("test.bin","rb").read()

I still don't quite trust myself to "chain" functions together, but I guess that's lack of practice.

4) nrec is a misleading name (to me it means # of records); nbytes would be better.


Agreed

5) Be consistent with the format between calcsize and unpack:
struct.calcsize('>4s8si4s8s')

6) Use meaningful variable names instead of val for the unpacked data:
blank, name, length, typ = struct.unpack ... etc


Will do

7) The format for MESS should be '>d' rather than '>%dd' % nval. When nval is 0 the for loop will make 0 cycles.

Wasn't sure about that one. "MESS" implies a string, but I wasn't sure what to do about a zero-length string.

8) You don't have a format for DATA (BEGI); therefore the prior format (for CHAR) is being applied. The formats are the same, so it does not matter, but it could be confusing later.

DATABEGI should be a keyword to indicate the start of the "proper" data, which has format MESS (i.e. string). You did make me look again at the MESS format, and it should be '>%ds' % nval and not '>%dd' % nval.
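Points 1, 2, 6 and 7 can be folded into one helper. The Python 3 sketch below (ITEM_CODES and decode_array are hypothetical names) looks up a per-item struct code in a dictionary and decodes a whole data payload with a single struct.unpack call instead of a per-item loop. It repeats the item code rather than prefixing a count, because in struct '3f' means three floats while '3s' means one 3-byte string. It assumes, as with DATABEGI in the sample, that MESS entries carry a count of 0, and keeps LOGI as a 4-byte int as in the thread.

```python
import struct

# Per-item struct codes; big-endian byte order is added when the format is built.
ITEM_CODES = {
    'INTE': 'i',   # 4-byte signed int
    'LOGI': 'i',   # logical, read as a 4-byte int (as in the thread)
    'REAL': 'f',   # 4-byte float
    'DOUB': 'd',   # 8-byte float
    'CHAR': '8s',  # one 8-character string per item
}

def decode_array(typ, count, payload):
    """Decode `count` items of type `typ` from one data payload (bytes)."""
    if typ == 'MESS':
        # Assumption: MESS entries have no data, as with DATABEGI in the sample.
        if count != 0:
            raise ValueError('MESS with nonzero count %d not handled' % count)
        return []
    code = ITEM_CODES[typ]          # a KeyError flags an unknown type code
    fmt = '>' + code * count        # e.g. '>8s8s' for two CHAR items
    needed = struct.calcsize(fmt)
    if needed != len(payload):
        raise ValueError('%s x %d needs %d bytes, got %d'
                         % (typ, count, needed, len(payload)))
    return list(struct.unpack(fmt, payload))

print(decode_array('REAL', 2, struct.pack('>2f', 1.5, 2.0)))  # [1.5, 2.0]
print(decode_array('CHAR', 2, b'TIME    DAYS    '))          # [b'TIME    ', b'DAYS    ']
```

Letting the dictionary lookup raise KeyError replaces the if/elif chain and its sys.exit call; a caller that wants the old behaviour can catch the exception and report the unknown type.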



_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
