On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:

> Hi!
> I have a really long binary file that I want to read.
> The way I am doing it now is:
>
> for i in xrange(N):  # N is about 10,000,000
>     time = struct.unpack('=HHHH', infile.read(8))
>     # do something
>     tdc = struct.unpack('=LiLiLiLi', self.lmf.read(32))

I assume that is supposed to be infile.read(32).

>     # do something
>
> Each loop takes about 0.2 ms in my computer, which means the whole
> for loop takes 2000 seconds.

You're reading 400 million bytes (10,000,000 records of 8+32 bytes),
or 400MB, in about half an hour. Whether that's fast or slow depends
on what the "do something" lines are doing.

> I would like it to run faster.
> Do you have any suggestions?

Disk I/O is slow, so don't read from files in tiny little chunks. Read
a bunch of records into memory, then process them:

# UNTESTED!
rsize = 8 + 32  # record size in bytes
# assumes N is a multiple of 1000; any leftover records
# would need one final short read
for i in xrange(N//1000):
    # read 1000 records at once
    buf = infile.read(rsize*1000)
    for j in xrange(1000):
        # process each record
        offset = j*rsize
        time = struct.unpack('=HHHH', buf[offset:offset+8])
        # do something
        tdc = struct.unpack('=LiLiLiLi', buf[offset+8:offset+rsize])
        # do something
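If your Python is new enough (struct.Struct and struct.unpack_from were
added in 2.5), you can also compile the two format strings once instead
of re-parsing them ten million times each, and let unpack_from index
into the buffer so you don't build a slice per record. A variation on
the above, equally untested:

import struct

HEAD = struct.Struct('=HHHH')      # 8-byte header
BODY = struct.Struct('=LiLiLiLi')  # 32-byte body
rsize = HEAD.size + BODY.size      # 40 bytes per record
chunk = 1000                       # records per read

for i in xrange(N // chunk):
    buf = infile.read(rsize * chunk)
    for j in xrange(chunk):
        offset = j * rsize
        # unpack_from reads straight out of buf, no slice copies
        time = HEAD.unpack_from(buf, offset)
        tdc = BODY.unpack_from(buf, offset + HEAD.size)
        # do something with time and tdc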
(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)


--
Steven D'Aprano
--
http://mail.python.org/mailman/listinfo/python-list