Hi all,

I've been writing Python for about 2 weeks (I moved over from Perl). I've ported one of my modules, a parser for a binary format (see the link below for the format spec):
http://etidweb.tamu.edu/cdrom0/image/stdf/spec.pdf

In the first version of the parser I read exactly the amount of data I needed to parse, for example 4 bytes for each header. The STDF files tend to be 40+ MB in size with 100K+ records, so I had at least one disk read per record, sometimes two. Obviously that's not an efficient way to do it.

So I created a buffer that reads 4 KB chunks from the file, and the module then parses records out of the buffer; when the buffer drops below 512 bytes I read another chunk, and so on (a simplified sketch of this is at the end of the message). Reducing 100K+ reads to around 15K reads should improve performance, but for some reason it didn't. I've played with the chunk size, but nothing came of it.

Is there a Python-specific way to optimise reading from disk? I'm using Python 2.5.2 on Ubuntu 8.10 32-bit.

Thanks in advance.
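In case it helps, here is a simplified sketch of what the buffered reader does. This is not my actual module; the 4-byte header layout (2-byte record length, then 1-byte record type and 1-byte record subtype) and the little-endian byte order are assumptions I'm making for the example, so adjust them to the spec and to your files:

import struct

CHUNK = 4096      # bytes per disk read
LOW_WATER = 512   # refill the buffer when fewer bytes than this remain

def iter_records(path):
    """Yield (rec_typ, rec_sub, data) for each record in an STDF-like file.

    Assumes a 4-byte header: 2-byte little-endian record length followed
    by a 1-byte record type and a 1-byte record subtype; change the struct
    format if the spec or the file's byte order differs.
    """
    f = open(path, 'rb')
    try:
        buf = f.read(CHUNK)
        pos = 0
        while True:
            # Top up the buffer when it runs low, keeping only the
            # unparsed tail so it does not grow without bound.
            if len(buf) - pos < LOW_WATER:
                buf = buf[pos:] + f.read(CHUNK)
                pos = 0
            if len(buf) - pos < 4:
                break                      # no complete header left
            rec_len, rec_typ, rec_sub = struct.unpack('<HBB', buf[pos:pos + 4])
            # Make sure the whole record body is buffered before slicing it.
            while len(buf) - pos < 4 + rec_len:
                more = f.read(CHUNK)
                if not more:
                    raise ValueError('truncated record at end of file')
                buf = buf[pos:] + more
                pos = 0
            data = buf[pos + 4:pos + 4 + rec_len]
            pos += 4 + rec_len
            yield rec_typ, rec_sub, data
    finally:
        f.close()

which I then drive with something like:

for rec_typ, rec_sub, data in iter_records('lot.stdf'):
    pass  # dispatch on (rec_typ, rec_sub) and unpack `data` per the spec

The real code does more per record, but the buffering logic is essentially this.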