Hi all,

I've been writing Python for about 2 weeks (I moved over from Perl). I've ported one of my modules, a parser for a binary format (see the link below for the format spec):
http://etidweb.tamu.edu/cdrom0/image/stdf/spec.pdf

In the first version of the parser I read exactly the amount of data I needed to parse, for example 4 bytes for each header. The STDF files tend to be 40+ MB in size with 100K+ records, so I had at least one disk read per record, sometimes two. Obviously that's not an efficient way to do it.

So I created a buffer that reads 4 KB chunks from the file, and the module then parses records out of the buffer; when the buffer drops below 512 bytes I read another chunk, and so on (a simplified sketch of this is at the end of the message). Reducing 100K+ reads to around 15K reads should improve performance, but for some reason it didn't. I've played with the chunk size, but nothing came of it.

Is there a Python-specific way to optimise reading from disk? I'm using Python 2.5.2 on Ubuntu 8.10 32-bit.

Thanks in advance.
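In case it helps, here is a simplified sketch of what the buffered reader does. This is not my actual module; the 4-byte header layout (2-byte record length, then 1-byte record type and 1-byte record subtype) and the little-endian byte order are assumptions I'm making for the example, so adjust them to the spec and to your files:

import struct

CHUNK = 4096      # bytes per disk read
LOW_WATER = 512   # refill the buffer when fewer bytes than this remain

def iter_records(path):
    """Yield (rec_typ, rec_sub, data) for each record in an STDF-like file.

    Assumes a 4-byte header: 2-byte little-endian record length followed
    by a 1-byte record type and a 1-byte record subtype; change the struct
    format if the spec or the file's byte order differs.
    """
    f = open(path, 'rb')
    try:
        buf = f.read(CHUNK)
        pos = 0
        while True:
            # Top up the buffer when it runs low, keeping only the
            # unparsed tail so it does not grow without bound.
            if len(buf) - pos < LOW_WATER:
                buf = buf[pos:] + f.read(CHUNK)
                pos = 0
            if len(buf) - pos < 4:
                break                      # no complete header left
            rec_len, rec_typ, rec_sub = struct.unpack('<HBB', buf[pos:pos + 4])
            # Make sure the whole record body is buffered before slicing it.
            while len(buf) - pos < 4 + rec_len:
                more = f.read(CHUNK)
                if not more:
                    raise ValueError('truncated record at end of file')
                buf = buf[pos:] + more
                pos = 0
            data = buf[pos + 4:pos + 4 + rec_len]
            pos += 4 + rec_len
            yield rec_typ, rec_sub, data
    finally:
        f.close()

which I then drive with something like:

for rec_typ, rec_sub, data in iter_records('lot.stdf'):
    pass  # dispatch on (rec_typ, rec_sub) and unpack `data` per the spec

The real code does more per record, but the buffering logic is essentially this.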