Since the email contains no code, I can only assume you're using the
built-in open() call and file.read(). If you didn't specify a bufsize,
a "system default" is used; I seriously doubt the default is 0
(unbuffered).
Since it's already buffered, additional buffering by you may have
little effect, or even a negative one. I'd suggest explicitly
specifying a buffer size in the open() call; 4096 is a good starting
place. Then I'd benchmark with larger and smaller values to see what
difference it makes.
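Something along these lines would let you compare buffer sizes. It's
only a rough sketch, not your parser: the file name 'test.stdf' is a
placeholder, and it does nothing but count 4-byte headers.

import time

def read_headers(path, bufsize):
    # Open with an explicit buffer size; open() accepts a buffering
    # argument in both Python 2 and 3 (the exact semantics differ).
    f = open(path, 'rb', bufsize)
    try:
        count = 0
        while True:
            header = f.read(4)   # e.g. one 4-byte record header
            if len(header) < 4:
                break
            count += 1
        return count
    finally:
        f.close()

# Rough benchmark loop; 'test.stdf' is a placeholder path.
for bufsize in (1024, 4096, 16384, 65536):
    start = time.time()
    read_headers('test.stdf', bufsize)
    print('bufsize %6d: %.3f s' % (bufsize, time.time() - start))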
The only time additional buffering tends to be useful is if you know the
file usage pattern, and it's predictable and not sequential. Even then,
it's good to know the underlying buffer's behavior, so that your
optimizations are not at cross purposes.
I'd expect your performance problems are elsewhere.
kian tern wrote:
Hi all,
I've been writing Python for about 2 weeks (I moved over from Perl).
I've ported one of my modules, which is a parser for a binary format (see
the link below for the format specs):
http://etidweb.tamu.edu/cdrom0/image/stdf/spec.pdf
In the first version of the parser I was reading exactly the amount of data
I needed to parse, for example 4 bytes per header.
The STDF files tend to be 40+ MB in size with 100K+ records, so I had at
least 1 disk read per record, sometimes 2.
Obviously that's not an efficient way to do it.
I've created a buffer which reads 4K chunks per read, and the module then
parses the data from the buffer.
If the buffer holds less than 512B, I read another chunk, and so on.
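The buffering logic is roughly along these lines (heavily simplified;
the class and method names here are just illustrative, and the actual
record parsing is left out):

class BufferedSTDFReader(object):
    CHUNK = 4096       # bytes read from disk at a time
    LOW_WATER = 512    # refill when the buffer drops below this

    def __init__(self, f):
        self.f = f
        self.buf = ''  # Python 2: bytes are plain str

    def read_bytes(self, n):
        # Top up the buffer from disk when it runs low or is too
        # short to satisfy the request.
        while len(self.buf) < max(n, self.LOW_WATER):
            chunk = self.f.read(self.CHUNK)
            if not chunk:
                break
            self.buf += chunk
        data, self.buf = self.buf[:n], self.buf[n:]
        return data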
Reducing 100K+ reads to around 15K reads should have improved performance,
but for some reason it didn't.
I've played with the chunk size, but nothing came of it.
Is there a Python-specific way to optimise reading from disk?
I'm using Python 2.5.2 on Ubuntu 8.10, 32-bit.
Thanks in advance.