Since the email contains no code, I can only assume you're using the
built-in open() call and file.read(). If you didn't specify a bufsize,
a "system default" is used; I seriously doubt the default is 0
(unbuffered).
Since it's already buffered, additional buffering by you may have
little effect, or even a negative one. I'd suggest explicitly
specifying a buffer size in the open() call; 4096 is a good starting
place. Then I'd benchmark with larger and smaller values to see what
difference it makes.
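Something along these lines would let you compare buffer sizes. It's
only a rough sketch, not your parser: the file name 'test.stdf' is a
placeholder, and it does nothing but count 4-byte headers.

import time

def read_headers(path, bufsize):
    # Open with an explicit buffer size; open() accepts a buffering
    # argument in both Python 2 and 3 (the exact semantics differ).
    f = open(path, 'rb', bufsize)
    try:
        count = 0
        while True:
            header = f.read(4)   # e.g. one 4-byte record header
            if len(header) < 4:
                break
            count += 1
        return count
    finally:
        f.close()

# Rough benchmark loop; 'test.stdf' is a placeholder path.
for bufsize in (1024, 4096, 16384, 65536):
    start = time.time()
    read_headers('test.stdf', bufsize)
    print('bufsize %6d: %.3f s' % (bufsize, time.time() - start))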
The only time additional buffering tends to be useful is if you know the
file usage pattern, and it's predictable and not sequential. Even then,
it's good to know the underlying buffer's behavior, so that your
optimizations are not at cross purposes.
I'd expect your performance problems are elsewhere.
kian tern wrote:
Hi all,
I've been writing Python for about 2 weeks (I moved over from Perl).
I've ported one of my modules, which is a parser for a binary format (see
the link below for the format specs):
http://etidweb.tamu.edu/cdrom0/image/stdf/spec.pdf
In the first version of the parser I was reading exactly the amount of data
I needed to parse, for example 4 bytes per header.
The STDF files tend to be 40+ MB in size with 100K+ records, so I had at
least 1 disk read per record, sometimes 2.
Obviously that's not an efficient way to do it.
I've created a buffer which reads 4K chunks per read, and the module then
parses the data from the buffer.
If the buffer holds less than 512B, I read another chunk, and so on.
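The buffering logic is roughly along these lines (heavily simplified;
the class and method names here are just illustrative, and the actual
record parsing is left out):

class BufferedSTDFReader(object):
    CHUNK = 4096       # bytes read from disk at a time
    LOW_WATER = 512    # refill when the buffer drops below this

    def __init__(self, f):
        self.f = f
        self.buf = ''  # Python 2: bytes are plain str

    def read_bytes(self, n):
        # Top up the buffer from disk when it runs low or is too
        # short to satisfy the request.
        while len(self.buf) < max(n, self.LOW_WATER):
            chunk = self.f.read(self.CHUNK)
            if not chunk:
                break
            self.buf += chunk
        data, self.buf = self.buf[:n], self.buf[n:]
        return data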
Reducing 100K+ reads to around 15K reads should have improved performance,
but for some reason it didn't.
I've played with the chunk size, but nothing came of it.
Is there a Python-specific way to optimise reading from disk?
I'm using Python 2.5.2 on Ubuntu 8.10, 32-bit.
Thanks in advance.