[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
   ...
> > > files (you see "huge" is really relative ;-)) on 2-4GB RAM boxes and
> > > setting a big buffer (1GB or more) reduces the wall time by 30 to 50%
> > > compared to the default value. BerkeleyDB should have a buffering
> >
> > Out of curiosity, what OS and FS are you using?  On a well-tuned FS and
>
> Fedora Core 4 and ext3.  Is there something I should do to the FS?
In theory, nothing.  In practice, this is strange.

> Which should I do? How much buffer should I allocate? I have a box
> with 2GB memory.

I'd be curious to see a read-only loop on the file, opened with (say)
1MB of buffer vs 30MB vs 1GB -- just loop on the lines, do a .split()
on each, and do nothing with the results.  What elapsed times do you
measure with each buffer size...?  (A rough sketch of such a timing
loop is at the end of this message.)

If the huge buffers confirm their worth, it's time to take a nice
critical look at what other processes you're running and what they
are all doing to your disk -- maybe some daemon (or a frequently-run
cron entry, etc.) is out of control...?  You could try running the
benchmark again in single-user mode (with essentially nothing else
running) and see how the elapsed-time measurements change...


Alex
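
For concreteness, here is a minimal sketch of the kind of timing loop
I mean.  The file name 'hugefile.txt' and the three buffer sizes are
just placeholders -- substitute your own:

import time

def bench(filename, bufsize):
    # One read-only pass: iterate over the lines, .split() each line,
    # discard the results; return the elapsed wall time.
    start = time.time()
    f = open(filename, 'r', bufsize)
    for line in f:
        line.split()
    f.close()
    return time.time() - start

# 1MB, 30MB, and 1GB buffers, as suggested above
for size in (1024**2, 30 * 1024**2, 1024**3):
    elapsed = bench('hugefile.txt', size)
    print('%11d bytes of buffer: %6.2f seconds' % (size, elapsed))

One caveat: after the first pass the kernel's page cache will hold
much of the file, so run each size more than once (or remount the
filesystem between runs) if you want cold-cache numbers.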