Infinity77 wrote:
Hi All,
I am trying to speed up some code which reads a bunch of data from
a disk file. Just for the fun of it, I thought to try and use parallel
I/O to split the reading of the file between multiple processes.
Although I have been warned that concurrent access by multiple
processes to the same file may actually slow down the reading of the
file, I was curious to try some timings by varying the number of
processes reading the file. I know almost nothing about
multiprocessing, so I was wondering if anyone had a very simple
snippet of code demonstrating how to read a file using
multiprocessing.
My idea was to create a "big" file by doing:
fid = open("somefile.txt", "wb")
fid.write(b"HELLO\n" * 10**7)  # ten million copies of the line (1e7 is a float and won't work here)
fid.close()
and then use fid.seek() to point each process I start at a position
inside the file and have it read from there. For example, with 4
processes and a 10 MB file, I would tell the first process to read
from byte 0 to byte 2.5 million, the second one from 2.5 million to 5
million, and so on. It is just an academic curiosity of mine :-D
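Something along these lines is what I had in mind (a rough, untested
sketch; the byte boundaries will fall in the middle of a line unless
they are adjusted):

import os
from multiprocessing import Process

FILENAME = "somefile.txt"  # the file created above

def read_chunk(start, size):
    # each process opens its own handle, seeks to its offset and reads its slice
    with open(FILENAME, "rb") as fid:
        fid.seek(start)
        data = fid.read(size)
    print(len(data), "bytes read from offset", start)

if __name__ == "__main__":
    nprocs = 4
    total = os.path.getsize(FILENAME)
    chunk = total // nprocs
    procs = []
    for i in range(nprocs):
        start = i * chunk
        # the last process takes whatever remainder is left over
        size = chunk if i < nprocs - 1 else total - start
        p = Process(target=read_chunk, args=(start, size))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()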
Any suggestion is very welcome, either on the approach or on the
actual implementation. Thank you for your help.
Andrea.
If the thing you want to speed up is the processing of the file
(and not the I/O), I would have one process actually read the file and
feed the data to the other processes through a queue.
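Something roughly like this untested sketch (the per-line processing
is just a placeholder):

from multiprocessing import Process, Queue

def worker(queue):
    # consume lines until the reader sends the sentinel
    while True:
        line = queue.get()
        if line is None:
            break
        # ... do the actual (expensive) processing of `line` here ...

if __name__ == "__main__":
    nworkers = 4
    queue = Queue(maxsize=1000)  # bounded, so the reader can't race too far ahead
    workers = [Process(target=worker, args=(queue,)) for _ in range(nworkers)]
    for w in workers:
        w.start()

    # a single process does the (sequential) I/O...
    with open("somefile.txt", "rb") as fid:
        for line in fid:
            queue.put(line)

    # ...and then tells every worker there is nothing more to come
    for _ in workers:
        queue.put(None)
    for w in workers:
        w.join()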
--
http://mail.python.org/mailman/listinfo/python-list