On Wed, Dec 31, 2008 at 15:32, Blanchette, Marco
<[email protected]> wrote:
> Dear all,
>
> I am trying to speed up a very long procedure that I need to run on
> multiple files and thought that I could multithread different jobs on
> different files across multiple CPUs. For some reason that I don't really
> get, I only achieve a very small time gain. I have included my script, which
> essentially repeats the same function, extractSeq(), on multiple files using
> a maximum of four threads.
>
> I would really appreciate it if I could finally understand how to use
> threads to speed up some of my lengthy scripts.
Yeah, the problem is that these things are let loose without a solid
computer science background ;(
The "better" method, in my opinion (and sorry for the pseudocode), is:
    main_begin
        init_list(@file_list)
        for i in 1 2 3 4
        do
            thread_create(&worker())
        done
        join_threads
    end_main

    Procedure worker()
    start
        local flag=1;
        while(flag)
        do
            mutex_lock(lock)
            next=read_and_remove_from_list(@file_list)
            mutex_unlock(lock)
            if (next == NULL)
            then
                flag=0
            else
                process(next)
            fi
        done
        cleanup_thread_stuff()
        return
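For anyone who wants something runnable, here is a minimal Python 3 sketch of the same pattern: a fixed number of workers that repeatedly pop the next file from a shared list under a lock and exit when the list is empty. The names run_pool and fake_extract_seq are hypothetical; fake_extract_seq only stands in for extractSeq(), which is not shown here. One caveat: in CPython the global interpreter lock means threads mainly speed up I/O-bound work, so CPU-heavy processing would need multiprocessing instead.

```python
import threading


def run_pool(file_list, process, num_threads=4):
    """Run process(item) for every item in file_list on num_threads threads.

    Workers take the next item from a shared list guarded by a lock,
    mirroring read_and_remove_from_list() in the pseudocode.
    """
    pending = list(file_list)  # shared work list (a copy, so the caller's list is untouched)
    lock = threading.Lock()    # protects 'pending'

    def worker():
        while True:
            # Keep the critical section short: only check and remove one item.
            with lock:
                if not pending:
                    return             # list is empty: this worker is done
                next_item = pending.pop()  # takes from the end; order does not matter here
            process(next_item)         # the slow part runs outside the lock, with no sleep()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # equivalent of join_threads in the pseudocode


if __name__ == "__main__":
    def fake_extract_seq(path):  # hypothetical stand-in for extractSeq()
        print("processing", path)

    run_pool(["a.fa", "b.fa", "c.fa"], fake_extract_seq, num_threads=4)
```

The standard library's queue.Queue and concurrent.futures.ThreadPoolExecutor package the same locked-list idea; the explicit lock is used here only to mirror the pseudocode step by step.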
Notice that there are no delays such as sleep() calls, which can add
noticeable lag when processing hundreds of files at less than a second per
file. Keep read_and_remove_from_list() short and simple, since it runs while
the lock is held; then you'll see the best speedup. Also, since disk reads
are involved, you may find that running number_of_cpus + extra_threads
threads (given sufficient RAM) gives better results, because some threads
can keep the CPUs busy while others wait on the disk. The right value for
extra_threads has to be determined empirically on the specific system.
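That empirical evaluation could be scripted roughly as follows. This Python 3 sketch swaps in concurrent.futures.ThreadPoolExecutor instead of a hand-written pool; the function name time_thread_counts, the default candidate extras, and fake_io_task are hypothetical choices for illustration.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def time_thread_counts(file_list, process, extra_candidates=(0, 1, 2, 4)):
    """Time one full pass over file_list for several thread counts.

    Each thread count is os.cpu_count() + extra, for extra in extra_candidates.
    Returns a dict mapping thread_count -> elapsed seconds.
    """
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    timings = {}
    for extra in extra_candidates:
        n = cpus + extra
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            # list() waits for every task and re-raises any worker exception
            list(pool.map(process, file_list))
        timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    def fake_io_task(path):  # hypothetical stand-in: sleep simulates waiting on the disk
        time.sleep(0.01)

    files = [f"file{i}.txt" for i in range(40)]
    timings = time_thread_counts(files, fake_io_task)
    best = min(timings, key=timings.get)
    print(timings, "best thread count:", best)
```

Repeat each measurement, or reset the caches between runs: after the first pass the operating system's page cache may already hold the files, which can make later candidates look faster than they really are.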