On Wed, Dec 31, 2008 at 15:32, Blanchette, Marco
<m...@stowers-institute.org> wrote:

> Dear all,
>
> I am trying to speed up a very long procedure that I need to run on
> multiple files and thought that I could multithread different jobs on
> different files across multiple CPUs. For some reason that I don't really
> get, I only achieve a very small time gain. I have included my script, which
> essentially repeats the same function, extractSeq(), on multiple files using a
> maximum of four threads.
>
> I would really appreciate it if I could finally understand how to use threads
> to speed up some of my lengthy scripts.


Yeah, the problem is that these things get let loose without a good computer
science background ;(

The "better" method in my opinion (and sorry for the pseudocode) is:


main_begin
  init_list(@file_list)

  for i in 1 2 3 4
  do
    thread_create(&worker)
  done

  join_threads

end_main

Procedure worker()
start
  local flag=1;

  while(flag)
  do
    mutex_lock(lock)
    next=read_and_remove_from_list(@file_list)
    mutex_unlock(lock)
    if (next == NULL)
    then
      flag=0
    else
      process(next)
    fi
  done
  cleanup_thread_stuff()
return
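For anyone who wants to try the pattern directly, here is a minimal Python sketch of the same worker pool, assuming the per-file job is passed in as a function. `run_workers` and `process` are hypothetical names; `process` stands in for something like extractSeq(). Each worker holds the lock only long enough to take the next item, does the slow work outside the lock, and exits when the list is empty.

```python
import threading


def run_workers(file_list, process, num_threads=4):
    """Drain file_list with num_threads workers; each item is processed once."""
    lock = threading.Lock()
    pending = list(file_list)  # shared work list (a copy, so the caller's list is untouched)

    def worker():
        while True:
            # Hold the lock only while taking the next item off the list.
            with lock:
                if not pending:
                    return  # list is empty: this worker is done
                next_file = pending.pop()
            # Do the slow work outside the lock so other workers can proceed.
            process(next_file)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker to finish


if __name__ == "__main__":
    done = []
    done_lock = threading.Lock()

    def process(name):  # hypothetical stand-in for the real per-file job
        with done_lock:
            done.append(name)

    run_workers([f"file{i}.txt" for i in range(10)], process)
    print(sorted(done))
```

One caveat: in CPython the global interpreter lock means threads mostly help when the per-file work waits on I/O. For CPU-bound processing, swapping `threading` for `multiprocessing` (or `concurrent.futures.ProcessPoolExecutor`) is the usual route.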


Notice that there are no delays such as sleep() in the loop. These can cause
bad lags when the actual processing of hundreds or more files takes less than
a second per file. Keep read_and_remove_from_list short and simple (it runs
while the lock is held), and you'll see the best speedup. Also, since disk
reads are involved, you might find that number_of_cpus + extra_threads (given
sufficient RAM) gives better results. The value of extra_threads has to be
determined by empirical evaluation on the specific system.
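The empirical tuning can be as simple as timing the same batch with a few different thread counts. The following is a rough Python sketch, assuming a hypothetical per-file function `process(path)`. It uses the standard library's `concurrent.futures.ThreadPoolExecutor` in place of hand-rolled threads; its workers also pull items from a shared internal queue, much like the list above. `time_pool` and `pick_thread_count` are illustrative names.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def time_pool(files, process, num_threads):
    """Wall-clock seconds to run process() over files with num_threads threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(process, files))  # consume results so exceptions surface
    return time.perf_counter() - start


def pick_thread_count(files, process, max_extra=4):
    """Try cpus + 0 .. cpus + max_extra threads; return (fastest count, timings)."""
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    timings = {}
    for extra in range(max_extra + 1):
        n = cpus + extra
        timings[n] = time_pool(files, process, n)
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    # Simulated per-file job: sleep stands in for disk waits (an assumption).
    def fake_process(path):
        time.sleep(0.01)

    files = [f"file{i}.txt" for i in range(50)]
    best, timings = pick_thread_count(files, fake_process)
    for n, secs in sorted(timings.items()):
        print(f"{n:3d} threads: {secs:.3f}s")
    print("fastest:", best)
```

A single timing is noisy, and the OS file cache can make later runs faster than earlier ones. In practice you would repeat each measurement (or use a fresh set of files) before trusting the winner.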
