On Wed, Dec 31, 2008 at 15:32, Blanchette, Marco <m...@stowers-institute.org> wrote:
> Dear all,
>
> I am trying to speed up a very long procedure that I need to run on
> multiple files and thought that I could multithread different jobs on
> different files across multiple CPUs. For some reason that I don't really
> get, I only achieve a very small time gain. I have included my script, which
> essentially repeats the same function, extractSeq(), on multiple files using a
> maximum of four threads.
>
> I would really appreciate it if I could finally understand how to use threads
> to speed up some of my lengthy scripts.

Yeah, the problem is that these things get let loose without a solid computer
science background ;(

The "better" method in my opinion (and sorry for the pseudo code) is:

    main_begin
        init_list(@file_list)
        init_mutex(lock)
        for i in 1 2 3 4 do
            thread_create(worker)
        done
        join_threads()
    end_main

    procedure worker()
    start
        local flag = 1
        while (flag) do
            mutex_lock(lock)
            next = read_and_remove_from_list(@file_list)
            mutex_unlock(lock)
            if (next == NULL) then
                flag = 0
            else
                process(next)
            fi
        done
        cleanup_thread_stuff()
        return
    end

Notice that there are no delays such as sleep() here. Those can cause bad lags
when you process hundreds of files or more and each file takes less than a
second. Keep read_and_remove_from_list() short and simple, because it runs while
the lock is held and every other thread waits on it; then you should see close
to the best speedup the hardware allows.

Also, since disk reads are involved, you may find that number_of_cpus +
extra_threads threads (given sufficient RAM) gives better results, because some
threads can be waiting on the disk while others keep the CPUs busy. The value of
extra_threads has to be determined empirically on the specific system.
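For illustration, here is a minimal Python sketch of the worker loop in the pseudocode: a shared list guarded by a lock, workers that pull the next item until the list is empty, and no sleep() calls. The names run_pool, process_file and extra_threads are hypothetical; process_file only stands in for extractSeq(). One caveat: in CPython the global interpreter lock keeps threads from running Python bytecode in parallel, so this sketch shows the structure rather than a CPU-bound speedup. For pure computation you would swap threading for multiprocessing.

```python
import os
import threading


def run_pool(file_list, process, extra_threads=0):
    """Process every item of file_list with a fixed pool of worker threads.

    Follows the pseudocode: a shared work list protected by a mutex,
    workers that remove the next item until the list is empty, no sleeps.
    """
    pending = list(file_list)   # shared work list (a copy, caller's list untouched)
    lock = threading.Lock()     # guards both `pending` and `results`
    results = {}

    def worker():
        while True:
            with lock:          # critical section: only the list removal
                if not pending:
                    return      # list exhausted, this thread exits
                name = pending.pop()
            value = process(name)  # the slow work runs outside the lock
            with lock:
                results[name] = value

    # number_of_cpus + extra_threads; os.cpu_count() may return None
    n_threads = (os.cpu_count() or 1) + extra_threads
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # wait for every worker to finish
    return results


def process_file(name):
    # Hypothetical stand-in for extractSeq(): just returns the name's length.
    return len(name)


if __name__ == "__main__":
    files = ["a.fa", "bb.fa", "ccc.fa"]
    print(run_pool(files, process_file, extra_threads=2))
```

The extra_threads parameter is where you would try the number_of_cpus + extra_threads idea; as the post says, the right value has to be found by timing runs on your own machine.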