On 9/22/06, George Woltman <[EMAIL PROTECTED]> wrote:
>
> At 09:54 AM 9/14/2006, George Woltman wrote:
> >1) Give each thread a contiguous set of data blocks to crunch through.
> >Thread 1 gets blocks 0 - 127, thread 2 gets 128 - 255, etc.
> >
> >2) Have the 8 threads start on data blocks 0-7. Put a lock around the
> >carry code such that it insures that data blocks are processed in order.
> >
> >I lean toward the second solution. Does anyone have any real-world
> >experiences or alternative suggestions?
>
> Upon further reflection, I think the first solution is the only attractive
> one. Both methods should have the same theoretical throughput, but when
> there isn't one CPU for each thread, then the first solution is better.
> Here's why:
>
> Say there are 7 available CPUs for 8 threads. Threads 0 to 6 start
> processing blocks 0-6 and prefetching blocks 8-14. Now all 7 threads
> stall and the thread processing block 7 is run destroying all that nice
> prefetching we did.

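For concreteness, here is a rough sketch of how the two schemes quoted above hand out blocks. It assumes 8 threads and 1024 blocks (the total is only inferred from the 128-blocks-per-thread example), numbers threads from 0, and uses made-up function names. It shows block assignment only, not the carry lock or the prefetching.

```python
# Hypothetical helpers for illustration; not taken from the actual program.

def contiguous_blocks(thread_id, num_threads, num_blocks):
    """Scheme 1: each thread owns one contiguous run of blocks."""
    per_thread = num_blocks // num_threads  # assumes the blocks divide evenly
    start = thread_id * per_thread
    return range(start, start + per_thread)


def interleaved_blocks(thread_id, num_threads, num_blocks):
    """Scheme 2: threads take blocks round-robin (0-7, then 8-15, ...)."""
    return range(thread_id, num_blocks, num_threads)


if __name__ == "__main__":
    NUM_THREADS = 8    # from the quoted message
    NUM_BLOCKS = 1024  # assumption: 8 threads x 128 blocks each
    for t in range(NUM_THREADS):
        c = contiguous_blocks(t, NUM_THREADS, NUM_BLOCKS)
        i = interleaved_blocks(t, NUM_THREADS, NUM_BLOCKS)
        print(f"thread {t}: scheme 1 blocks {c[0]}-{c[-1]}, "
              f"scheme 2 blocks {i[0]}, {i[1]}, {i[2]}, ...")
```

Under scheme 2 a thread's next block is always num_threads ahead of its current one, which is why a thread working on block k would prefetch block k+8. Under scheme 1 the next block is simply k+1 and never depends on another thread having been scheduled.
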
Would it be possible to detect that one thread is finishing much more slowly than the others and rearrange the work among fewer threads?

_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
