On 9/22/06, George Woltman <[EMAIL PROTECTED]> wrote:
>
> At 09:54 AM 9/14/2006, George Woltman wrote:
> >1)  Give each thread a contiguous set of data blocks to crunch through.
> >Thread 1 gets blocks 0 - 127, thread 2 gets 128 - 255, etc.
> >
> >2)  Have the 8 threads start on data blocks 0-7.  Put a lock around the
> >carry code so that it ensures data blocks are processed in order.
> >
> >I lean toward the second solution.  Does anyone have any real-world
> >experiences or alternative suggestions?
>
> Upon further reflection, I think the first solution is the only attractive
> one.  Both methods should have the same theoretical throughput, but when
> there isn't one CPU for each thread, then the first solution is better.
> Here's why:
>
> Say there are 7 available CPUs for 8 threads.  Threads 0 to 6 start
> processing blocks 0-6 and prefetching blocks 8-14.  Now all 7 threads stall
> and the thread processing block 7 is run, destroying all that nice
> prefetching we did.
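
To make sure I read the two quoted schemes the same way, here is a rough sketch of which blocks each thread would touch. The total of 1024 blocks is my assumption (8 threads times the 128 blocks per thread in the example), the function names are made up, and scheme 2 is simplified to a fixed stride of 8 with no carry lock shown.

```python
def contiguous_assignment(num_blocks, num_threads):
    """Scheme 1: each thread gets one contiguous run of blocks.

    Assumes num_blocks divides evenly by num_threads.
    """
    per_thread = num_blocks // num_threads
    return [range(t * per_thread, (t + 1) * per_thread)
            for t in range(num_threads)]


def interleaved_assignment(num_blocks, num_threads):
    """Scheme 2, simplified: thread t handles blocks t, t + num_threads, ...

    The real scheme would also need the lock around the carry code;
    that part is not modeled here.
    """
    return [range(t, num_blocks, num_threads) for t in range(num_threads)]


if __name__ == "__main__":
    # 1024 blocks is an assumed total, not a figure from the thread.
    for name, plan in (("contiguous", contiguous_assignment(1024, 8)),
                       ("interleaved", interleaved_assignment(1024, 8))):
        first_two = [list(r)[:2] for r in plan[:2]]
        print(name, first_two)
```

In the interleaved plan the thread that starts on block 6 moves on to block 14 next, which lines up with the "prefetching blocks 8-14" remark above.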


Would it be possible to detect that one thread is finishing much slower than
the others and rearrange the work among fewer threads?
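
One way this might be approximated without explicit detection is to let threads claim small contiguous chunks from a shared counter, so a thread that gets less CPU time simply claims fewer chunks. This is a different technique (dynamic chunk scheduling) from either scheme quoted above, and only a sketch: the class and function names, the 1024-block total, and the chunk size of 16 are all made up, and the carry handling between chunks is left out entirely.

```python
import threading

NUM_BLOCKS = 1024  # assumed total number of data blocks
CHUNK = 16         # assumed number of contiguous blocks claimed at a time


class ChunkDispenser:
    """Hands out contiguous runs of block indices on demand."""

    def __init__(self, num_blocks, chunk):
        self.num_blocks = num_blocks
        self.chunk = chunk
        self.next_block = 0
        self.lock = threading.Lock()

    def claim(self):
        """Return the next range of blocks, or None when none are left."""
        with self.lock:
            if self.next_block >= self.num_blocks:
                return None
            start = self.next_block
            stop = min(start + self.chunk, self.num_blocks)
            self.next_block = stop
            return range(start, stop)


def process_block(block):
    """Placeholder for the real per-block work; just returns the index."""
    return block


def worker(dispenser, done, done_lock):
    # Keep claiming chunks until the dispenser runs dry.  A slow thread
    # ends up claiming fewer chunks; nothing has to detect that it is slow.
    while True:
        blocks = dispenser.claim()
        if blocks is None:
            return
        local = [process_block(b) for b in blocks]
        with done_lock:
            done.extend(local)


def run(num_threads=8, num_blocks=NUM_BLOCKS, chunk=CHUNK):
    """Process every block once and return the sorted list of block indices."""
    dispenser = ChunkDispenser(num_blocks, chunk)
    done, done_lock = [], threading.Lock()
    threads = [threading.Thread(target=worker,
                                args=(dispenser, done, done_lock))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(done)
```

Each claimed chunk is still contiguous, so the prefetching within a chunk should behave like the first solution, but whether the extra locking costs more than it saves would have to be measured.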
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
