Re: [Prime] Multi-threaded carry propagation

Jason Papadopoulos Fri, 15 Sep 2006 02:55:16 -0700

Quoting Brian Beesley <[EMAIL PROTECTED]>:

> There's another effect here which might get overlooked - and which 
> Either way there's a bottleneck which is at the very least potential -
> between L1 and L2 in Intel systems, and between L2 and RAM in AMD 
> systems. I believe this bottleneck may dominate threaded LL test 
> performance, just as it does in single CPU systems when the memory 
> bandwidth is less than the CPU may demand.


Yes, this will probably be critical in a multithreaded LL squaring. The
only way around it is to arrange the code so that only a few cores are
hitting main memory heavily, while the other cores handling the other
chunks of the FFT already have their datasets in cache and crunch away
on them. 

If memory serves, prime95 already almost completely overlaps main memory 
latency with useful work, so one possibly more straightforward way to
multithread the FFT is to add a third, horribly complicated pass where
all the fine-granularity multiprocessing and nonlocal memory access takes 
place, then recurse to a largely intact version of the current code once
things fit within a given size working set. It will probably only be a win
if the third pass does a significant fraction (~30% :) of the total work.
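
A rough sketch of that structure, for illustration only: one outer pass does
all the strided, nonlocal memory access and splits the transform into
independent chunks, then each chunk goes to an unmodified single-threaded
FFT on its own worker. This is not prime95's code. The names fft_small,
outer_pass and fft_threaded are hypothetical, the lengths are assumed to be
powers of two, and the outer pass here runs first rather than as a literal
"third" pass. Python threads will not give a real speedup because of the
interpreter lock, so this only shows how the work is divided.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def fft_small(x):
    """Stand-in for the existing single-threaded code: recursive radix-2 FFT.

    Assumes len(x) is a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_small(x[0::2])
    odd = fft_small(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def outer_pass(x, r_chunks):
    """Extra pass: the only place that touches the whole array with a stride.

    Splits a length-n transform into r_chunks independent length-(n / r_chunks)
    transforms (one decimation-in-frequency step of radix r_chunks).
    """
    n = len(x)
    m = n // r_chunks
    chunks = []
    for r in range(r_chunks):
        chunk = []
        for n1 in range(m):
            acc = 0j
            # Small DFT across elements that are m apart (nonlocal access).
            for n2 in range(r_chunks):
                acc += x[n1 + m * n2] * cmath.exp(-2j * cmath.pi * n2 * r / r_chunks)
            # Twiddle factor that makes the chunk an independent sub-FFT.
            chunk.append(acc * cmath.exp(-2j * cmath.pi * n1 * r / n))
        chunks.append(chunk)
    return chunks


def fft_threaded(x, r_chunks=4, workers=4):
    """Outer pass, then one 'existing' FFT per chunk on a pool of workers."""
    n = len(x)
    chunks = outer_pass(x, r_chunks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fft_small, chunks))
    # Chunk r holds output indices r, r + r_chunks, r + 2 * r_chunks, ...
    out = [0j] * n
    for r, res in enumerate(results):
        for k1, v in enumerate(res):
            out[r_chunks * k1 + r] = v
    return out
```

Each chunk handed to fft_small is contiguous and independent of the others,
which is the point: once the outer pass is done, a worker can keep its chunk
in cache and never look at anyone else's data.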

jasonp

