On Thursday, 14 September 2006 15:54, George Woltman wrote:
> At 12:52 AM 9/6/2006, Jason wrote:
> >| |     |****************|            | |     |****************|
> >|
> >|c|     | block i after  | propagate  |c|     | block i+1 after|
> >|a|     | itwiddle, IFFT | carries    |a|     | itwiddle, IFFT |
> >|r|-->  |    and IDWT    | for ea --> |r|-->  |    and IDWT    | --> ...
> >|r|     |    weights     | row        |r|     |    weights     |
> >|y|     |****************|            |y|     |****************|
> >
> >The point is that carries can be propagated while each block is being
> >processed, until you finish the last block. Then all the carries have
> >to "carriage return" to the next row, the last carry wraps around to
> >the first row like before, and the carry vector propagates 10 elements
> >into the first block
>
> This is the scheme prime95 currently uses - and as you noted it introduces
> a data dependence.  In pass 2, the blocks are truly independent and I can
> dole a data block to each thread as needed.  Very easy and efficient.
> In pass 1, with the data dependence introduced by carry propagation,
> I can see two choices (as an example say there are 1024 data blocks
> and 8 threads):
>
> 1)  Give each thread a contiguous set of data blocks to crunch through.
> Thread 1 gets blocks 0 - 127, thread 2 gets 128 - 255, etc.  Two major
> disadvantages:  a) if one CPU core is busy with something else then
> 7 threads finish and 6 then sit idle waiting for the 8th thread to finish.
> b) more memory for carry buffers and more overhead propagating final
> carries.
>
This is basically the scheme that Glucas uses.  Let's look at an example.

Assume a first FFT radix of 8, an 8192-element data array (doubles), and 8 threads.

In the last inverse FFT, the separation between data elements is 1024 (in double units).

Each task processes 8 complex data points (the last inverse FFT step).  It then 
multiplies them by the inverse DWT factors; this actually takes two steps because 
the values are now treated as doubles.  Then come two carry propagations and the 
DWT weighting again.  Finally, with the same 8 pairs of doubles, now taken as 
complex values, it performs the first forward FFT.

Thread 0 will process data 0-127, 1024-1151, ...., 7168-7295
Thread 1 will process data 128-255, 1152-1279, ....,
....
Thread 7 will process data 896-1023, ......, 8064-8191
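This assignment can be sketched as follows, using the sizes from the example above (the macro and function names here are mine, not Glucas's):

```c
#define N        8192                /* total doubles */
#define THREADS  8
#define STRIDE   1024                /* separation in the last inverse FFT */
#define BLOCK    (STRIDE / THREADS)  /* 128 contiguous doubles per block */

/* Start index of the j-th block (j = 0 .. N/STRIDE - 1) owned by thread t. */
static int block_start(int t, int j)
{
    return t * BLOCK + j * STRIDE;
}
```

With these sizes each thread owns 8 blocks of 128 doubles: thread 0's last block starts at 7168, and thread 7's last block ends at index 8191.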

Well, note that when a thread finishes its task, there are remaining carries 
that have not been processed yet.  The 8 carry arrays (each of 8 doubles) are 
processed as follows:

the last carry array of thread 7 goes to doubles 0, 1024, ..., 7168, processed 
by thread 0
the last carry array of thread 0 goes to doubles 128, 1152, ...., processed by 
thread 1
...
the last carry array of thread 6 goes to doubles 896, ..., processed by thread 7
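The rotation described above can be stated compactly; this is just a sketch under the example's sizes, and `carry_source` is my name for it, not Glucas's:

```c
#define THREADS 8

/* Thread t applies, to its own columns, the final carry array produced by
 * the previous thread (cyclically): thread 0 takes thread 7's carries,
 * thread 1 takes thread 0's, and so on. */
static int carry_source(int t)
{
    return (t + THREADS - 1) % THREADS;
}
```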

In this last carry step, the carries are multiplied by the forward DWT factors, 
put through the first forward FFT, and the result is ADDED to the array 
positions given above.
 
Also note that every thread must finish the first step before the last carry 
step can be performed.

> 2)  Have the 8 threads start on data blocks 0-7.  Put a lock around the
> carry code such that it insures that data blocks are processed in order.
> The theory is that all 8 threads race to the carry code, but then the lock
> "spaces them apart" such that thread 1 does carries for block 0 and starts
> data block 9, then thread 2 starts carries for data block 1 and then starts
> data block 10.
> The major negative for this method is that there is a limit to the number
> of threads that can be kept busy.  If the code is spending 3 units of time
> in the FFT code and 1 unit of time in the carry code, then only 4 threads
> can be kept busy this way.
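This second option could be sketched with a mutex and condition variable; `carry_in_order` and the demo scaffolding below are hypothetical names, not prime95 code:

```c
#include <pthread.h>

#define NBLOCKS 8   /* small demo size; the example above has 1024 blocks */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  turn = PTHREAD_COND_INITIALIZER;
static int next_block = 0;            /* next block allowed into carry code */
static int order[NBLOCKS], done = 0;  /* records the order carries ran in */

/* A thread that has finished FFT work on `block` calls this; the lock
 * "spaces the threads apart" so the carry code runs in strict block
 * order, whatever order the threads arrive in. */
static void carry_in_order(int block)
{
    pthread_mutex_lock(&lock);
    while (block != next_block)
        pthread_cond_wait(&turn, &lock);
    order[done++] = block;            /* stand-in for the real carry code */
    next_block++;
    pthread_cond_broadcast(&turn);
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
    carry_in_order((int)(long)arg);
    return NULL;
}

/* Start threads in reverse order and check carries still ran 0,1,2,... */
static int run_demo(void)
{
    pthread_t th[NBLOCKS];
    for (long b = NBLOCKS - 1; b >= 0; b--)
        pthread_create(&th[b], NULL, worker, (void *)b);
    for (int b = 0; b < NBLOCKS; b++)
        pthread_join(th[b], NULL);
    for (int b = 0; b < NBLOCKS; b++)
        if (order[b] != b)
            return 0;
    return 1;
}
```

The same mechanism also exposes the scaling limit noted below: while one thread holds the lock in the carry code, the others can only overlap FFT work, so the thread count is bounded by the FFT-to-carry time ratio.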
>
> I lean toward the second solution.  Does anyone have any real-world
> experiences or alternative suggestions?

I've not tried this second option.

Guillermo


-- 
Guillermo Ballester Valor
[EMAIL PROTECTED]
Ogijares, Granada  SPAIN
Public GPG KEY http://www.oxixares.com/~gbv/pubgpg.html
 
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
