On Wednesday 23 March 2005 00:25, John R Pierce wrote:
>
> There's also a series of 128 bit 'media' instructions, one of particular
> interest, PMULUDQ, does two 32x32->64bit multiplies at once, generating a
> pair of 64bit values in a 128 bit 'media register'  (of which there is a
> flat file of 16 128bit registers)
>
> MULPD does a pair of 64bit floating multiplies in one instruction.  unclear
> what the pipeline timing on these is.

Looks similar in performance potential to SSE2. Which we already have in 
32-bit P4 family processors.
>
> ok, the 128bit version of PMULUDQ has a 4 clock latency and can execute
> every other clock, so a 2.4Ghz am64 can, in theory, execute 2.4 BILLION
> 64x64->128 bit integer multiplies per second.

Umm, depending on the pipleining - if you get 2 instructions in 2 clocks what 
you say is right, if you need the latency to prepare the execution units then 
you get 2 instructions in 6 clocks :(

> 64x64->128 bit integer multiplies per second.   4 of these does a complete
> 128x128->256 bit, 16 does 256x256->512 bit, etc etc.

Not quite. You need 4 N-bit add-with-carry instructions as well as 4 
N-bit-by-N-bit multiplies to emulate a 2N-bit-by-2N-bit multiply. Though if 
multiplies are expensive you can regroup as 6 ADCs and only 3 MULs. In any 
case as numbers get bigger you are going to have to work the address space 
used to store inputs & output much harder than FFT needs to.
>
> tell me thats not better than the FPU stuff where there's rounding problems

32,000,000 bit x 32,000,000 bit -> 64,000,000 bit - even by your logic, which 
I think is _very_ optimistic - will take 500,000 x 500,000 = 250,000,000,000 
cycles i.e. approximately 100 seconds per iteration - without the overhead of 
reducing mod 2^p-1 or the (completly trivial) subtraction of 2 from the 
result.

Problem is that "schoolboy long multiplication" is O(n^2). Improved 
integer algorithms which don't use FFT can improve that markedly - e.g. 
Toom-Cook which is O(n.2^sqrt(2 log n).log n). FFT is O(n.log n) - that's why 
it's used!

Regards
Brian Beesley

_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime

Reply via email to