Quoting John R Pierce <[EMAIL PROTECTED]>:

> ok, the 128bit version of PMULUDQ has a 4 clock latency and can execute every
> other clock, so a 2.4Ghz am64 can, in theory, execute 2.4 BILLION 64x64->128
> bit integer multiplies per second.   4 of these does a complete 128x128->256
> bit, 16 does 256x256->512 bit, etc etc.
> 
> tell me thats not better than the FPU stuff where there's rounding problems?

That's not better than the FPU where there's rounding problems.

An FPU multiply is one instruction. Doing the same job with integer multiplies
in an FFT-like setting takes several multiplies plus several auxiliary
operations. Even on platforms with high-performance integer multiplication,
the integer version is slower by a factor of 3-5. Multiply that by 10 for
platforms with crappy integer multiply speed (of which there are many).
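
To make the operation count concrete, here is a rough sketch in C of one
modular multiply as it might appear in an integer (number-theoretic) transform,
next to the floating-point equivalent. Everything in it is an assumption for
illustration only: the 31-bit prime P, the choice of Montgomery reduction, and
all function names are hypothetical and are not taken from any particular FFT
code. It is meant to show the shape of the work, not to measure speed.

```c
#include <stdint.h>

/* Hypothetical transform prime, picked for illustration: 15 * 2^27 + 1. */
#define P 2013265921u

/* -P^{-1} mod 2^32 by Newton iteration; normally precomputed once. */
static uint32_t neg_inverse(uint32_t p) {
    uint32_t inv = p;               /* p * p == 1 mod 8, so 3 bits are right */
    for (int i = 0; i < 5; i++)
        inv *= 2u - p * inv;        /* each step doubles the correct bits */
    return 0u - inv;
}

/* Montgomery product: returns a * b * 2^-32 mod P, for a, b < P. */
static uint32_t mont_mul(uint32_t a, uint32_t b, uint32_t ninv) {
    uint64_t t = (uint64_t)a * b;               /* integer multiply 1 */
    uint32_t m = (uint32_t)t * ninv;            /* integer multiply 2 */
    uint64_t u = (t + (uint64_t)m * P) >> 32;   /* multiply 3, add, shift */
    return (uint32_t)(u >= P ? u - P : u);      /* compare and subtract */
}

/* Convert into Montgomery form: a * 2^32 mod P. */
static uint32_t to_mont(uint32_t a) {
    return (uint32_t)(((uint64_t)a << 32) % P);
}

/* Plain a * b mod P, routed through the Montgomery steps above. */
static uint32_t mul_mod_p(uint32_t a, uint32_t b) {
    uint32_t ninv = neg_inverse(P);
    uint32_t prod = mont_mul(to_mont(a), to_mont(b), ninv);
    return mont_mul(prod, 1u, ninv);            /* leave Montgomery form */
}

/* The floating-point counterpart: a single multiply. */
static double fp_mul(double a, double b) {
    return a * b;
}
```

In this sketch each call to mont_mul costs three integer multiplies plus an
add, a shift, a compare and a subtract, where fp_mul is a single multiply.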

jasonp

_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime