Not much advantage. Prime95 uses floating point arithmetic. 64-bit integer calculations (EM64T) will speed trial factoring but that's all.
well, AMD64 mode also has additional registers (16 general-purpose registers instead of 8), which can hugely help avoid extra load/store cycles in complex code.
actually, a fast 64x64->128 multiply would be awesome for LL testing, but I'm not sure they have this. Ok, it DOES have both signed and unsigned 64*64 bits => 128 bits. phew.
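For reference, here is a minimal C sketch of reaching that 64x64->128 multiply without writing assembly. It assumes a GCC or Clang compiler with the `unsigned __int128` extension, which these compilers normally lower to a single MUL on x86-64. The names `u128_parts` and `mul64x64` are made up for illustration and are not from Prime95.

```c
#include <stdint.h>

/* Full 128-bit product, split into two 64-bit halves. */
typedef struct {
    uint64_t hi;  /* upper 64 bits of the product */
    uint64_t lo;  /* lower 64 bits of the product */
} u128_parts;

/* Hypothetical helper: unsigned 64x64 -> 128-bit multiply.
   Assumes the GCC/Clang `unsigned __int128` extension is available. */
static u128_parts mul64x64(uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    u128_parts r;
    r.hi = (uint64_t)(p >> 64);
    r.lo = (uint64_t)p;
    return r;
}
```

Calling `mul64x64(UINT64_MAX, UINT64_MAX)` should give hi = 0xFFFFFFFFFFFFFFFE and lo = 1, since (2^64 - 1)^2 = 2^128 - 2^65 + 1.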
There's also a series of 128 bit 'media' instructions, one of particular interest, PMULUDQ, does two 32x32->64bit multiplies at once, generating a pair of 64bit values in a 128 bit 'media register' (of which there is a flat file of 16 128bit registers)
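A rough sketch of how PMULUDQ looks from C, using the SSE2 intrinsic `_mm_mul_epu32` from `<emmintrin.h>`. The wrapper name `mul2x32` is made up for illustration, and this only builds on x86 targets with SSE2 (which every AMD64 chip has).

```c
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_mul_epu32 (PMULUDQ) */
#include <stdint.h>

/* Hypothetical helper: computes a0*b0 and a1*b1 as full 64-bit
   products with a single PMULUDQ. */
static void mul2x32(uint32_t a0, uint32_t a1,
                    uint32_t b0, uint32_t b1,
                    uint64_t out[2])
{
    /* _mm_set_epi32 lists elements from highest to lowest.
       PMULUDQ reads only the low 32 bits of each 64-bit lane,
       i.e. elements 0 and 2, so the operands go there. */
    __m128i va = _mm_set_epi32(0, (int)a1, 0, (int)a0);
    __m128i vb = _mm_set_epi32(0, (int)b1, 0, (int)b0);

    /* Two unsigned 32x32 -> 64-bit multiplies in one instruction. */
    __m128i prod = _mm_mul_epu32(va, vb);

    /* Unaligned store of both 64-bit products. */
    _mm_storeu_si128((__m128i *)out, prod);
}
```

With a0 = b0 = 0xFFFFFFFF the first product should come out as 0xFFFFFFFE00000001, which shows nothing is truncated to 32 bits.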
MULPD does a pair of 64bit floating multiplies in one instruction. unclear what the pipeline timing on these is.
ok, the 128bit version of PMULUDQ has a 4 clock latency and can execute every other clock, so a 2.4GHz AMD64 can, in theory, execute 2.4 BILLION 32x32->64 bit integer multiplies per second (1.2 billion instructions per second, two multiplies each). 4 of these do a complete 64x64->128 bit, 16 do 128x128->256 bit, etc etc.
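To make the "4 of these" composition concrete, here is a portable C sketch of the schoolbook method: one double-width product assembled from four half-width partial products. The pieces here are 32x32->64 products (the size PMULUDQ produces per lane), combined into a 64x64->128 result. The function name `mul64_schoolbook` is made up for illustration, and this is plain scalar code, not tuned SSE2.

```c
#include <stdint.h>

/* Hypothetical helper: 64x64 -> 128-bit unsigned multiply built from
   four 32x32 -> 64-bit partial products (schoolbook method). */
static void mul64_schoolbook(uint64_t a, uint64_t b,
                             uint64_t *hi, uint64_t *lo)
{
    /* Split each operand into 32-bit halves. */
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    /* The four partial products; each fits in 64 bits. */
    uint64_t p00 = a0 * b0;
    uint64_t p01 = a0 * b1;
    uint64_t p10 = a1 * b0;
    uint64_t p11 = a1 * b1;

    /* Sum of the middle column (bits 32..63 of the result).
       Three values below 2^32 cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    /* The carry out of the middle column goes into the high half. */
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

The same pattern repeats one level up: four 64x64->128 products make a 128x128->256, so the multiply count grows by a factor of 4 each time the operand width doubles.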
tell me that's not better than the FPU stuff where there are rounding problems?
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
