Warren Nagourney wrote:
I found that the xlc compiler produces code that runs about 15%-20%  
faster than gcc. This is on a program with a moderate amount of  
single precision arithmetic. The interesting thing is that the same  
(scalar) code runs about 30% faster on the spu than it does on the ppu
(with the same 15% advantage to xlc). Provided the code and data fit
in the local store of the spu, it seems better to use the spu for
scalar code! Compared to a G4, the ppu is more or less equivalent to  
a 1.2 GHz processor and the spu is equivalent to a 1.7-2 GHz G4.

  
I'm not surprised that it is this fast with single-precision arithmetic, even as scalar code.  Do you have any data for double-precision? 

For single-precision arithmetic, the SPU can start a new calculation every clock cycle (each one takes 6 cycles, but the unit is fully pipelined).  For double-precision, not only does the calculation take 13 cycles, but it also completely stalls the processor for the first 6 of them!  It's almost as bad as a branch!
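
To put rough numbers on that, here is a back-of-the-envelope sketch in C. It only models a stream of independent operations on one pipeline; it is not a measurement. The function names are made up for illustration, and the 7-cycle issue interval for double precision is my assumption (one issue cycle plus the 6 stalled cycles); the 6- and 13-cycle latencies are the figures quoted above.

```c
#include <assert.h>

/* Cycles needed to complete n independent operations on one pipeline.
 * latency:        cycles from issue to result for a single operation
 * issue_interval: cycles between successive issues (1 = fully pipelined)
 * Illustrative model only; it ignores dependencies, branches and loads. */
unsigned long pipeline_cycles(unsigned long n, unsigned latency,
                              unsigned issue_interval)
{
    assert(latency > 0 && issue_interval > 0);
    if (n == 0)
        return 0;
    /* The last operation issues after (n - 1) intervals, then needs
     * its full latency to produce a result. */
    return (n - 1) * issue_interval + latency;
}

/* Single precision: 6-cycle latency, one new issue per cycle. */
unsigned long single_precision_cycles(unsigned long n)
{
    return pipeline_cycles(n, 6, 1);
}

/* Double precision: 13-cycle latency; ASSUMED issue interval of 7
 * (1 issue cycle + 6 stalled cycles). */
unsigned long double_precision_cycles(unsigned long n)
{
    return pipeline_cycles(n, 13, 7);
}
```

Under those assumptions, 1000 independent single-precision operations come out at 1005 cycles and 1000 double-precision ones at 7006, so roughly a factor of 7 once the pipeline is full.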

Jon
--
_______________________________________________
yellowdog-general mailing list
http://lists.terrasoftsolutions.com/mailman/listinfo/yellowdog-general
HINT: to Google archives, try  '<keywords> site:terrasoftsolutions.com'
