Re: Pipeline question

2011-08-15 Thread Fred van der Windt
I need 27 instructions (maybe more because of the two JO after TROT); see the end of this post. But I am sure this code is faster than what was used before. The simple relation - 256 input bytes processed by 27 instructions to create 192 output bytes - should speed up the whole process by a factor (even if
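Fred's 27-instruction routine itself is cut off in this snippet, so purely as an illustration, here is a C analogue of what TROT (TRanslate One to Two) does: each input byte selects a two-byte result from a 256-entry table, and translation stops if the result equals the test character. In hardware the instruction can also end after a CPU-determined number of bytes (CC3), which is why the JO back to the TROT is needed. The function and table names below are invented for the sketch.

    #include <stdint.h>
    #include <stddef.h>

    /* C analogue of TROT semantics, not Fred's routine: translate each
     * one-byte input to a two-byte output via a 256-entry table, stopping
     * if the translated value equals the test character (CC1 in hardware).
     * The hardware may also end early after a CPU-determined number of
     * bytes (CC3), hence the JO retry loop mentioned above. */
    static size_t trot_like(uint16_t *dst, const uint8_t *src, size_t len,
                            const uint16_t table[256], uint16_t test_char)
    {
        size_t i;
        for (i = 0; i < len; i++) {
            uint16_t out = table[src[i]];
            if (out == test_char)        /* stop character found */
                break;
            dst[i] = out;
        }
        return i;                        /* number of bytes translated */
    }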

Re: Pipeline question

2011-08-15 Thread Martin Trübner
Fred, the code is more than 4 times slower than the original 'process 8 bytes using RISBG' variant. Oops - memory accesses are killing you. I do everything in less than 1K (plus the TR-table reads), so it is either that or the TR performance. After I had completed the code I stumbled over some
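The 'process 8 bytes using RISBG' variant is not shown in the digest either. As a rough sketch of why a register-only approach avoids the table loads, here is a C version that assumes a 4-in/3-out packing (keep the low 6 bits of each input byte), which is only a guess suggested by the 256-to-192-byte ratio Fred quotes; the real transformation may be something else. The point is that everything stays in registers, the kind of bit extraction and insertion RISBG (Rotate then Insert Selected Bits) does in one instruction, with no translate table and no memory traffic beyond the two buffers.

    #include <stdint.h>
    #include <stddef.h>

    /* Register-only sketch in the spirit of the RISBG variant mentioned
     * above.  The 6-bit packing is an assumption made for illustration,
     * chosen only because it matches the 4:3 input/output ratio. */
    static void pack_6bit(uint8_t *dst, const uint8_t *src, size_t n_quads)
    {
        for (size_t q = 0; q < n_quads; q++) {
            uint32_t a = src[4*q + 0] & 0x3F;
            uint32_t b = src[4*q + 1] & 0x3F;
            uint32_t c = src[4*q + 2] & 0x3F;
            uint32_t d = src[4*q + 3] & 0x3F;
            uint32_t w = (a << 18) | (b << 12) | (c << 6) | d;  /* 24 bits */
            dst[3*q + 0] = (uint8_t)(w >> 16);
            dst[3*q + 1] = (uint8_t)(w >> 8);
            dst[3*q + 2] = (uint8_t)(w);
        }
    }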

Re: Pipeline question

2011-08-15 Thread Blaicher, Chris
The L1 cache should keep up with the processor. It is 128K, or 512 lines of 256 bytes each, so if the data you are working on fits in L1 and does not keep rolling out of it, that should not be your problem. L1.5 is about 8 times slower than L1, if memory serves me correctly. The other thing to consider is the
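A quick back-of-the-envelope check of the working set against those L1 figures: 128 KB split into 512 lines of 256 bytes. The buffer sizes below are the ones Fred mentions (256 bytes in, 192 bytes out) plus an assumed 512-byte translate table; they are guesses used only to show that this kind of working set fits in L1 with room to spare.

    #include <stdio.h>

    /* Working-set check using the L1 figures from this thread
     * (128 KB = 512 lines of 256 bytes).  Buffer and table sizes
     * below are assumptions for illustration only. */
    int main(void)
    {
        const unsigned l1_bytes   = 128u * 1024u;
        const unsigned line_bytes = 256u;               /* 128 KB / 512 lines */
        const unsigned work_set   = 256u + 192u + 512u; /* in + out + table   */

        printf("L1 lines available : %u\n", l1_bytes / line_bytes);
        printf("Lines touched      : %u\n",
               (work_set + line_bytes - 1) / line_bytes);
        return 0;
    }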

Re: Pipeline question

2011-08-15 Thread David Bond
On Mon, 15 Aug 2011 09:43:37 -0500, Blaicher, Chris wrote: >L1.5 is about 8 times slower than L1, if memory serves me correctly. There is essentially no delay in fetching from L1 cache. The 1 cycle that it takes is included in the execution cycle. The L1.5 cache delay is about 13 cycles. The L2
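Those latency figures (L1 essentially hidden in the execution cycle, L1.5 around 13 cycles) are the kind of thing a pointer-chasing probe makes visible: every load depends on the previous one, so the average time per iteration approximates the load latency of whatever cache level the working set lands in. The sketch below is a generic C probe, not anything from the thread; buffer sizes and iteration counts are arbitrary choices for illustration.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static size_t rnd(size_t bound)
    {
        /* combine two rand() calls so the range covers large arrays */
        return (((size_t)rand() << 15) ^ (size_t)rand()) % bound;
    }

    /* Average nanoseconds per dependent load for a working set of
     * n elements (8 bytes each), chased through a random single cycle
     * so hardware prefetch cannot hide the latency. */
    static double ns_per_load(size_t n, size_t iters)
    {
        size_t *next = malloc(n * sizeof *next);
        for (size_t i = 0; i < n; i++)
            next[i] = i;
        /* Sattolo's shuffle: turns the identity into one big cycle */
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = rnd(i), t = next[i];
            next[i] = next[j];
            next[j] = t;
        }

        volatile size_t idx = 0;
        clock_t t0 = clock();
        for (size_t i = 0; i < iters; i++)
            idx = next[idx];                 /* dependent load chain */
        clock_t t1 = clock();

        free(next);
        return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)iters;
    }

    int main(void)
    {
        /* ~64 KB should sit in a 128 KB L1; ~8 MB should spill well past it */
        printf("small working set: %.2f ns/load\n",
               ns_per_load(8 * 1024, 20 * 1000 * 1000));
        printf("large working set: %.2f ns/load\n",
               ns_per_load(1024 * 1024, 20 * 1000 * 1000));
        return 0;
    }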
