I need 27 instructions (maybe more because of the two JO after TROT); see the
end of this post.
But I am sure this code is faster than what was used before. The simple
ratio, 256 input bytes processed by 27 instructions to create 192 bytes,
should speed up the whole process by a factor (even if
Fred,
The code is more than 4 times slower than the original 'process 8
bytes using RISBG' variant.
Oops -
memory accesses are killing you.
I do everything in less than 1K (plus TR-tables read) so
or the TR performance.
After I had completed the code I stumbled over some
L1 cache should keep up with the processor. It is 128K or 512 lines of L1
cache, so if the data you are working on will fit in L1 and not roll it, then
that should not be your problem.
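
As a quick sanity check of those figures, here is a minimal sketch (Python, used only for the arithmetic). The 128K and 512 lines come from the post above. The 2K of translate tables is a made-up placeholder, since the thread does not give the real table sizes, and the buffers are assumed to be line-aligned.

```python
# Figures quoted in the thread: 128K of L1 organised as 512 lines.
L1_BYTES = 128 * 1024
L1_LINES = 512
LINE_BYTES = L1_BYTES // L1_LINES  # bytes per cache line


def lines_touched(nbytes, line_bytes=LINE_BYTES):
    """Cache lines covered by a line-aligned buffer of nbytes (ceiling division)."""
    return -(-nbytes // line_bytes)


def l1_share(work_bytes, table_bytes):
    """Fraction of the L1 lines used by the work area plus the translate tables."""
    used = lines_touched(work_bytes) + lines_touched(table_bytes)
    return used / L1_LINES


# ASSUMPTION: a 1K work area (from "less than 1K") and a hypothetical 2K of
# translate tables; neither buffer layout is known from the thread.
print("line size:", LINE_BYTES)
print("lines for 1K work area:", lines_touched(1024))
print("share of L1 used:", l1_share(1024, 2048))
```

Under those assumptions the working set covers only a handful of the 512 lines, which is why a pure capacity problem in L1 looks unlikely. Conflicts, unaligned buffers and data coming from further out are not modelled here.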
L1.5 is about 8 times slower than L1, if memory serves me correctly.
The other thing to consider is the
On Mon, 15 Aug 2011 09:43:37 -0500, Blaicher, Chris wrote:
L1.5 is about 8 times slower than L1, if memory serves me correctly.
There is essentially no delay in fetching from L1 cache. The 1 cycle that
it takes is included in the execution cycle. The L1.5 cache delay is about
13 cycles. The L2
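
To see how a few misses can outweigh a short instruction sequence, here is a rough cost model, not a measurement. The 27 instructions and the roughly 13 cycle L1.5 delay come from this thread. One cycle per instruction and the miss count of 8 are assumptions chosen purely for illustration; a real superscalar pipeline will behave differently.

```python
def estimated_cycles(instructions, l1_misses, miss_penalty=13, cpi=1.0):
    """Very rough cycle estimate: base instruction cost plus L1 miss stalls.

    miss_penalty: cycles added when data comes from L1.5 (about 13 per the thread).
    cpi: ASSUMED average cycles per instruction; 1.0 is a simplification.
    """
    return instructions * cpi + l1_misses * miss_penalty


# 27 instructions with every operand already in L1.
all_hits = estimated_cycles(27, 0)

# Same 27 instructions, but with a HYPOTHETICAL 8 accesses served from L1.5.
some_misses = estimated_cycles(27, 8)

print("all hits:", all_hits)
print("with misses:", some_misses)
print("slowdown:", some_misses / all_hits)
```

With these made-up inputs the stalls cost several times more than the instructions themselves, so counting instructions alone says little about the run time.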