On Sun, May 6, 2012 at 9:09 PM, Pekon Gupta <[email protected]> wrote:
>
>>
>> >> > What Julius and me are talking about here is the cache itself
>> >> > containing some init stage (in hw) which would loop through the
>> >> > cache tags and invalidate them on reset.
>> >
>> >
>> > [Pekon]: In above scenario (H/W based invalidation of caches at reset),
>> > do you plan to stall the CPU till your H/W loop is complete?
>>
>> A very pertinent question. The answer is that of course you have to.
>>
>> Again, I reiterate that I'm against putting in FSMs to do stuff which
>> is not strictly necessary. It really provides no performance gain
>> (aside from skipping an init loop at reset, a period during which we
>> can live with a bit of delay and not the focus of optimisation) and
>> adds complexity.
>
>
> [Pekon]: One feedback here..
>
> [S/W invalidation of cache]: In most systems, the CPU clock speeds are
> very slow just after reset, as clocks are usually sourced "directly" from
> some oscillator (crystal or RC). However, once the CPU is out of reset, it
> programs the PLL(s), increases its clock speed severalfold, and then
> invalidates the cache using a S/W loop.
> Example: invalidating 1024 cache lines @ 100 MHz (10 ns time-period) might
> take 1024 x 10 ns = 10240 ns = 10.24 us (micro sec)
> (considering back-2-back WRITES with minimum latency, as L1 caches are at
> the lowest latency from the CPU)
>
> [H/W invalidation of cache]: As your H/W FSM would run before the CPU boots
> and programs the PLL, your H/W FSM would still be running on the slow
> oscillator clock. Thus, it might take more time to invalidate than S/W.
> Example: invalidating 1024 cache lines @ 25 MHz (40 ns time-period) might
> take 1024 x 40 ns = 40960 ns = 40.96 us (micro sec)

Good point.

>
> So, I think it's better if the H/W FSM is controllable by a "self-clearing"
> software bit in the cache registers.
> Example: a write to a bit starts your FSM, and the bit gets automatically
> cleared when the cache is invalidated.
> I think this would be more useful, as:
> (a) S/W has more control over when to initiate invalidation of caches, like
> after programming PLLs
> (b) It frees the CPU to execute the next instruction in the pipeline, while
> the H/W FSM is invalidating caches in parallel.

(b) is probably only true for the data cache, as for the instruction cache
you'll immediately be putting a request to it for the next instruction
(and then presumably must wait for the invalidation FSM to complete).

I'm still not convinced this is a feature we want. It's needless
complication of the hardware implementation for marginal gain.

Cheers

Julius
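
For anyone who wants to play with the numbers in this thread, here is a minimal behavioural sketch in Python (not RTL and not driver code). It models the proposed self-clearing bit and reproduces the 10.24 us and 40.96 us figures from the quoted examples. Everything named here is an assumption made for illustration: `CacheModel`, `INVALIDATE_BIT`, the control register itself, and the rate of one cache line invalidated per clock cycle. None of it is taken from the actual OpenRISC cache implementation.

```python
# Hypothetical bit in a hypothetical cache control register.
INVALIDATE_BIT = 0x1


def invalidation_time_ns(num_lines, clock_hz):
    """Time to invalidate num_lines, assuming one line per clock cycle."""
    return num_lines * 1_000_000_000 / clock_hz


class CacheModel:
    """Toy model of a cache whose invalidation FSM is started by software."""

    def __init__(self, num_lines):
        assert num_lines >= 1
        self.num_lines = num_lines
        self.valid = [True] * num_lines  # stale tags left over after reset
        self.ctrl = 0                    # control register, all bits clear
        self.next_line = 0               # FSM state: next line to invalidate

    def write_ctrl(self, value):
        """Software write: setting INVALIDATE_BIT starts the FSM."""
        if value & INVALIDATE_BIT and not (self.ctrl & INVALIDATE_BIT):
            self.ctrl |= INVALIDATE_BIT
            self.next_line = 0

    def tick(self):
        """One clock cycle of the FSM: invalidate one line while busy."""
        if not (self.ctrl & INVALIDATE_BIT):
            return
        self.valid[self.next_line] = False
        self.next_line += 1
        if self.next_line == self.num_lines:
            self.ctrl &= ~INVALIDATE_BIT  # bit clears itself when done


def invalidate_and_wait(cache):
    """Start the FSM, then poll the bit; returns the cycles spent waiting."""
    cache.write_ctrl(INVALIDATE_BIT)
    cycles = 0
    while cache.ctrl & INVALIDATE_BIT:
        cache.tick()
        cycles += 1
    return cycles


if __name__ == "__main__":
    cache = CacheModel(num_lines=1024)
    cycles = invalidate_and_wait(cache)
    for clock_hz in (100_000_000, 25_000_000):
        ns = invalidation_time_ns(cycles, clock_hz)
        print(f"{clock_hz // 1_000_000} MHz: {ns / 1000:.2f} us")
```

The polling loop in `invalidate_and_wait` is the instruction cache case from the reply above, where the CPU has nothing useful to do until the bit clears. In the data cache case, software could in principle do other work between the write and the first poll, which is the only place point (b) buys anything.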
