Dear all,

we (Simon Schulze and I) are currently working hard on a cache-coherent version of mor1kx. We are more or less done with the basic snooping version for bus-based systems. The next step is the integration with our directory-based L2 cache coherency.
As we are about to push this to my GitHub repository in the next few days, I am still concerned about openrisc/mor1kx and our mor1kx diverging too much. More precisely, the most crucial problem is the LSU:

* The LSU contains a store buffer. A data write pushes the data to the store buffer and also writes it to the cache.
* The store buffer can be problematic with regard to coherency and the consistency model (which is not yet defined for OpenRISC). You can imagine that a delayed write can overwrite a concurrent write to the same cache block, etc.
* This may not be a problem if the consistency model allows it, but the cache should not assume it is in modified state until it is really sure (this can be a pain in the *** as you will find out sometime later).
* There are of course various ways to avoid this in the current setup, but they become arbitrarily complex once you think them through.
* In the naive (and most transparent) implementation, the cache performs the writes itself. It first accesses the bus and then updates the tag memory once the write was successful. This is what we did.

We have thought about this problem at length and then, unfortunately, removed the store buffer and most of the (honestly: confusing) wiring in the LSU. While we pack it up for a first version now, I see two realistic options to avoid diverging (i.e., to allow for common modules based on FEATURE_MULTICORE):

1. Have two cappuccino lsu+cache implementations, which are instantiated depending on whether the multicore feature is activated. Alternatively, this may be done inside the modules with a massive number of "generate if(FEATURE_MULTICORE) ... else ... endgenerate"/"assign ... = !FEATURE_MULTICORE ? .. : ..." etc., but I would definitely prefer the former.
2. We move the LSU behind the cache, so that we have a linear chain CPU->MMU->Cache->SB->Bus-IF. This may cost an extra cycle here and there, as the path to the store buffer may cross a register or so. The clear advantage is that LSU and Cache can stay common for the baseline and the multicore variant.
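To make the store buffer problem from the list above a bit more concrete, here is a minimal behavioural sketch in Python. It is not mor1kx code: the `Bus` and `Core` classes, the write-through-with-invalidate protocol, and the single-address scenario are all assumptions made up for illustration. It only shows how a store that updates the local cache immediately, but reaches the bus later, can leave two caches disagreeing about the same block.

```python
from collections import deque


class Bus:
    """Hypothetical shared bus: one memory, invalidate-on-write snooping."""

    def __init__(self):
        self.mem = {}
        self.cores = []

    def write(self, src, addr, data):
        self.mem[addr] = data
        # Every other core snoops the write and drops its copy.
        for core in self.cores:
            if core is not src:
                core.snoop_invalidate(addr)


class Core:
    """Hypothetical core with a tiny cache and an optional store buffer."""

    def __init__(self, bus, use_store_buffer):
        self.bus = bus
        self.use_store_buffer = use_store_buffer
        self.cache = {}              # addr -> data; present means valid
        self.store_buffer = deque()  # pending (addr, data) bus writes
        bus.cores.append(self)

    def store(self, addr, data):
        if self.use_store_buffer:
            # Cache is updated now, the bus write is delayed.
            self.cache[addr] = data
            self.store_buffer.append((addr, data))
        else:
            # Cache performs the write itself: bus first, then local state.
            self.bus.write(self, addr, data)
            self.cache[addr] = data

    def drain(self):
        # The delayed writes finally reach the bus.
        while self.store_buffer:
            addr, data = self.store_buffer.popleft()
            self.bus.write(self, addr, data)

    def snoop_invalidate(self, addr):
        self.cache.pop(addr, None)

    def load(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.bus.mem.get(addr, 0)
        return self.cache[addr]


def run(use_store_buffer):
    """Return what cores A and B finally read from the same address."""
    bus = Bus()
    a = Core(bus, use_store_buffer)
    b = Core(bus, use_store_buffer=False)
    x = 0x100
    a.store(x, 1)   # A writes 1 (possibly only buffered)
    b.store(x, 2)   # B writes 2 on the bus, A's copy is invalidated
    a.load(x)       # A refetches the block and now caches 2
    a.drain()       # A's delayed write of 1 overwrites B's 2 in memory
    return a.load(x), b.load(x)


if __name__ == "__main__":
    print("with store buffer:   ", run(True))
    print("without store buffer:", run(False))
```

In this toy scenario `run(True)` returns `(2, 1)`, i.e. the two cores end up with different values for the same block, while `run(False)` returns `(2, 2)`. Whether the overwrite itself is legal depends on the consistency model; the sketch only shows why the cache must not treat its copy as settled while a write is still sitting in the buffer.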
From a pragmatic point of view, the first option seems the easiest. From a non-divergence standpoint, the second might be better. What are your opinions? We would do the work, but for option 2 we would of course like to see a path to upstream. So if you don't see a chance for this change, we will stick to option 1, which is also perfectly fine with us.

Bye,
Stefan
_______________________________________________ OpenRISC mailing list [email protected] http://lists.openrisc.net/listinfo/openrisc
