Dear all,

We (Simon Schulze and I) are currently working hard on a cache-coherent
version of mor1kx. We are more or less done with the basic snooping
version for bus-based systems. The next step is the integration with our
directory-based L2 cache coherency.

We are about to push this to my GitHub repository in the next few days,
but I am concerned about openrisc/mor1kx and our mor1kx diverging too much.

Specifically, the most crucial problem is the LSU:
 * The LSU contains a store buffer. A data write pushes the data to the
store buffer and also writes it to the cache.
 * The store buffer can be problematic with regard to coherency and the
consistency model (which is not yet defined for OpenRISC). For example,
a delayed write can overwrite a concurrent write to the same cache block.
 * This need not be a problem if the consistency model allows it, but the
cache should not consider a block to be in modified state until it is
really sure (this can be a pain in the *** as you will find out sometime
later).
 * There are of course various ways to avoid this in the current setup,
but they become arbitrarily complex once you think them through.
 * In the naive (and most transparent) implementation, the cache
performs the writes itself. It first accesses the bus and then updates
the tag memory once the write was successful. This is what we did.
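
To illustrate the hazard described in the list above, here is a small
behavioural toy model in Python. It is only a sketch under strong
assumptions: one cache block of two words, one local cache and a shared
memory. All names (Memory, Cache, eager_modified, scenario) are made up
for this example and are not taken from the mor1kx sources. In
particular, the snoop handling is simplified: a block the cache believes
to be modified is simply kept.

```python
# Toy model, NOT the mor1kx implementation. All names are hypothetical.

class Memory:
    def __init__(self):
        self.block = [0, 0]           # one cache block with two words


class Cache:
    def __init__(self, mem, eager_modified):
        self.mem = mem
        self.eager_modified = eager_modified
        self.state = "S"              # start with a valid shared copy
        self.block = list(mem.block)
        self.pending = []             # store buffer: (word index, value)

    def store(self, idx, value):
        if self.eager_modified:
            # Store-buffer style: update the cache and mark the block
            # modified right away, the bus write happens later.
            self.block[idx] = value
            self.state = "M"
            self.pending.append((idx, value))
        else:
            # Naive style: access the bus first, then update the cache.
            self.mem.block[idx] = value
            if self.state != "I":
                self.block[idx] = value

    def drain(self):
        # Delayed bus writes from the store buffer.
        for idx, value in self.pending:
            self.mem.block[idx] = value
        self.pending.clear()

    def snoop_remote_write(self):
        # Simplification: a block believed to be modified is assumed to
        # be the newest copy and is kept, anything else is invalidated.
        if self.state != "M":
            self.state = "I"

    def load(self, idx):
        if self.state == "I":
            self.block = list(self.mem.block)   # refill from memory
            self.state = "S"
        return self.block[idx]


def scenario(eager_modified):
    mem = Memory()
    cache = Cache(mem, eager_modified)
    cache.store(0, 1)            # local core writes word 0
    mem.block[1] = 2             # remote core writes word 1 of the same block
    cache.snoop_remote_write()   # local cache snoops the remote write
    cache.drain()                # buffered store finally reaches the bus
    return cache.load(1), mem.block


print(scenario(True))    # store-buffer style, block marked modified early
print(scenario(False))   # cache performs the write on the bus first
```

In this model the early "modified" marking makes the local core read a
stale value for word 1 even though memory holds the remote write, while
the bus-first variant invalidates the block and refetches it.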

We have thought about this problem extensively and in the end,
unfortunately, removed the store buffer and most of the (honestly:
confusing) wiring in the LSU.

While we are packaging this up for a first version, I see two realistic
options to avoid diverging (i.e., to allow for common modules based on
FEATURE_MULTICORE):

1. Have two cappuccino LSU+cache implementations, which are instantiated
depending on whether the multicore feature is activated. Alternatively,
this could be done inside the modules with a massive number of
"generate if(FEATURE_MULTICORE) ... else ... endgenerate"/
"assign ... = !FEATURE_MULTICORE ? .. : ..." constructs, but I would
definitely prefer the former.

2. We move the LSU behind the cache, so that we have a linear chain
CPU->MMU->Cache->SB->Bus-IF. This may cost an extra cycle here and there,
as the path to the store buffer may have to cross a register or so. The
clear advantage is that LSU and cache can stay common for the baseline
and multicore variants.

From a pragmatic point of view, the first option seems the easiest. From
the standpoint of avoiding divergence, the second might be better.

What are your opinions? We would do the work, but for option 2 we would
of course like to see a path to upstream. So if you don't see a chance
for this change, we will stick to option 1, which is also perfectly fine
with us.

Bye,
Stefan

