Hello and thanks for your response.

Following your guidelines, over the past few days I have been examining the source code of both the simulators and the individual components, to get a more in-depth understanding of the implementations.

In particular, I have been examining the CMP.L2SharedNUCA.Inorder model that comes with flexus-4.0. For the rest of this e-mail, I will be referring to this model.

---------------------------------------------

First of all, I have drawn a layout of the architecture, trying to understand the interconnections between the different components. It is based on the simulator's wiring.cpp file and can be found here:

https://pithos.grnet.gr/pithos/rest/[email protected]/files/SimFlex/CMP.L2SharedNUCA.Inorder-layout.pdf

If I am correct:

* The Feeder provides the instructions.
* The Fetcher fetches the instructions and forwards them to the L1 Instruction Cache and to BPWarm.
* The BPWarm component must be the Branch Predictor.
* The Execute component - obviously - executes the instructions and requests data from the L1 Data Cache.

* L1 Instruction / Data cache components: obvious.
* L2 cache: the component I have to configure in order to implement a shared NUCA cache (correct?).
* NetMapper: the splitter that distributes requests among the components.
* Memory: must be the memory below L2 (i.e. RAM).

Have I understood the architecture correctly?
Moreover, what is the purpose of the "NIC" and "Network" components?

As I have seen:
* The Network component is an instance of a "NetShim/MemoryNetwork" component, though it is not obvious how it relates to the L2 cache.

* Concerning the NIC, I guess it stands for Network Interface Controller. I have taken a look in the MultiNic component folder and seen that it has multiple implementations: MultiNic1, MultiNic2, MultiNic3 and MultiNic4, plus a generic MultiNicX, which must hold the shared implementation. The various implementations define different values for FLEXUS_MULTI_NIC_NUMPORTS: does this correspond to the number of components the NIC is connected to?

---------------------------------------------

On the L2 cache:
I have seen a sample configuration in wiring.cpp of L2SharedNUCA.Inorder, as well as flexus-4.0/components/CMPCache/CMPCache.hpp.

In wiring.cpp, the parameter theL2Cfg.Cores.initialize(64) initializes 64 cores. What are these cores, and how do they relate to the CPU cores or to the 64 banks of the cache that are initialized by theL2Cfg.Banks.initialize(64)?

What is more, I am trying to figure out where the following are defined:
* The mapping between CPU cores and L2 banks, that is, to which L2 bank each CPU core is mapped.
* The replacement/migration policies. I have only noticed that the coherence policy is in flexus-4.0/components/CMPCache/NonInclusiveMESIPolicy.cpp, if I am correct.


Finally, I have found in flexus-4.0/components/CMPCache/RTDirectory.hpp the following scheme:

Physical address layout:
+-----+--------------+------+-------------+--------------+-------------+
| Tag | R Index High | Bank | R Index Low | RegionOffset | BlockOffset |
+-----+--------------+------+-------------+--------------+-------------+
|<------ setLowShift ------>|
                            |<----------->|
                               setMaskLow

I have not understood the purpose of the "R Index High" and "R Index Low" fields there.

I presume, by the way, that due to the presence of the "Bank" field, the placement policy for data in the corresponding NUCA banks must be static, i.e. every block will *initially* always be placed in the same bank, according to its address.

---------------------------------------------

On the Network component:

As I have seen in the wiring.cpp file, the parameter "theNetworkCfg.NetworkTopologyFile.initialize()" selects the topology file that will be used for the network.

An example file is 16x3-mesh.topology (it can be found in the CMP.L2SharedNUCA.OoO folder), which defines a 4x4 grid of "switches", where each switch has 4 ports to interconnect with other switches and 3 ports that connect it to nodes (so (4x4)x3 = 48 nodes in total).

I have understood the topology and the routing tables that are defined, but I have not understood how these nodes and switches are related to the L2 NUCA cache, if there is any relationship at all.

---------------------------------------------


Thank you in advance for your help.
I will be glad to provide any additional information you might need.

-George


On Wed, 23 Mar 2011 16:47:14 +0000, Djordje Jevdjic wrote:
Hello,

Thanks for your message.

Concerning your first question: yes, all the messages exchanged
through this list are in one of those archives. For technical reasons
we decided to split them into two separate archives (the old and the
new archive).

Regarding your second question: I don't think you need to implement
anything to have a NUCA simulator. NUCA systems have
been already implemented (actually, almost all simulators we use in
Flexus are NUCA simulators).
The ones you listed below (
flexus-4.0/simulators/CMP.L2SharedNUCA.Inorder and
flexus-4.0/simulators/CMP.L2SharedNUCA.OoO)
are examples of such architectures with a shared and tiled L2 cache.
So, things have already been implemented, there's no need to
reimplement it.

However, if you are interested to know more details of the
implementations, you can look at the source code and find some useful
comments there.
If you are examining the source code, it's a good idea to look at the
code of individual components included in the simulator, not the
simulator directory itself.
You might also want to check the getting started guide on our
website. Besides that and the Simflex publications, we don't maintain
any further documentation.

Also, keep in mind that the current version of Flexus works only with
Simics 3. Whatever you try to do with Simics 4 highly likely will not
work.
We are planning to move to Simics 4 soon.

Regards,
Djordje
