[Simflex] Private L2 caches design

Lide Duan Fri Mar 2 01:02:58 2007

Hi Jared,

Thank you for your reply!


I read through the source code of the components used in DSMFlex, trying to
grasp the main idea of this simulator. However, I still have some questions.

1. How can I run this simulator? I copied the generated .so file from
simulators/DSMFlex/ to SIMICS_ROOT/x86-linux/lib/, just as I did with
CMPFlex. I can see the new module flexus-DSMFlex-v9-iface-gcc when I run
"list-modules" in the Simics console, but when I load this module into a
checkpoint and run the simulated machine, I get the error below:

FATAL ERROR: No such file or directory (2): rxx_cntl: failed to open
microcode file he.rom
*** Simics getting shaky, switching to 'safe' mode.
Simics (main thread) received a segmentation fault. Will try to recuperate.

It seems that I need to copy a microcode file into some Simics directory
because this simulator uses Protocol Engines. Where should I place the
microcode file "he.rom"? Also, do I need to modify the Simics boot script
because of the new memory structure?

2. You mentioned that I can re-tune some parameters of DSMFlex to get two
levels of private cache. Each core in DSMFlex does have its own private L1
and L2, but the memory is also distributed, unlike the shared memory in
CMPFlex. What I actually want to implement is a CMP cache system with
private L1 and private L2 caches but memory shared by all cores, just like
CMPFlex. In that case, can I still use DSMFlex? The Local Engine/Protocol
Engine/Directory used in DSMFlex keep a directory for each block in
memory, which is responsible for the coherence of the distributed
memories. But I think I need to maintain the coherence of the private L2
caches in my implementation. What do you think?

Thanks,
Lide

On 2/27/07, Jared C. Smolens <[email protected]> wrote:
>
>
> Hi Lide,
>
> If you only want to have two levels of cache (L1 & L2, both private to
> each core and no shared cache), you might actually be able to use the
> DSMFlex simulators, after re-tuning for on-chip CMP latencies/bandwidth.
>
> The Cache/CmpCache components are used for "timing" simulations, whereas
> the TraceFlex simulator's Fast* components are for "functional"
> simulations (where all cache transactions are atomic and have zero
> latency).  If you want correct coherence with timing, you will have to
> use the Cache/CmpCaches.
>
> 1. The snoop/request channels exist to prevent races between requests and
> acknowledgements which can occur in timing simulations.  The "snoop"
> channel is a high priority channel for acknowledgement and eviction
> messages, while the request channel sends request messages.   Prioritizing
> the snoop channel allows older requests to complete before starting new
> ones, avoiding deadlock scenarios.
>
> The Fast components have no concurrency and, therefore, don't need these
> channels.  Their implementation is also far simpler because of this.
>
> 2. I'm not sure on this one.
>
> 3. We have found that DMA and non-allocating writes are important for
> correctly modeling cache behaviors of I/O-intensive commercial workloads.
>
> - Jared
>
> Excerpts From "Lide Duan" <[email protected]>:
> [Simflex] Private L2 caches design: "Lide Duan" <[email protected]>
> >Hi all,
> >
> >I am trying to implement a two-level CMP cache design, with private L1
> >and private L2 caches, based on the components provided by Flexus. The
> >existing simulator CMPFlex has a private L1 cache (Cache component) and a
> >shared L2 cache (CmpCache component); both have cache controllers, but
> >with different implementations. In this case, the shared L2 cache is
> >responsible for coherence among the private L1 caches. I have read the
> >source code of these components, and I think I can use the Cache
> >component as my private L2 cache if I only modify the ports to connect to
> >the L1 caches on the front side and the shared bus on the back side.
> >However, how can I maintain the coherence of the private L2 caches? I
> >noticed that TraceFlex has the structure I want, and it uses the FastBus
> >component as the interconnect between the L2 caches to maintain
> >coherence. So I intend to focus on FastBus rather than CmpCache.
> >
> >1. In CMPFlex, each Cache component has three ports (Request, Snoop, Out)
> >on both the front and back sides, but FastBus has only two ports
> >(FromCaches, ToSnoops) on the front side. How can I connect them? What
> >are the main functions of each of these ports?
> >2. In TraceFlex, FastBus isn't connected to the memory, so the back-side
> >ports (Writes, Reads, Evictions, etc.) are not used, right? Why is that?
> >3. There are also two ports (DMA, NonAllocateWrite) in FastBus connected
> >to the feeder. What are they used for? Do I need to use them in my
> >implementation?
> >Most likely, I will implement a new component as an external shared bus
> >connected to the L2 caches, just as FastBus does in TraceFlex. But I am
> >worried about the correctness of the coherence. Do you have any
> >suggestions to simplify the implementation?
> >
> >Any help would be appreciated!
> >
> >Regards,
> >Lide
>
> _______________________________________________
> SimFlex mailing list
> [email protected]
> https://sos.ece.cmu.edu/mailman/listinfo/simflex
> SimFlex web page: http://www.ece.cmu.edu/~simflex
>
From jsmolens+ at ece.cmu.edu  Fri Mar  2 12:22:38 2007
From: jsmolens+ at ece.cmu.edu (Jared C. Smolens)
List-Post: [email protected]
Date: Fri Mar  2 12:22:42 2007
Subject: [Simflex] Private L2 caches design
Message-ID: <037609823.1172856...@miura>


Hi Lide,

1. The *.rom files should be copied into the current working directory in
which you start Simics.  I don't think you should have to change the
Simics startup scripts to make this work.
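
As a rough illustration of the step above, the copy could be scripted before launching Simics. This is only a sketch: the function name `stage_rom_files` and both directory arguments are hypothetical, and the thread does not say where the *.rom files live in the Flexus tree, so `rom_dir` must be supplied by the user.

```python
import shutil
from pathlib import Path


def stage_rom_files(rom_dir, run_dir):
    """Copy every *.rom file from rom_dir into run_dir.

    rom_dir: wherever the microcode files live in your Flexus tree
             (hypothetical; the thread does not give this location).
    run_dir: the directory from which Simics will be started.
    Returns the sorted names of the files that were copied.
    """
    rom_dir, run_dir = Path(rom_dir), Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for rom in sorted(rom_dir.glob("*.rom")):
        # Keep the original file name (e.g. he.rom) so the simulator finds it.
        shutil.copy2(rom, run_dir / rom.name)
        copied.append(rom.name)
    return copied
```

Calling `stage_rom_files(some_rom_dir, ".")` from the directory in which Simics is about to be started would place he.rom next to the checkpoint being run; a plain shell `cp` of the same files works equally well.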

2. By "shared memory", do you want to model a shared cache or simply shared
DRAM?

A shared L3 cache will require some effort, because the CmpCache's 
coherence protocol currently assumes a single level of cache above it.  One 
way to correct this is to guarantee inclusion between the L1 and L2 (this 
is not enforced by the Cache component, but it's possible to change).  
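
To make the inclusion idea concrete, here is a toy model, not Flexus code: a two-level LRU hierarchy in which an L2 eviction back-invalidates the same block in the L1, so the L1 contents always remain a subset of the L2 contents. The class `InclusivePair` and its methods are hypothetical names for illustration and do not correspond to the Cache or CmpCache component interfaces.

```python
from collections import OrderedDict


class InclusivePair:
    """Toy private L1 + L2 pair that keeps L1 a subset of L2 (inclusion)."""

    def __init__(self, l1_capacity, l2_capacity):
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity
        self.l1 = OrderedDict()  # block address -> True, in LRU order
        self.l2 = OrderedDict()

    def access(self, addr):
        # Touch the L2 first so every L1 fill is backed by an L2 copy.
        self._touch_l2(addr)
        self._touch_l1(addr)

    def _touch_l2(self, addr):
        if addr in self.l2:
            self.l2.move_to_end(addr)
            return
        if len(self.l2) >= self.l2_capacity:
            victim, _ = self.l2.popitem(last=False)  # evict the LRU block
            self.l1.pop(victim, None)  # back-invalidate to keep inclusion
        self.l2[addr] = True

    def _touch_l1(self, addr):
        if addr in self.l1:
            self.l1.move_to_end(addr)
            return
        if len(self.l1) >= self.l1_capacity:
            self.l1.popitem(last=False)  # an L1 eviction never breaks inclusion
        self.l1[addr] = True

    def holds_inclusion(self):
        return set(self.l1) <= set(self.l2)
```

With inclusion guaranteed this way, whatever sits below the L2 only has to reason about the L2's contents, which is the "single level of cache above it" assumption mentioned above. A timing model would also have to handle dirty L1 data and in-flight requests during the back-invalidation, which this sketch ignores.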

Alternatively, if you only need DRAM with a constant distance from the 
cores, you could remove or disable the network component of the DSM 
simulator.  If you keep the directories and protocol engine, you should be 
able to model a CMP with cores/L1/L2 + some on-chip coherent network + 
external memory.

- Jared
