Hi Jared,

Thank you for your reply!

I read through the source code of the components used in DSMFlex to get the main idea of this simulator. However, I still have some questions.

1. How can I run this simulator? I copied the generated .so file from simulators/DSMFlex/ to SIMICS_ROOT/x86-linux/lib/, just as I did with CMPFlex. I can see the new module flexus-DSMFlex-v9-iface-gcc when I run "list-modules" in the Simics console, but if I load this module into a checkpoint and run the simulated machine, it shows the following:

FATAL ERROR: No such file or directory (2): rxx_cntl: failed to open
microcode file he.rom*** Simics getting shaky, switching to 'safe' mode.
Simics (main thread) received a segmentation fault. Will try to recuperate.

It seems that I need to copy a microcode file into some Simics directory because this simulator uses Protocol Engines. Where should I place the microcode file "he.rom"? Also, do I need to modify the Simics boot script because of the new memory structure?

2. You mentioned that I can re-tune some parameters of DSMFlex to get two levels of private cache. Certainly, each core in DSMFlex has its own private L1 and L2, but the memory is also distributed, unlike the shared memory in CMPFlex. What I actually want to implement is a CMP cache system with private L1 and private L2 caches but a memory shared by all cores, just like CMPFlex. If so, can I still use DSMFlex? The Local Engine/Protocol Engine/Directory used in DSMFlex make a directory for each block in memory, responsible for the coherence of the distributed memories. But I think I need to maintain the coherence of the private L2 caches in my implementation. What do you think?

Thanks,
Lide

On 2/27/07, Jared C. Smolens <[email protected]> wrote:
>
> Hi Lide,
>
> If you only want to have two levels of cache (L1 & L2, both private to each
> core and no shared cache), you might actually be able to use the DSMFlex
> simulators, after re-tuning for on-chip CMP latencies/bandwidth.
>
> The Cache/CmpCache components are used for "timing" simulations, whereas
> the TraceFlex simulator's Fast* components are for "functional" simulations
> (where all cache transactions are atomic and have zero latency). If you
> want correct coherence with timing, you will have to use the
> Cache/CmpCaches.
>
> 1. The snoop/request channels exist to prevent races between requests and
> acknowledgements which can occur in timing simulations. The "snoop"
> channel is a high priority channel for acknowledgement and eviction
> messages, while the request channel sends request messages. Prioritizing
> the snoop channel allows older requests to complete before starting new
> ones, avoiding deadlock scenarios.
>
> The Fast components have no concurrency and, therefore, don't need these
> channels. Their implementation is also far simpler because of this.
>
> 2. I'm not sure on this one.
>
> 3. We have found that DMA and non-allocating writes are important for
> correctly modeling cache behaviors of I/O-intensive commercial workloads.
>
> - Jared
>
> Excerpts From "Lide Duan" <[email protected]>:
> [Simflex] Private L2 caches design: "Lide Duan" <[email protected]>
> >Hi all,
> >
> >I am trying to implement a two-level CMP cache design, which has private L1
> >and private L2 caches, based on the components provided by Flexus. The
> >existing simulator CMPFlex has a private L1 cache (Cache component) and a
> >shared L2 cache (CmpCache component), both having cache controllers but
> >different cache controller implementations. In this case, the shared L2
> >cache is responsible for the coherence among the different private L1
> >caches. I have read the source code of these components, and I think I can
> >use the Cache component as my private L2 cache if I only modify the ports
> >to connect to the L1 caches on the front side and the shared bus on the
> >back side. However, how can I maintain the coherence of the private L2
> >caches? I noticed that TraceFlex has the same structure as I desire, and it
> >uses the FastBus component as the interconnect between the different L2
> >caches to perform the coherence. So I intend to focus on FastBus rather
> >than CmpCache.
> >
> >1. In CMPFlex, each Cache component has three ports (Request, Snoop, Out)
> >on both the front and back sides, but FastBus has only two ports
> >(FromCaches, ToSnoops) on the front side. How can I connect them? Or, what
> >are the main functions of the various ports, respectively?
> >2. In TraceFlex, FastBus isn't connected to the memory, so the back side
> >ports (Writes, Reads, Evictions, etc.) are not used, right? Why is that?
> >3. There are also two ports (DMA, NonAllocateWrite) in FastBus connected to
> >the feeder. What are they used for? Do I need to use them in my
> >implementation?
> >
> >Most likely, I will implement a new component as an external shared bus
> >connected to the L2 caches, just like what FastBus does in TraceFlex. But I
> >am worried about the correctness of the coherence. Do you have any
> >suggestions to simplify the implementation?
> >
> >Any help would be appreciated!
> >
> >Regards,
> >Lide
>
> _______________________________________________
> SimFlex mailing list
> [email protected]
> https://sos.ece.cmu.edu/mailman/listinfo/simflex
> SimFlex web page: http://www.ece.cmu.edu/~simflex

From jsmolens+ at ece.cmu.edu Fri Mar 2 12:22:38 2007
From: jsmolens+ at ece.cmu.edu (Jared C. Smolens)
List-Post: [email protected]
Date: Fri Mar 2 12:22:42 2007
Subject: [Simflex] Private L2 caches design
Message-ID: <037609823.1172856...@miura>

Hi Lide,

1. The *.rom files should be copied into the current working directory in which you start simics. I don't think you should have to change the simics startup scripts to make this work.

2.
By "shared memory", do you want to model a shared cache or simply shared DRAM? A shared L3 cache will require some effort, because the CmpCache's coherence protocol currently assumes a single level of cache above it. One way to correct this is to guarantee inclusion between the L1 and L2 (this is not enforced by the Cache component, but it's possible to change).

Alternatively, if you only need DRAM with a constant distance from the cores, you could remove or disable the network component of the DSM simulator. If you keep the directories and protocol engine, you should be able to model a CMP with cores/L1/L2 + some on-chip coherent network + external memory.

- Jared
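
For readers following the thread: the snoop/request prioritization described in the 2/27 reply can be illustrated with a small standalone sketch. This is not Flexus code. The Message and PrioritizedPort types and the drainOrder() helper are hypothetical names invented for illustration, and the sketch assumes strict priority (the snoop queue is always drained before the request queue), which is only one possible reading of "high priority channel". It needs a C++17 compiler.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type for illustration; not a Flexus type.
struct Message {
    std::string kind;  // e.g. "Ack", "Evict", "Read", "Write"
    int address;
};

// Hypothetical port with two channels. Snoop traffic (acknowledgements and
// evictions) is always served before new requests, so in-flight transactions
// can finish before new ones are started.
class PrioritizedPort {
public:
    void pushSnoop(Message m) { snoop_.push_back(std::move(m)); }
    void pushRequest(Message m) { request_.push_back(std::move(m)); }

    // Returns the next message to process, or std::nullopt if both
    // channels are empty. Strict priority: snoop first, then request.
    std::optional<Message> next() {
        if (!snoop_.empty()) return pop(snoop_);
        if (!request_.empty()) return pop(request_);
        return std::nullopt;
    }

private:
    static Message pop(std::deque<Message>& q) {
        assert(!q.empty());
        Message m = std::move(q.front());
        q.pop_front();
        return m;
    }

    std::deque<Message> snoop_;    // high-priority channel
    std::deque<Message> request_;  // low-priority channel
};

// Interleaves arrivals on both channels and records the service order.
std::vector<std::string> drainOrder() {
    PrioritizedPort port;
    port.pushRequest({"Read", 0x40});
    port.pushSnoop({"Ack", 0x80});
    port.pushRequest({"Write", 0xC0});
    port.pushSnoop({"Evict", 0x100});

    std::vector<std::string> order;
    while (auto m = port.next()) order.push_back(m->kind);
    return order;
}
```

Calling drainOrder() serves both snoop messages before either request, even though a request arrived first. A functional model with atomic, zero-latency transactions never has two messages in flight at once, which is why the Fast* components can do without the second channel.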
