From: shanlu at cs.uiuc.edu (shan)
Date: Tue Nov 8 00:47:40 2005
Subject: [Simflex] Re: Debugging message
Maybe I can help answer some of these questions :-), after having bothered Tom and this mailing list so much recently.

Q2: LOAD/STORE requests are issued by the CPU to the (L1) cache. READ/WRITE requests are issued by a cache to the next lower-level cache. FETCH is an instruction fetch.

Q3: I added some DBG statements in SimicsTracer.hpp, function real_hier_operate, under the "Case 3" if (is_fetch) branch, so that every fetched instruction is shown. However, Tom may have a better solution.

Q4: I think the L2 is shared. A MOSI-based directory is kept in each L2 cache line: each line records its current state, which node is its owner, and which nodes are its sharers.

That's all I know. :-)

_____
From: lu peng
Sent: Monday, November 07, 2005 9:42 PM
Subject: [Simflex] Re: Debugging message

Hi Tom,

Thanks for your reply. I forgot to copy the generated .so file to Simics's directory. Now my debug messages show up.

I am trying to understand the cache structure of Simflex. I read the ISCA 2000 paper on the Piranha protocol. Several questions:

1. The paper says Piranha does not maintain the inclusion property. Does Simflex maintain it in the CMPFlex model?

2. I noticed that there are many types of requests in the debug output: MemoryMessage[Write Request], MemoryMessage[Read Request], MemoryMessage[Fetch Request], MemoryMessage[Load Request], MemoryMessage[Store Request], etc. Are they just synonyms for load/store, or do they have specific meanings? It seems a Fetch request is an instruction cache read.

3. I tried to trace some addresses. However, sometimes I can't find which CPU generated a request in the first place. The first appearance I can find is, e.g., (CacheController.hpp: 1045){290}- sendBack_Request. Why?
Did the CPU issue a virtual address before putting the request into the request queue, and did it come back as a physical address? Does {290} mean the bus cycle number?

4. Does CMPFlex implement a shared L2 cache? If so, I am trying to implement a NUCA cache based on it. Do I need to modify anything related to the Piranha protocol? It seems to maintain coherence only for the L1 caches. My plan is to find the L2 read/write source code and have every L2 read/write operation look up a directory that maps the physical address of a cache block to its actual location, then read/write the mapped bank and get the latency. Is this enough? It seems unnecessary to modify the source code in PiranhaCacheControllerImpl::performOperation(). Am I correct?

5. Does Simflex support a purely private L2 cache scheme?

6. If you have more documentation describing the source code, especially the cache structure, please let me read it.

Thanks a lot,
Lu

From: twenisch at ece.cmu.edu (Thomas Wenisch)
Date: Wed Nov 9 18:31:32 2005
Subject: [SimFlex]x86 tracer init bug? and other problems

Hi Shan,

Sorry for the slow reply; I have been busy preparing the Flexus 2.0 release (which should be out later today).

On Mon, 7 Nov 2005, shan wrote:
> Hi Tom,
>
> I have several problems.
>
> Second, I still have that 'watchdog' assertion violation problem. If I disable that check, my program can proceed and looks OK (up to some point). I have checked the trace and I am a little confused. My program, at that time, has 2 threads.
> So there would definitely be 2 idle processors, and they would definitely stall longer than the watchdog threshold. Is my understanding somehow incorrect?

We talked about this issue here. You are right: given the way the simulator currently works, you will always end up with watchdog timeouts if one of your CPUs is idle.

I believe the best fix is to cap the maximum number of stall cycles Flexus will ever return to Simics. The cap should be larger than any real stall the memory system might produce. This fix is fairly accurate from a modeling point of view, and it is a general solution to the unbounded-stall issue.

An alternative fix would be to special-case the HLT instruction and not accumulate stalls for HLTs. Although this fix may be more accurate, it is less general: there may be other special cases where Flexus can accumulate an unbounded number of stalls, and those cases will invariably result in deadlocks.

To make the fix, add a test on the number of stall cycles in SimicsTracer.hpp:trace_mem_hier_operate(). 5000 cycles is probably a decent cap; I can't think of any way the memory system could legitimately stall an access for more than 5000 cycles. I am putting this fix into our codebase, so you will be able to pick it up in the release later today.

> Third, every time I load the flexus-CMPFlexus-x86... module, it succeeds only about one third of the time; in the other two thirds of cases Simics gets a segmentation fault. I have checked the source file and there may be a bug in SimicsTracer.hpp, lines 118-125 (the init function). In the SPARC version, a 'thePhysIO' object is first read with SIM_get_attribute. In the x86 version, however, thePhysIO seems to be used without initialization. I am not sure if my understanding is correct. I tried moving everything related to thePhysIO into the SPARC-only code, so the x86 version does not handle PhysIO at all.
> I am not sure if my modification is correct. After this change, at least, loading the flex-CMPFlexus-x86 module always succeeds. But I don't know whether not handling PhysIO would cause other problems.

thePhysIO is not used for x86. I moved the rest of the code that uses it into FLEXUS_TARGET_IS(v9) blocks, which should hopefully resolve the seg-fault.

> Fourth, after commenting out the watchdog assertion, I tried my toy code. It runs for a while, but then it never terminates. (I did not make the modification described in the third issue for these experiments.)
>
> In one case, it seems to have gone through all of the code, possibly right up to the program's final 'return 0'. But it just stalls there, and the executor keeps issuing exactly the same memory request. The memory system replies to the request correctly, but the executor just keeps reissuing it. (I attached the conf file and debug.out as debugnov7_joinedfinish.** for this.)
>
> In another case, it stops at some earlier stage (I am not sure exactly where in the source code). This time, the executor keeps issuing a fixed set of memory requests in turn, just like an infinite loop. (I attached the conf file and debug.out as debugnov7_oldx86deadloop.** for this too.)

I am not sure about these infinite loops. They could be the OS idle loop running after your application has finished. I would look at the PCs that Simics is feeding in during these loops and see whether they are OS code. You can probably modify some of the debug code in SimicsTracer to print out the virtual PCs.

> Tom, when you have time, can you help me look at these? Is my PhysIO modification correct? What's wrong with that infinite loop?
>
> Sorry to bring you trouble again.
> :-)
>
> Thanks very much,
>
> shan

From: twenisch at ece.cmu.edu (Thomas Wenisch)
Date: Wed Nov 9 18:52:39 2005
Subject: [Simflex] Re: Debugging message

Hi Lu,

On Tue, 8 Nov 2005, lu peng wrote:
> Hi Tom,
>
> Thanks for your reply. I forgot to copy the generated .so file to Simics's directory. Now my debug messages show up.
>
> I am trying to understand the cache structure of Simflex. I read the ISCA 2000 paper on the Piranha protocol. Several questions:
>
> 1. The paper says Piranha does not maintain the inclusion property. Does Simflex maintain it in the CMPFlex model?

Just as in Piranha, the SimFlex CMP cache does not maintain inclusion. The uniprocessor/DSM caches also do not maintain inclusion.

> 2. I noticed that there are many types of requests in the debug output: MemoryMessage[Write Request], MemoryMessage[Read Request], MemoryMessage[Fetch Request], MemoryMessage[Load Request], MemoryMessage[Store Request], etc. Are they just synonyms for load/store, or do they have specific meanings? It seems a Fetch request is an instruction cache read.

As Shan said, Load and Store are issued by the processor, while Read and Write are issued by a cache. There is also Upgrade: a cache issues an Upgrade (instead of a Write) if it has valid data for a line but does not have write permission. Fetch indicates an instruction fetch, as you guessed.

> 3. I tried to trace some addresses. However, sometimes I can't find which CPU generated a request in the first place. The first appearance I can find is, e.g., (CacheController.hpp: 1045){290}- sendBack_Request. Why? Did the CPU issue a virtual address before putting the request into the request queue, and did it come back as a physical address?
> Does {290} mean the bus cycle number?

Some debug statements do not include Comp(*this) or an equivalent, which is what makes the debug system print the name/ID of the component issuing the request. Debug statements that lack the component reference would need to be fixed individually to print their node ID. All references within the memory hierarchy use physical addresses. Numbers in {} in the debug output indicate the cycle number since the start of simulation.

> 4. Does CMPFlex implement a shared L2 cache? If so, I am trying to implement a NUCA cache based on it. Do I need to modify anything related to the Piranha protocol? It seems to maintain coherence only for the L1 caches. My plan is to find the L2 read/write source code and have every L2 read/write operation look up a directory that maps the physical address of a cache block to its actual location, then read/write the mapped bank and get the latency. Is this enough? It seems unnecessary to modify the source code in PiranhaCacheControllerImpl::performOperation(). Am I correct?

CMPFlex uses a shared L2. I don't know whether you need to change the Piranha protocol to implement NUCA. You may be able to modify calcDelay() in CacheController.hpp to change the latency of cache requests based on their type and the location you assign to them in the NUCA. This is probably the simplest change that will give you a reasonably good model of a NUCA cache.

> 5. Does Simflex support a purely private L2 cache scheme?

UniFlex (and DSMFlex, which will be available in 2.0) use private L2 caches.

> 6. If you have more documentation describing the source code, especially the cache structure, please let me read it.

The slides from the SimFlex tutorial (which we are giving at MICRO in Barcelona on Saturday) will be available on the web later this evening.
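A minimal sketch of the calcDelay() suggestion above, assuming a very simple NUCA model. This is not the Flexus calcDelay() code. L2Request, NucaConfig, bankForAddress() and nucaDelay() are hypothetical names. The one-dimensional bank layout, the static interleaved mapping, and every latency value are made-up assumptions. It shows the idea of computing an L2 latency from the request type plus the distance to the bank that holds the block. Because the memory hierarchy works on physical addresses, the mapping is keyed on the physical block address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical L2 request kinds, named after the MemoryMessage types in the
// debug output discussed in this thread (not the actual Flexus enum).
enum class L2Request { Read, Write, Upgrade, Fetch };

// Hypothetical NUCA geometry (assumption): banks on a one-dimensional row,
// with each requesting L1 sitting next to one bank position.
struct NucaConfig {
  unsigned numBanks = 8;       // number of L2 banks
  unsigned blockBits = 6;      // log2(block size), i.e. 64-byte blocks
  unsigned accessLatency = 6;  // cycles for a bank data-array access
  unsigned tagLatency = 1;     // cycles for a tag/permission-only access
  unsigned cyclesPerHop = 2;   // one-way wire delay per bank crossed
};

// Static NUCA mapping: interleave consecutive physical blocks across banks.
// A dynamic NUCA would replace this with a lookup in a location directory.
unsigned bankForAddress(const NucaConfig& cfg, std::uint64_t paddr) {
  assert(cfg.numBanks > 0);
  return static_cast<unsigned>((paddr >> cfg.blockBits) % cfg.numBanks);
}

// Latency of one L2 request, based on its type and on the distance between
// the requester's position and the bank holding the block.
unsigned nucaDelay(const NucaConfig& cfg, L2Request type,
                   std::uint64_t paddr, unsigned requesterPos) {
  unsigned bank = bankForAddress(cfg, paddr);
  unsigned hops = bank > requesterPos ? bank - requesterPos
                                      : requesterPos - bank;
  // Assumption: an Upgrade returns no data, so it only pays for a tag check.
  unsigned access = (type == L2Request::Upgrade) ? cfg.tagLatency
                                                 : cfg.accessLatency;
  // The request travels to the bank and the reply travels back.
  return access + 2 * hops * cfg.cyclesPerHop;
}
```

With the defaults, a Read for the block at physical address 0x1C0 (bank 7) costs 6 + 2*7*2 = 34 cycles from a requester next to bank 0, and 6 cycles from position 7. A dynamic NUCA, like the directory Lu describes, would swap bankForAddress() for a lookup in that location directory. How the requester's position would reach the real calcDelay() is left open here.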
> Thanks a lot,
>
> Lu

Regards,
-Tom Wenisch

From: twenisch at ece.cmu.edu (Thomas Wenisch)
Date: Wed Nov 9 18:54:45 2005
Subject: [Simflex] Re: Debugging message

Hi Shan,

Thanks for replying to Lu's questions. The more people we can get to answer questions on the list, the faster the overall response time will be for everyone.

Best Regards,
-Tom Wenisch

On Mon, 7 Nov 2005, shan wrote:
> Maybe I can help answer some of these questions :-), after having bothered Tom and this mailing list so much recently.

From: shanlu at cs.uiuc.edu (shan)
Date: Wed Nov 9 22:17:34 2005
Subject: [Simflex] will speculative memory accesses be simulated?

Hi Tom,

I have a question about branch misprediction simulation. How is the branch misprediction time charged? I guess that at each branch instruction, we know whether it will be mispredicted. If we know our prediction will be wrong, will the simulation proceed down the wrong path and speculatively fetch data from the cache?

A related question: will Simics give Flexus speculatively executed instructions, or does the 'feeder' only ever get correct-path instructions from Simics?

Thanks,
Shan
